Work as a Software Engineer.
Program in C, C++, Java (plus Google Web Toolkit & Google App Engine), Python (also GAE), Perl, Make, Bash, and to a lesser extent Ruby (some Rails), LISP/Scheme, SQL. Familiar with the Unix (Linux, OS X) platform but not Windows. Have published working Open Source code that some people seem to find useful (see below). Worked as a programmer on a UC Berkeley research team. Have worked independently for long periods. Can express myself in written English.
Total of 8 years, 4 months paid industry and university staff programming experience; one year of that was half-time. In addition write open source and do Computer Science research on my own; see below.
Helped convert a prototype for a virus analysis pipeline into a schema for an industrial-strength implementation. This was work with Charles Chiu and Narayanan Veeraraghavan of Dr. Chiu's lab and Doug Judd of Hypertable.
Harmony Explained: Progress Towards A Scientific Theory of Music: Most music theory books are like medieval medical textbooks: they contain unjustified superstition, non-reasoning, and funny symbols glorified by Latin phrases. How does music, in particular harmony, actually work, presented as a real, scientific theory of music? In particular we derive from first principles of Physics and Computation the following three fundamental phenomena of music:
Manhattan-Connected Exact-Coloring is NP-Complete: Accidentally solved an open theory problem while helping a friend with his homework (of course with the intent to provide proper notification to the course grader) who was taking Professor Richard M. Karp's CS 270 Combinatorial Algorithms and Data Structures class at The University of California, Berkeley.
Distributed Transactions for Google App Engine: (a few years off and on) Optimistic Distributed Transactions built upon Local Multi-Version Concurrency Control, a project with Simon Fredrick Vicente Goldsmith, Ryan Barrett, Erick Armbrust, Robert Johnson, Alfred Fuller. Ryan Barrett, one of the founders of the Google App Engine team, said "Thanks. This has been on our todo list for years." Erick Armbrust at Google has implemented it as part of tapioca-orm. Presented by me at CodeCon 2009 and at Google I/O 2009 (see video of me speaking).
Hard Object (several years, started around 2005): a lightweight change to the standard x86 hardware platform to provide software engineers with fine-grain memory safety; I am the lead of this project having many collaborators. Many projects have tried to do this before, going back several decades, but ours actually seems to work! Mark Winterrowd and I have worked together on it full time for about a year until he was recently hired at Coverity, but the project goes back to about 2005 an involves several collaborators at Cal Berkeley. This is a long-term project and, except for the first tech report, remains unpublished as we file patent applications. Final tech report still in preparation.
MTS: Multi-TeSter: a simple domain-specific language for maintaining tests that supports two properties I have found desirable:
A Proposal for Proquints: Identifiers that are Readable, Spellable, and Pronounceable; Open Source implementation in C and Java: http://github.com/dsw/proquint. Identifiers (IDs) are pervasive throughout our modern life. We suggest that these IDs would be easier to manage and remember if they were easily readable, spellable, and pronounceable. As a solution to this problem we propose using PRO-nouncable QUINT-uplets of alternating unambiguous consonants and vowels: proquints.
My functional, object-oriented programming language, unpublished: several months full time, suspended.
Elsa and Oink: Worked on Elsa, a C++ front-end maintained by Scott McPeak. Wrote a client of Elsa called Oink that is a framework for static analysis of C and C++. The primary tool of the framework right now is a second-generation implementation of the Cqual data-flow analysis tool by Jeff Foster and others. Elsa and Oink were presented by Scott and me at CodeCon 2006. Karl Chen has used Cqual++ to find what seem to be many previously unknown remotely exploitable printf() format string vulnerabilities in Debian; the false positive rate is less than 50% at the package granularity. Elsa (but not Oink) is now used by Mozilla as part of their refactoring effort.
Added to Oink/Cqual++ a feature for lightweight enforcement of privacy boundaries in C at the file granularity which requires no modification to the C code; unpublished, but the parts of the analysis that were finished occur in the Oink distribution. This analysis was designed in collaboration with Matt Harren and David Molnar.
Build Interceptor: Oink needs the pre-processed (.i) files from a project to work on; this can be hard to obtain as build processes are varied. Implemented a second (or third) generation tool based on previous efforts by Hao Chen and Ben Liblit called Build Interceptor (build-interceptor.tigris.org) that solves this problem as long as the compiler is gcc; it is used to intercept much of Debian, per our analysis mentioned above.
Delta: Scott McPeak and I wrote Delta (delta.tigris.org), a simple implementation of Andreas Zeller's Delta Debugging algorithm. Delta can minimize a quarter-million line file that crashes a C++ front-end down to a few pages of code that produces the same result and do this completely automatically. I have been told that "delta is saving a lot of gcc developers life[sic]." This was my first Open Source project and was released to the public at the request of Microsoft Research (!). Presented by me at CodeCon 2006; I was the first person in the history of CodeCon to get two projects accepted in the same year (the other was Oink).
Trend-Prof: Worked with Simon Goldsmith on Trend Profiler (trend-prof.tigris.org), a new kind of profiling tool that looks for trends in the performance of a program; when tested on only small and medium sized inputs Trend Prof allows the programmer to find super-linearities in a program that might not only show up on a tool like gprof until the program is run on large inputs. That is, super-linearities show up to Trend Prof even when they are a small percentage of the whole running time. Presented by Simon at CodeCon 2009. Published as part of Simon's Ph.D. thesis at Cal Berkeley and separately as Measuring Empirical Computational Complexity.
Maximus: Have done a re-implementation of a second-generation version of Alex Aiken's MOSS program for string matching detection called Maximus; unpublished because UC will not allow it to be released as Open Source due to the Winnowing algorithm being patented. Like MOSS, Maximus uses Karp-Rabin and the Winnowing Algorithm (which I co-invented); however it extends the precision of MOSS by also implementing, as suggested by Joel Auslander, Ukkonen's Linear Time Suffix Tree algorithm.
Subversion tools: Faced with needing to modify the history of a Subversion repository, implemented svn_dump2dir and svn_dir2dump, two Perl scripts that explode Subversion dump files into directories and back. The svn dump file format quotes embedded files using counts, making them hard to edit or patch; whereas in directory form editing is straightforward. Implemented simple_patch Perl script that does less than what the standard patch tool does so it is easier to control. The only user feedback I've gotten is that the police of Queensland Australia seem to use it for something.
Theoretical work: Co-invented a new algorithm, Winnowing, for all-to-all string matching that for the parameters at which we used it reduced the size of the database by a factor of 8 and the query time by a factor of 3. (Implemented in maximus above.)
University of California, Berkeley, five years graduate study in Theoretical Computer Science (from the Math department), 1991-1996.
University of Maryland at College Park, Bachelor of Science, Mathematics, 1990.
Budapest Semesters in Mathematics, Spring 1990.
Hampshire College Summer Studies in Mathematics Summer 1986. Later also taught there by invitation.
Johns Hopkins Center for the Advancement of Academically Talented Youth, three-week summer programs: 1983 Intro to CS, 1984 Data Structures, 1985 Automata Theory and Formal Languages.
My brother and I taught ourselves to code in BASIC on an Atari 800. My mom says I was 10 and he was 9, but I remember being 12.
Note that these articles can be downloaded from my homepage, except the Math Magazine article.
I came to this problem through a friend who was taking Professor Richard M. Karp's CS 270 Combinatorial Algorithms and Data Structures class at The University of California, Berkeley. At the time I was just trying to help a friend with his homework (of course with the intent to provide proper notification to the course grader). After I solved it my friend told me that in fact neither Karp nor his teaching assistant had a solution.
Manhattan-Connected Exact-Coloring Is NP-Complete. Daniel S. Wilkerson.
I am proud of this paper as it is a simple solution to a practical problem. My contributions include that I conjectured power-laws as a low-dimensional way to model the number of executions of a line of code in a program.
Measuring Empirical Computational Complexity. SEC/FSE'07: The 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Simon F. Goldsmith, Alex S. Aiken, Daniel S. Wilkerson.
This paper is theoretically cool, but more importantly it actually made a real string-matching index 8 times smaller and 3 times faster on the same queries. I am now only interested in this kind of practical stuff.
Winnowing: Local Algorithms for Document Fingerprinting. Saul Schleimer, Daniel S. Wilkerson, Alex Aiken. ACM Special Interest Group on Management of Data, 2003.
I worked on the theory part of this paper and the relevance of that part to reality is pretty questionable. The implementation side done by Alan and Brent is pretty cool, but I didn't work on that.
System Area Network Mapping. A. Mainwaring, B. Chun, S. Schleimer, D. Wilkerson. ACM Symposium on Parallel Algorithms and Architectures, 1997.
The papers below are useless theoretical junk, but you have to put something on your resume. I no longer do this kind of stuff.
Faster Computation on Directed Networks of Automata. R. Ostrovsky, D. Wilkerson. ACM Symposium on Principles of Distributed Computing, 1995.
Nearly Tight Bounds for Wormhole Routing. A. Ranade, S. Schleimer, D. Wilkerson. IEEE Symposium on Foundations of Computer Science, 1994.
Optimal Leapfrogging. J. Auslander, A. Benjamin, D. Wilkerson. The Mathematics Magazine, 1994.
Write the occasional non-fiction short story or essay (see my web page). Working on a real Theory of Music to replace the current "medieval medical textbook" style pseudo-theory. Speak some conversational Hungarian. Like to sail and hike.