On Tue, May 03, 2005 at 09:22:11PM +0100, Nicholas Clark wrote:
>
> Whilst I confess that it's unlikely to be me here, if anyone has the time
> to contribute some help, do you have a list of useful self-contained tasks
> that people might be able to take on?

Very good question -- I can probably come up with a few. Here's a
short list:

1. Benchmarking -- it would be really good to have some sort of
rudimentary stats to know the approximate speed difference between
PGE (which currently has few optimizations) and Perl 5's regular
expression engine. It doesn't have to be anything extensive -- just a
couple of "order-of-magnitude" estimations so we can get some idea of
just how hard we'll eventually have to work to get the performance
we're used to, and what sorts of optimizations or support we'll need
to be doing that.

2. Parsing of Perl 5 regular expressions -- PGE is designed so that it
can use other matching syntaxes, such as (say) the Perl 5 regular
expression syntax. The previous version of PGE had a p5 regex parser
in C that I threw together; I'll throw one together for this version
but it would be good to have someone else shepherd that. Parsing the
regexes isn't all that difficult, and one can easily look at the
P6Rule parser for guidance (or ask, too). Once the regex is parsed
then PGE can largely handle the rest.

3. "Token" hashes. The p6 rules syntax (Synopsis 05) says that a hash
appearing in a rule uses the entry having the longest matching key of
the hash. For example, if I have a hash with entries

    ( 'a' => 1, 'at' => 3, 'adam' => 4 )

then the longest matching key for "atomic" would be "at". This can be
very useful for tokenizing input streams if you want to build, say, a
compiler. :-) In fact, I ended up using hashes in exactly this way
for the P6Rule parser.

Unfortunately, Parrot doesn't have a "longest matching key" op, so one
can either do a brute-force search of the keys (ick), or we can build
something a little more optimal. For the time being I created a simple
TokenHash class in PIR that acts somewhat like a Hash but also has a
method for "the longest key matching a string (starting at position
X)" that doesn't require searching all of the keys.
But it's not the greatest implementation, and this may be one of those
things where efficiency needs will drive us to a lower-level
implementation.

I'll probably come up with more tasks, but that's a start.

Pm
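
P.S. For anyone curious about item 3, the "longest matching key" lookup can be sketched with a character trie. This is only an illustration in Python, not the PIR TokenHash mentioned above. The class name `TokenTrie` and the method `longest_match` are hypothetical, and the sketch assumes non-empty string keys.

```python
class TokenTrie:
    """Hypothetical sketch: a mapping with a longest-matching-key lookup."""

    # Sentinel stored in a node to mark that a complete key ends there.
    _END = object()

    def __init__(self, entries=None):
        self._root = {}
        for key, value in (entries or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Walk (and create) one nested dict per character of the key.
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self._END] = value

    def longest_match(self, text, pos=0):
        """Return (key, value) for the longest key matching text at pos,
        or None if no key matches there."""
        node = self._root
        best = None
        for i in range(pos, len(text)):
            node = node.get(text[i])
            if node is None:
                break  # no stored key continues with this character
            if self._END in node:
                # A complete key ends here; remember it and keep going
                # in case a longer key also matches.
                best = (text[pos:i + 1], node[self._END])
        return best


tokens = TokenTrie({'a': 1, 'at': 3, 'adam': 4})
print(tokens.longest_match("atomic"))    # ('at', 3)
print(tokens.longest_match("madam", 1))  # ('adam', 4)
```

The lookup only walks as many characters as the longest candidate key, so its cost does not grow with the number of keys, which is the property the brute-force search lacks.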