Re: New version of PGE released

Patrick R. Michaud Tue, 03 May 2005 22:11:07 -0700

On Tue, May 03, 2005 at 09:22:11PM +0100, Nicholas Clark wrote:
> 
> Whilst I confess that it's unlikely to be me here, if anyone has the time
> to contribute some help, do you have a list of useful self-contained tasks
> that people might be able to take on?


Very good question -- I can probably come up with a few.  Here's
a short list:

1.  Benchmarking -- it would be really good to have some sort of
rudimentary stats to know the approximate speed difference between
PGE (which currently has few optimizations) and Perl 5's regular 
expression engine.  It doesn't have to be anything extensive -- 
just a couple of "order-of-magnitude" estimates so we can get some 
idea of just how hard we'll eventually have to work to get the
performance we're used to, and what sorts of optimizations or support
we'll need to get there.

2.  Parsing of Perl 5 regular expressions -- PGE is designed so that
it can use other matching syntaxes, such as (say) the Perl 5 regular
expression syntax.  The previous version of PGE had a p5 regex
parser in C that I threw together; I'll throw one together for this
version but it would be good to have someone else shepherd that.
Parsing the regexes isn't all that difficult, and one can easily
look at the P6Rule parser for guidance (or ask, too).  Once the
regex is parsed then PGE can largely handle the rest.

3.  "Token" hashes.  The p6 rules syntax (Synopsis 05) says that
a hash appearing in a rule uses the entry having the longest
matching key of the hash.  For example, if I have a hash with
entries  ( 'a' => 1, 'at' => 3, 'adam' => 4 ), then the longest
matching key for "atomic" would be "at".

This can be very useful for tokenizing input streams if you 
want to build, say, a compiler.  :-) In fact, I ended up using 
hashes in exactly this way for the P6Rule parser.  Unfortunately, 
Parrot doesn't have a "longest matching key" op, so we can
either do a brute-force search of the keys (ick) or build
something a little more efficient.  For the time being I created
a simple TokenHash class in PIR that acts somewhat like a Hash
but also has a method for "the longest key matching a 
string (starting at position X)" that doesn't require searching 
all of the keys.  But it's not the greatest implementation, and
this may be one of those things where efficiency needs will
drive us to a lower-level implementation.
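
To make the idea concrete for anyone who wants to pick this up, here is
a rough sketch of the two approaches. It is written in Python purely
for illustration; it is not the PIR TokenHash class, and the names
(TokenHash, longest_key, longest_key_brute_force) and the trie layout
are assumptions made for this example. The brute-force version tests
every key; the trie version walks the input one character at a time
and stops as soon as no stored key can still match.

```python
def longest_key_brute_force(table, text, pos=0):
    """Check every key in the table; return the longest one found at pos."""
    matches = [key for key in table if text.startswith(key, pos)]
    return max(matches, key=len, default=None)


class TokenHash:
    """Hypothetical sketch: hash-like storage plus a longest-key lookup.

    Keys are stored in a trie of nested dicts, so a lookup touches at most
    one node per input character instead of scanning all of the keys.
    """

    _END = object()  # sentinel marking "a stored key ends at this node"

    def __init__(self, entries=None):
        self._root = {}
        for key, value in (entries or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self._END] = value

    def __getitem__(self, key):
        node = self._root
        for ch in key:
            node = node[ch]  # raises KeyError if the key was never stored
        return node[self._END]

    def longest_key(self, text, pos=0):
        """Return the longest stored key matching text at pos, or None."""
        node = self._root
        best = None
        for i in range(pos, len(text)):
            node = node.get(text[i])
            if node is None:
                break  # no stored key continues with this character
            if self._END in node:
                best = text[pos:i + 1]  # remember the longest match so far
        return best


tokens = TokenHash({'a': 1, 'at': 3, 'adam': 4})
key = tokens.longest_key("atomic")
print(key, tokens[key])
```

With the entries from the example above, both functions pick "at" for
"atomic". The trie costs more memory per key, but the lookup time
depends on the length of the match, not on the number of keys.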

I'll probably come up with more tasks, but that's a start.

Pm
