Re: Suggestion for perl 6 regex syntax

Adam D. Lopresto Sun, 08 Sep 2002 07:52:19 -0700

Some regexes will be longer, but a lot will benefit from the changes by being
shorter or clearer, or often both.  The problem with your suggestion is that
you're making assumptions about what's common and what's not (character classes
more common than closures, for instance) that probably aren't accurate.  You
could certainly make your own <a ...> and <b ...> rules, but I bet most people
won't use them nearly enough for that sort of shortening.


Also, your sample regexps aren't exactly fair, because you're using capturing
groups all over the place for things that I'm pretty sure you don't really want
to capture, just to group.  So your perl5 would have to be rewritten from

/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/         #50 chars

to

/^([+-]?)(?=\d|\.\d)(\d*(?:\.\d*)?)(?:[Ee]([+-]?\d+))?$/   #56 chars

(capturing sign, mantissa, and exponent: you could capture different things if
you want, or you could capture nothing.  But capturing random things just
because it saves you a few keystrokes isn't good practice.)
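
To make the captures concrete, here's a quick perl5 sketch using the rewritten
pattern above.  The parse_float helper and the sample inputs are mine, purely
for illustration; it just pulls sign, mantissa, and exponent out of $1, $2,
and $3.

```perl
use strict;
use warnings;

# Rewritten perl5 pattern: captures sign, mantissa, and exponent only.
my $float = qr/^([+-]?)(?=\d|\.\d)(\d*(?:\.\d*)?)(?:[Ee]([+-]?\d+))?$/;

# Hypothetical helper: returns (sign, mantissa, exponent), or an empty list
# when the text is not a float.  The exponent is undef if there is no E part.
sub parse_float {
    my ($text) = @_;
    return unless $text =~ $float;
    return ($1, $2, $3);
}

for my $input ('-1.5e10', '.5', '42', '.', 'e5') {
    my @parts = parse_float($input);
    if (@parts) {
        my $exponent = defined $parts[2] ? $parts[2] : '';
        printf "%-8s sign=[%s] mantissa=[%s] exponent=[%s]\n",
            $input, $parts[0], $parts[1], $exponent;
    }
    else {
        printf "%-8s no match\n", $input;
    }
}
```

The last two inputs, "." and "e5", are the ones the lookahead is there to
reject.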

We could golf that down to 

/^(\+|-)?(?=\d|\.\d)(\d*(?:\.\d*)?)(?:E([+-]?\d+))?$/i     #54 chars

if you really care about every last character.

The perl 6 equivalent becomes

:i/^(+|-)?<before \d|\.\d>(\d*[\.\d*]?)[E([+|-]?\d+)]?$/   #56 chars

So you're not losing much at all.  That is, if you really want to spend forever
fighting for every last character.  The point is, when you use unusual
constructs like lookaheads, you pay a price in keystrokes and get more clarity
in return.  When you use common/good things like non-capturing groups, you are
rewarded with fewer keystrokes.

Also note that we could dramatically rewrite the pattern: instead of a
lookahead assertion, we can use an actual code assertion to check that the
mantissa isn't empty, making the new perl6 code actually shorter than the
(correct) perl5 version.  Of course, that's because we use perl6's strengths.

:i/^(+|-)?(\d*[\.\d*]?)<($2=~/./)>[E([+|-]?\d+)]?$/        #51 chars
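
One caveat, sketched in perl5 with the check done in ordinary code after the
match rather than inside the pattern.  The is_float helper and the sample
inputs are made up for illustration, and I'm assuming the perl6 assertion
would behave the same way.  A bare /./ test is satisfied by a lone ".", which
the lookahead version rejects; testing for /\d/ instead would close that gap.

```perl
use strict;
use warnings;

# The perl5 pattern with the lookahead removed; on its own it also matches
# the empty string, a lone ".", and a bare exponent like "e5".
my $loose = qr/^([+-]?)(\d*(?:\.\d*)?)(?:[Ee]([+-]?\d+))?$/;

# Hypothetical helper: match the loose pattern, then assert something about
# the mantissa in plain code.  Returns 1 or 0.
sub is_float {
    my ($text, $mantissa_ok) = @_;
    return 0 unless $text =~ $loose;
    my $mantissa = $2;    # copy before running another match
    return $mantissa =~ $mantissa_ok ? 1 : 0;
}

for my $input ('1.5e3', '.5', '5.', '', '.', 'e5') {
    printf "%-8s any-char=%d has-digit=%d\n",
        "'$input'", is_float($input, qr/./), is_float($input, qr/\d/);
}
```

Only the lone "." comes out differently between the two checks.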


> While Apocalypse 5 raises some good points about problems with the old regex
> syntax, its new syntax is actually worse than in perl 5. Most regexes, such
> as this one to match a C float
> 
> /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
> 
> would actually become longer:
> 
> /^(<[+-]>?)<before \d|\.\d>\d*(\.\d*)?(<[Ee]>(<[+-]>?\d+))?$/
> 
> Therefore I propose a few minor changes:
> 
> character class:      [...]
> non-captured group:   {...}
> closure:              <{...}>
> lookahead assertion:  <b ...>
> lookbehind assertion: <a ...>
> 
> This would bring the aforementioned regex back down to perl 5 size, and many
> would become even shorter than in perl 5.

-- 
Adam Lopresto ([EMAIL PROTECTED])
http://cec.wustl.edu/~adam/

Dreaming permits each and every one of us to be quietly and safely
insane every night of our lives.

     --William Dement
