Mr. Nobody wrote:
> /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
>
> would actually become longer:
>
> /^(<[+-]>?)<before \d|\.\d>\d*(\.\d*)?(<[Ee]>(<[+-]>?\d+))?$/

Your first expression uses capturing parens, but the captures don't
bind anything useful, so you should probably compare non-capturing
versions of the regex:

  /^[+-]?(?=\d|\.\d)\d*(?:\.\d*)?(?:[Ee][+-]?\d+)?$/

vs

  /^<[+-]>?<before \d|\.\d>\d*[\.\d*]?[<[Ee]><[+-]>?\d+]?$/

The <[Ee]> isn't the way I'd write it in Perl 6 -- I'd shift into
case-insensitive mode temporarily, because those hand-written
[Cc][Aa][Ss][Ee]-insensitive matches are hard to read.

  /^<[+-]>?<before \d|\.\d>\d*[\.\d*]?[:i e<[+-]>?\d+]?$/

Now Perl 6 is just 5 characters longer. That's a horrible pattern
to read though. Can Perl 6 "fix" that? I think so.

I'd change the <[+-]> fragments to use a sub-rule, because repeated
constants make things harder to read. (Not so bad in this case, but
it's a good general rule -- and you're making generalizations about
regex syntax.)

  /^<sign>?<before \d|\.\d>\d*[\.\d*]?[:i e<sign>?\d+]?$/

I'd put in some white space to clarify the different logical pieces
of the rule:

  /^ <sign>? <before \d | \.\d> \d* [\.\d*]? [:i e <sign>? \d+]? $/

Now it's pretty obvious that the :i can be moved outside the rule
without screwing anything up. I'd rather have modifiers affect the
whole rule than have to remember where they begin and end inside it.

  :i /^ <sign>? <before \d | \.\d> \d* [\.\d*]? [e <sign>? \d+]? $/

That's how I'd write your Perl 5 regex in Perl 6. (Well, actually
it's probably just /^ <number> $/, but would you call that
cheating? ;)

It does have more characters than the Perl 5 regex. Looking at it
another way, it has fewer symbols. It's faster to read. How many
times are you going to write it? How many times are you going to
read it?

When I was reading A5, I was concerned about character classes too,
but mostly because of the regex style that I learned from the Friedl
book:

  opening normal* ( special normal* )* closing

which can be used to match quoted strings, for example:

  /"[^"\\]*(\\.[^"\\]*)*"/

The direct Perl 6 equivalent is not very pretty:

  /" <-["\\]>* [ \\. <-["\\]>* ]* "/

It's hard to come up with a good name for the character class used
there. not_a_quote_or_slash? special_char_in_quote?

I'm not concerned about it anymore, because I think the Perl 6 style
will be:

  opening ( special :: | . )*? closing

The non-greedy match makes so many things easier to write, and the
backtracking control prevents the special case from accidentally
matching as the normal one. I'd write the string match in Perl 6
like this:

  /" [ \\. :: | . ]*? "/

The only possible problem with this is that non-greedy iteration is
slower. It doesn't have to be though -- and the optimizations needed
to get Perl 6 rules to match full grammars should fix this.

If the pattern is rewritten as a grammar, we can talk about first
and follow sets.

  <quoted_string>:   " <quoted_char_seq> "
  <quoted_char_seq>: <null> | <quoted_char> <quoted_char_seq>
  <quoted_char>:     \ <any> | <any>

The reason non-greedy matching is slow is that the rule
<quoted_char_seq> can be empty, i.e. it always matches at the
current spot. However, the follow set of <quoted_char_seq> is the
quote. That means the *only* thing that can follow <quoted_char_seq>
is a quote. There's no point in returning (taking the <null> route)
unless the rule is looking at a quote. This reduces backtracking
tremendously.

The other problem would normally be the conflict between the first
sets of the two <quoted_char> alternatives. The backslash character
is also an <any> character, so if the backslash alternative is
taken, the system has to prepare to backtrack to the <any>
alternative. The :: backtracking control eliminates the backtracking
point, so it's impossible for an escape sequence to be re-parsed as
two separate characters.

Damian wrote several good examples of Perl 5 -> Perl 6 conversions.
Take a look at E5 and experiment some more. The built-in named rules
may simplify a lot of things too -- we're going to have a much
richer library than just \d, \w, etc.

- Ken
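
A minimal sketch of the quoted-string discussion above, in Python only because its re module accepts the Perl 5 pattern syntax quoted in the post and is easy to run. It compares the Friedl-style pattern with a non-greedy alternation that has no backtracking control. Python has no `::`, so the third pattern is an assumption: it approximates the commit by keeping the backslash out of the fallback branch. The names UNROLLED, LAZY_NAIVE, LAZY_COMMITTED and first_match are hypothetical and do not come from the post.

```python
import re

# Friedl-style pattern, copied from the post (Perl 5 syntax).
UNROLLED = re.compile(r'"[^"\\]*(\\.[^"\\]*)*"')

# Non-greedy alternation with no backtracking control: roughly what
# /" [ \\. | . ]*? "/ would be without the `::`.
LAZY_NAIVE = re.compile(r'"(?:\\.|.)*?"')

# Assumption: approximate the effect of `::` by excluding the backslash
# from the fallback branch, so an escape sequence can never be re-read
# as two separate characters.
LAZY_COMMITTED = re.compile(r'"(?:\\.|[^\\])*?"')


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


# A well-formed string containing escaped quotes.
well_formed = r'say "a \"quoted\" word" now'

# The only candidate closing quote is escaped, so the string never ends.
unterminated = r'"a\"'

for name, pat in [("unrolled", UNROLLED),
                  ("lazy, naive", LAZY_NAIVE),
                  ("lazy, committed", LAZY_COMMITTED)]:
    print(name, first_match(pat, well_formed), first_match(pat, unterminated))
```

All three patterns find the same substring in the well-formed input. On the unterminated input the naive non-greedy pattern still reports a match, because backtracking re-reads the backslash as an ordinary character and then takes the escaped quote as the closing one. The other two report no match, which is the behaviour the `::` is meant to guarantee.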