On Mon, Mar 09, 2009 at 10:32:02AM -0700, jerry gay wrote:
> > To make things a bit quicker for people writing custom versions of
> > <ws> (which may need to include "comment whitespace"), the Parrot
> > Compiler Toolkit also provides an optimized <ww> rule that matches
> > only between a pair of word characters.  Then the default definition
> > of <ws> becomes
> >
> >     token ws { <!ww> \s* }
>
> if you need a mnemonic to help you remember what 'ww' means, use 'within
> word'.
>
> this reminds me that pge's <ww> may be incorrect in its treatment of
> <apostrophe>. these characters (<['-]> by default) are word
> characters, but i don't think that's been tested, and i don't think
> it's been implemented, either.

A couple of clarifications:

- PGE doesn't implement <ww> by default, because that's not (yet?)
  part of the spec.  It only appears in PCT::Grammar, for people
  using the Parrot Compiler Toolkit to create languages.

- AFAICT, apostrophe and hyphen are not yet "word characters" in
  the sense of being members of \w.  That is, they're considered
  to be valid in identifiers, but only when they are immediately
  preceded by a word character and immediately followed by an
  alphabetic character.  Otherwise they're not part of the
  identifier.  (At least, that's how the current STD.pm reads.)

Pm
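
For anyone who wants to poke at the identifier rule described above, here is a rough sketch in Python rather than in Perl 6 rule syntax. It is only an approximation of how I read the rule, not a copy of anything in STD.pm or PCT: the names IDENT, IDENTIFIER and leading_identifier are made up for this example, Python's \w is assumed to stand in for Perl 6's \w, and [^\W\d] is assumed to be close enough to <alpha> (a word character that is not a digit, so "_" counts).

```python
import re

# Assumption: Python's \w approximates Perl 6's \w, and [^\W\d] (a word
# character that is not a digit) approximates <alpha>, which includes "_".
IDENT = r"[^\W\d]\w*"

# An apostrophe or hyphen is accepted only when it directly follows a word
# character (the end of the previous IDENT) and is directly followed by an
# alphabetic character (the start of the next IDENT).
IDENTIFIER = re.compile(rf"{IDENT}(?:['\-]{IDENT})*")


def leading_identifier(text):
    """Return the identifier at the start of text, or None if there is none."""
    m = IDENTIFIER.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ["foo-bar", "don't", "foo-1", "foo-", "foo--bar", "-foo"]:
        print(repr(sample), "->", repr(leading_identifier(sample)))
```

With these samples, "foo-bar" and "don't" are taken whole, while "foo-1", "foo-" and "foo--bar" yield only "foo", and "-foo" yields no identifier at all.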