On Mon, Mar 9, 2009 at 10:16, Patrick R. Michaud <pmich...@pobox.com> wrote:
>> On Sun, Mar 08, 2009 at 09:43:17AM +0100, pugs-comm...@feather.perl6.nl wrote:
>> =item * ws
>>
>> Match whitespace between tokens.
>>
>> =item * space
>>
>> Match a single whitespace character. Hence C< <ws> > is equivalent to C< <space>+ >.
>
> The definitions of <ws> and <space> above are incorrect, or at least
> misleading. <ws> matches required whitespace between pairs of word
> characters; it's optional whitespace otherwise. The default definition
> of <ws> is something like:
>
>     token ws { <?before \w> <?after \w> <!> || \s* }
>
> It's certainly _not_ the case that <ws> is equivalent to <space>+ .
>
> To make things a bit quicker for people writing custom versions of
> <ws> (which may need to include "comment whitespace"), the Parrot
> Compiler Toolkit also provides an optimized <ww> rule that matches
> only between a pair of word characters. Then the default definition
> of <ws> becomes
>
>     token ws { <!ww> \s* }
>
> Grammars can change this to things like:
>
>     token ws { <!ww> [ \s+ || '#' \h* \n ]* }
>

if you need a mnemonic to help you remember what 'ww' means, use 'within word'.

this reminds me that pge's <ww> may be incorrect in its treatment of
<apostrophe>. these characters (<['-]> by default) are word characters,
but i don't think that's been tested, and i don't think it's been
implemented, either.

~jerry
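
for anyone who wants to poke at the difference between <ws> and <space>+
described above, here is a rough sketch in python. it is an emulation, not
pge or pct code. the names WW, WS, SPACE_PLUS, ws_at and space_plus_at are
made up for illustration, and it assumes python's \w and \s are close enough
to the perl 6 classes. it does not treat <apostrophe> characters as word
characters.

```python
import re

# hypothetical stand-ins for the rules discussed in this thread
# (assumption: python's \w and \s approximate the perl 6 classes)
WW = r"(?<=\w)(?=\w)"             # <ww>: zero-width, true only between two word chars
WS = re.compile(rf"(?!{WW})\s*")  # token ws { <!ww> \s* }
SPACE_PLUS = re.compile(r"\s+")   # <space>+


def ws_at(text, pos):
    """length matched by the <ws> stand-in at pos, or None if it fails."""
    m = WS.match(text, pos)
    return None if m is None else len(m.group())


def space_plus_at(text, pos):
    """length matched by the <space>+ stand-in at pos, or None if it fails."""
    m = SPACE_PLUS.match(text, pos)
    return None if m is None else len(m.group())


if __name__ == "__main__":
    print(ws_at("ab", 1))            # None: inside a word, so <ws> fails
    print(ws_at("a+b", 1))           # 0: not inside a word, whitespace is optional
    print(space_plus_at("a+b", 1))   # None: <space>+ needs at least one character
    print(ws_at("a  b", 1))          # 2: consumes the run of whitespace
```

the "a+b" case is the one where the two rules disagree: the <ws> stand-in
succeeds with an empty match, while <space>+ fails outright.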