On Mon, Mar 9, 2009 at 10:16, Patrick R. Michaud <pmich...@pobox.com> wrote:
>> On Sun, Mar 08, 2009 at 09:43:17AM +0100, pugs-comm...@feather.perl6.nl wrote:
>> =item * ws
>>
>> Match whitespace between tokens.
>>
>> =item * space
>>
>> Match a single whitespace character. Hence C< <ws> > is equivalent to C< <space>+ >.
>
>
> The definitions of <ws> and <space> above are incorrect, or at least
> misleading.  <ws> matches required whitespace between pairs of word
> characters; it's optional whitespace otherwise.  The default definition
> of <ws> is something like:
>
>    token ws { <?before \w> <?after \w> <!> || \s* }
>
> It's certainly _not_ the case that <ws> is equivalent to <space>+ .
>
> To make things a bit quicker for people writing custom versions of
> <ws> (which may need to include "comment whitespace"), the Parrot
> Compiler Toolkit also provides an optimized <ww> rule that matches
> only between a pair of word characters.  Then the default definition
> of <ws> becomes
>
>    token ws { <!ww> \s* }
>
> Grammars can change this to things like:
>
>    token ws { <!ww> [ \s+ || '#' \h* \n ]* }
>
if you need a mnemonic to help you remember what 'ww' means, use 'within word'.

this reminds me that pge's <ww> may be incorrect in its treatment of
<apostrophe>.  these characters (<['-]> by default) are word
characters, but i don't think that's been tested, and i don't think
it's been implemented, either.
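
for anyone who wants to poke at the semantics outside of pge, here is a
rough Python emulation of the default <ws> described above. it is only a
sketch, not pge's implementation: the names WW, WS and ws_matches_at are
made up for illustration, and Python's \w stands in for the word-character
test, so ' and - are treated as non-word characters here.

```python
import re

# Emulates <ww>: a zero-width assertion that succeeds only when the
# current position sits between two word characters.
# Assumption: Python's \w approximates the word-character class, which
# means ' and - do NOT count as word characters in this sketch.
WW = r"(?<=\w)(?=\w)"

# Emulates `token ws { <!ww> \s* }`: fail inside a word, otherwise
# match zero or more whitespace characters.
WS = re.compile(rf"(?!{WW})\s*")


def ws_matches_at(text, pos):
    """Return True if the emulated <ws> matches at offset pos in text."""
    # Passing pos (rather than slicing) lets the lookbehind see the
    # character before pos.
    return WS.match(text, pos) is not None


if __name__ == "__main__":
    print(ws_matches_at("foobar", 3))   # False: between 'o' and 'b'
    print(ws_matches_at("foo bar", 3))  # True: whitespace follows
    print(ws_matches_at("foo+bar", 3))  # True: empty match, not within a word
    print(ws_matches_at("don't", 3))    # True: ' is not \w in this sketch
```

the last case is the one i'm worried about: if <apostrophe> characters
counted as word characters, the position between "n" and "'" would be
within a word and <ws> would be expected to fail there.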

~jerry
