Re: rules trouble

Patrick R. Michaud Thu, 12 May 2005 07:41:52 -0700

[My reply below likely belongs on either perl6-compiler or perl6-language,
but I didn't want to do a lot of cross-posting, so I'm replying
to perl6-internals for now (with apologies to p6i) and followups 
should probably go to p6c or p6l.  --Pm]

On Thu, May 12, 2005 at 01:51:04AM -0400, Dino Morelli wrote:
> I'm working on more p6rules unit tests.
> 
> Having some trouble. First, understanding when :w means \s* and when it
> means \s+

I'll do my best to explain.  From A05, <ws> means \s+ whenever it's 
between two identifiers (i.e., two sets of word characters) and \s* 
between anything else.  Furthermore, according to S05, <ws> decides 
this based on the contents of the matched string, not the pattern 
being matched.

Thus, a pattern like

    rx :w /hello -?world/

becomes

    rx /hello <?ws> -?world/

which matches any of

    hello world
    hello-world
    hello  -world
    hello   world
    hello\nworld

but not

    helloworld

Thus, <ws> greedily consumes any whitespace at that point in the
match, and it fails only when it consumes none and sits between two
word characters in the target string.

We might speculate that <ws> is equivalent to \b\s*, but \b fails
between pairs of non-word characters, whereas <ws> will succeed.
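
The <ws> behavior described above can be approximated with ordinary
regex lookarounds.  The following is only a rough sketch in Python, not
how any Perl 6 rules engine implements it: the name WS and the helper
matches() are hypothetical, and unlike a real <ws> this emulation lets
the engine backtrack into the whitespace it consumed.

```python
import re

# Hypothetical stand-in for <ws>: consume whitespace if there is any;
# otherwise succeed with an empty match unless BOTH neighbouring
# characters are word characters.
WS = r"(?:\s+|(?<!\w)|(?!\w))"

# Rough equivalent of  rx /hello <?ws> -?world/
hello_world = re.compile(r"hello" + WS + r"-?world")

def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

for text in ["hello world", "hello-world", "hello  -world",
             "hello   world", "hello\nworld", "helloworld"]:
    print(repr(text), matches(hello_world, text))

# \b\s* is not the same thing: between two non-word characters \b
# finds no boundary, while the emulated <ws> still succeeds.
print(re.fullmatch(r"-\b\s*-", "- -") is not None)
print(re.fullmatch(r"-" + WS + r"-", "- -") is not None)
```

Only "helloworld" is rejected by the first pattern, and the last two
lines show where \b\s* and the emulated <ws> part ways.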

Followups on this question should probably go to p6c or p6l.

> Also, these tests are failing when I use :: to separate the modifier
> from the pattern. But they work when I do ':w blah' (separate with a
> space). I'm not sure which ways are "right".
> 
> The actual failing tests:
> 
> my $targ = qq{ foobar
> baz  quux
> zot\tfum};
> 
> p6rule_is  ($targ, ':w::baz quux',  'baz\s+quux or baz\s*quux matches');
> p6rule_is  ($targ, ':w::zot fum',   'zot\s+fum or zot\s*fum matches');

Wow, this is a nice test.  I can see why it's failing, but I'm not
sure what the correct interpretation should be, so I'll be sending
a message to perl6-language for clarification.  I'll explain briefly
below, but for now the solution might be to test C< [:w::baz quux] >.

Briefly, the question has to do with unanchored pattern matches -- in
an unanchored match, there's an implicit C< .*? > at the start of
the match.  So, should C< rx /::baz quux/ > act like

    rx /^ .*? ::baz quux /

or

    rx /^ .*? [::baz quux] /

In the first case, the :: cuts off backtracking into the .*?, so the
.*? can never grow beyond the empty match, forcing the expression to
match at the first character.  In the second, the cut is limited to
the subpattern, so the .*? still has a chance to shift the pattern
across the target string.
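
The practical difference between the two readings can be illustrated
by analogy in Python.  This is only a sketch: Python has no :: cut, so
re.match (which tries position 0 only) stands in for the first reading
and re.search (which tries every start position) for the second.  The
names targ and body are just for illustration, and baz\s+quux is one of
the two expansions named in the test description.

```python
import re

# Target string from the failing tests (qq{} turns \t into a tab).
targ = " foobar\nbaz  quux\nzot\tfum"

body = re.compile(r"baz\s+quux")

# Reading 1: the cut disables the implicit .*?, so the body has to
# match at the very first character of the target.
anchored = body.match(targ)

# Reading 2: the cut is confined to the subpattern, so the implicit
# .*? can still slide the starting position along the target.
sliding = body.search(targ)

print("reading 1 matches:", anchored is not None)
print("reading 2 matches:", sliding is not None)
```

Under the first reading the test target cannot match, because it begins
with a space rather than "baz"; under the second it matches on the
second line of the target.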

At any rate, I'll bring this up on perl6-language, and followups
to this message should probably go there.

Pm
