[My reply below likely belongs on either perl6-compiler or perl6-language, but I didn't want to do a lot of cross-posting, so I'm replying to perl6-internals for now (with apologies to p6i) and followups should probably go to p6c or p6l. --Pm]

On Thu, May 12, 2005 at 01:51:04AM -0400, Dino Morelli wrote:
> I'm working on more p6rules unit tests.
>
> Having some trouble. First, understanding when :w means \s* and when it
> means \s+

I'll do my best to explain.  From A05, <ws> means \s+ whenever it's
between two identifiers (i.e., two sets of word characters) and \s*
between anything else.  Furthermore, according to S05, <ws> decides
this based on the contents of the matched string, not the pattern
being matched.  Thus, a pattern like

    rx :w /hello -?world/

becomes

    rx /hello <?ws> -?world/

which matches any of

    hello world
    hello-world
    hello -world
    hello      world
    hello\nworld

but not

    helloworld

Thus, <ws> fails if it occurs between two word characters in the
target string, and it greedily consumes any whitespace at that point
in the match.  We might speculate that <ws> is equivalent to \b\s*,
but \b fails between pairs of non-word characters, whereas <ws> will
succeed.

Followups on this question should probably go to p6c or p6l.

> Also, these tests are failing when I use :: to separate the modifier
> from the pattern. But they work when I do ':w blah' (separate with a
> space). I'm not sure which ways are "right".
>
> The actual failing tests:
>
> my $targ = qq{ foobar
> baz quux
> zot\tfum};
>
> p6rule_is ($targ, ':w::baz quux', 'baz\s+quux or baz\s*quux matches');
> p6rule_is ($targ, ':w::zot fum', 'zot\s+fum or zot\s*fum matches');

Wow, this is a nice test.  I can see why it's failing but I'm not
sure what the correct interpretation should be so I'll be sending
a message to perl6-language for clarification.  I'll explain briefly
below, but for now the solution might be to test C< [:w::baz quux] >.

Briefly, the question has to do with unanchored pattern matches --
in an unanchored match, there's an implicit C< .*? > at the start of
the match.  So, should C< rx /::baz quux/ > act like

    rx /^ .*? ::baz quux /

or

    rx /^ .*? [::baz quux] /

In the first case, the :: ends up negating the .*?, forcing the
expression to match at the first character.  In the second, the cut
is limited to the subpattern, so the .*? still has a chance to shift
the pattern across the target string.

At any rate, I'll bring this up on perl6-language, and followups to
this message should probably go there.

Pm
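
For anyone who wants to experiment with the <ws> behavior described above, here is a minimal Python sketch of it. It is only an emulation under the reading of A05/S05 given in this message, not PGE's implementation. The helper names is_word, ws, and matches_hello_world are hypothetical, and the whole target string is assumed to be matched from its first character.

```python
import re


def is_word(ch: str) -> bool:
    """True for a word character (letter, digit, or underscore)."""
    return ch.isalnum() or ch == "_"


def ws(target: str, pos: int):
    """Emulate <ws> at offset pos in target.

    Returns the offset just past the whitespace consumed, or None if
    the rule fails. Whitespace is consumed greedily. The rule fails
    only when nothing was consumed and pos sits between two word
    characters of the target string.
    """
    end = pos
    while end < len(target) and target[end].isspace():
        end += 1
    if end == pos:
        before = pos > 0 and is_word(target[pos - 1])
        after = pos < len(target) and is_word(target[pos])
        if before and after:
            return None
    return end


def matches_hello_world(target: str) -> bool:
    """Emulate rx /hello <?ws> -?world/ against the whole target."""
    if not target.startswith("hello"):
        return False
    pos = ws(target, len("hello"))
    if pos is None:
        return False
    return re.fullmatch(r"-?world", target[pos:]) is not None


if __name__ == "__main__":
    samples = ["hello world", "hello-world", "hello -world",
               "hello      world", "hello\nworld", "helloworld"]
    for s in samples:
        print(repr(s), matches_hello_world(s))
```

The sketch also shows where <ws> and \b\s* part ways: ws("a--b", 2) succeeds without consuming anything, whereas re.compile(r"\b\s*").match("a--b", 2) returns None, because \b cannot match between the two hyphens.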