On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
> Log:
> old <?foo> is now <+foo> to suppress capture
> new <?foo> now is zero-width like <!foo>

I really like the change from <?foo> to <+foo>, but I think there's a
conflict (or at least some confusion) in the way the new spec is
worded, especially as it relates to character class sets.

Both old and new versions of S05 say:

    If the first character after the identifier is whitespace, the
    subsequent text (following any whitespace) is passed as a regex,
    so <foo bar> is more or less equivalent to <foo(/bar/)> .

In the previous version of S05, the non-capturing form of <foo bar>
would be <?foo bar>.  Here, the whitespace after "foo" indicated that
"bar" was to be parsed and passed to foo as a regex.

In the new version of S05, the non-capturing form of <foo bar> would
seem to be <+foo bar>.  Okay, I can handle that.  However, S05 also
says that "<foo+bar-baz> can be written as <+ foo + bar - baz>".
Presumably this second form would also allow "<+foo + bar - baz>",
which seems to conflict slightly with the notion that <+foo bar> is
the non-capturing form of <foo bar>.  In other words, the whitespace
character following "<+foo" doesn't seem to be sufficient to indicate
how the remainder is to be processed -- we have to look beyond the
whitespace for a leading plus or minus.

Perhaps S05 is addressing this when it says

    An initial identifier is taken as a character class, so the first
    character after the identifier doesn't matter in this case, and
    you can use whitespace however you like.

Here I find this wording very unclear -- it doesn't tell me what is
distinguishing the "doesn't matter in this case" part between
<+foo + bar> and <+foo bar>.

Since the S05 spec has changed so that all punctuation is meta, I'm
thinking we may be able to simplify the spec altogether.  Previously
the "whitespace following the identifier" was used to distinguish
<foo-bar> from <foo -bar>, or <alpha-[Jj]> from <alpha -[Jj]>.
Since it's now effectively impossible for a regex to begin with a
bare plus or minus character, we may be able to alter the "whitespace
following identifier" wording such that <foo-bar> and <foo - bar> are
identical.  Perhaps something like:

  - if the character following the identifier is a left paren,
    it's a call

        <foo('bar')>
        <+foo('bar')>
        <!foo('bar')>

  - if the character following the identifier is a colon, the rest
    of the text (following any whitespace) is passed as a string

        <foo: bar>            # same as <foo('bar')>
        <+foo: bar>
        <!foo: bar>

  - if the identifier is followed by a plus or minus (with optional
    intervening whitespace), it's a set of character classes

        <foo+baz-bar>
        <foo + baz - bar>     # same thing
        <+foo + baz - bar>    # also the same

  - anything else following whitespace is a regex to be passed

        <foo bar>             # same as <foo(/bar/)>
        <+foo bar>            # same as <+foo(/bar/)>
        <!foo bar>            # same as <!foo(/bar/)>

Pm
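
P.S.  The four rules proposed above can be sketched as a small
dispatcher.  This is only an illustration written in Python, not
anything taken from S05: the classify() function, its return labels,
and the identifier pattern are all hypothetical names chosen for the
sketch, and it looks only at the text between the angle brackets.

```python
import re

# Hypothetical sketch of the proposed dispatch rules; none of these
# names come from S05.  Input is the text between "<" and ">".
_ASSERTION = re.compile(r"""
    (?: (?P<prefix>[+!?]) \s* )?   # optional prefix, e.g. "+" or "!"
    (?P<name>[A-Za-z_]\w*)         # the identifier (assumed ASCII-ish)
    (?P<rest>.*)                   # whatever follows the identifier
""", re.VERBOSE | re.DOTALL)


def classify(body):
    """Return a label saying how the text after the identifier is read."""
    m = _ASSERTION.fullmatch(body)
    if m is None:
        return "other"
    rest = m.group("rest")
    stripped = rest.lstrip()
    if stripped == "":
        return "bare"              # just <foo>, <+foo>, <!foo>
    if rest.startswith("("):
        return "call"              # <foo('bar')>
    if rest.startswith(":"):
        return "string"            # <foo: bar>
    if stripped[0] in ("+", "-"):
        return "charclass"         # <foo+baz-bar>, <foo + baz - bar>
    if rest[0].isspace():
        return "regex"             # <foo bar>
    return "other"


if __name__ == "__main__":
    for body in ["foo('bar')", "foo: bar", "foo+baz-bar",
                 "+foo + baz - bar", "+foo bar", "alpha -[Jj]"]:
        print("<%s>" % body, "->", classify(body))
```

The plus-or-minus test runs before the whitespace test, so under this
sketch <alpha-[Jj]> and <alpha -[Jj]> both come out as character
class sets, while <+foo bar> is still a regex argument.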