On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
> Log:
> old <?foo> is now <+foo> to suppress capture
> new <?foo> now is zero-width like <!foo>

I really like the change from <?foo> to <+foo>, but I think there's
a conflict (or at least some confusion) in the way the new spec is
worded, especially as it relates to character class sets.

Both old and new versions of S05 say:

    If the first character after the identifier is whitespace, the
    subsequent text (following any whitespace) is passed as a regex, 
    so   <foo bar>   is more or less equivalent to   <foo(/bar/)>  .

In the previous version of S05, the non-capturing form of <foo bar>
would be <?foo bar>.  Here, the whitespace after "foo" indicated
that "bar" was to be parsed and passed to foo as a regex.

In the new version of S05, the non-capturing form of <foo bar>
would seem to be <+foo bar>.  Okay, I can handle that.  However, 
S05 also says that " <foo+bar-baz> can be written as <+ foo + bar - baz> ".
Presumably this second form would also allow "<+foo + bar - baz>",
which seems to conflict slightly with the notion that <+foo bar>
is the non-capturing form of <foo bar>.  In other words, the
whitespace character following "<+foo" doesn't seem to be
sufficient to indicate how the remainder is to be processed --
we have to look beyond the whitespace for a leading plus or minus.

Perhaps S05 is addressing this when it says:

    An initial identifier is taken as a character class, so the 
    first character after the identifier doesn't matter in this 
    case, and you can use whitespace however you like.

I find this wording very unclear -- it doesn't tell me what
distinguishes <+foo + bar> (where the character after the identifier
"doesn't matter") from <+foo bar> (where it does).

Since the S05 spec has changed so that all punctuation is meta, 
I'm thinking we may be able to simplify the spec altogether.
Previously the "whitespace following the identifier" was
used to distinguish <foo-bar> from <foo -bar>, or <alpha-[Jj]>
from <alpha -[Jj]>.  Since it's now effectively impossible for 
a regex to begin with a bare plus or minus character, we may be
able to alter the "whitespace following identifier" wording such
that <foo-bar> and <foo - bar> are identical.  Perhaps
something like:

  - if the character following the identifier is a left paren,
    it's a call

        <foo('bar')>
        <+foo('bar')>
        <!foo('bar')>

  - if the character following the identifier is a colon, the rest
    of the text (following any whitespace) is passed as a string

        <foo: bar>             # same as <foo('bar')>
        <+foo: bar>
        <!foo: bar>

  - if the identifier is followed by a plus or minus (with optional
    intervening whitespace), it's a set of character classes

        <foo+baz-bar>
        <foo + baz - bar>      # same thing
        <+foo + baz - bar>     # also the same

  - anything else following whitespace is a regex to be passed

        <foo bar>              # same as <foo(/bar/)>
        <+foo bar>             # same as <+foo(/bar/)>
        <!foo bar>             # same as <!foo(/bar/)>
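
As a rough illustration only (a Python sketch, not anything taken from
S05 or from an actual implementation), the four rules above amount to
a small classifier over the text between the angle brackets.  The
function name classify and the category labels are made up for this
example, and it handles only the forms listed above.

```python
import re

# Hypothetical helper: optional + or ! prefix, an identifier, then the rest.
_ASSERTION = re.compile(r'^(?P<prefix>[+!]?)(?P<name>[A-Za-z_]\w*)(?P<rest>.*)$', re.S)

def classify(body):
    """Classify the text between '<' and '>' using the proposed rules."""
    m = _ASSERTION.match(body)
    if m is None:
        return "other"
    rest = m.group("rest")
    if rest.startswith("("):          # <foo('bar')>  -> call
        return "call"
    if rest.startswith(":"):          # <foo: bar>    -> string argument
        return "string-arg"
    stripped = rest.lstrip()
    if stripped[:1] in ("+", "-"):    # plus/minus, optional whitespace before it
        return "charclass-set"
    if rest != stripped and stripped: # whitespace, then anything else
        return "regex-arg"
    return "plain" if not rest else "other"
```

Under these rules <foo-bar> and <foo - bar> land in the same category,
and <+foo bar> versus <+foo + bar> is decided by the first
non-whitespace character after the identifier rather than by the
whitespace itself.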

Pm
