Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
> Log:
> old is now <+foo> to suppress capture
> new now is zero-width like

I really like the change from to <+foo>, but I think there's a conflict (or at least some confusion) in the way the new spec is worded, especially as it relates to character class sets.

Both old and new versions of S05 say:

    If the first character after the identifier is whitespace, the
    subsequent text (following any whitespace) is passed as a regex,
    so is more or less equivalent to .

In the previous version of S05, the non-capturing form of would be . Here, the whitespace after "foo" indicated that "bar" was to be parsed and passed to foo as a regex.

In the new version of S05, the non-capturing form of would seem to be <+foo bar>. Okay, I can handle that. However, S05 also says that " can be written as <+ foo + bar - baz> ". Presumably this second form would also allow "<+foo + bar - baz>", which seems to conflict slightly with the notion that <+foo bar> is the non-capturing form of . In other words, the whitespace character following "<+foo" doesn't seem to be sufficient to indicate how the remainder is to be processed -- we have to look beyond the whitespace for a leading plus or minus.

Perhaps S05 is addressing this when it says

    An initial identifier is taken as a character class, so the
    first character after the identifier doesn't matter in this
    case, and you can use whitespace however you like.

Here I find this wording very unclear -- it doesn't tell me what is distinguishing the "doesn't matter in this case" part between <+foo + bar> and <+foo bar>.

Since the S05 spec has changed so that all punctuation is meta, I'm thinking we may be able to simplify the spec altogether. Previously the "whitespace following the identifier" was used to distinguish from , or from . Since it's now effectively impossible for a regex to begin with a bare plus or minus character, we may be able to alter the "whitespace following identifier" wording such that and are identical. Perhaps something like:

  - if the character following the identifier is a left paren, it's a call

        <+foo('bar')>

  - if the character following the identifier is a colon, the rest of the text (following any whitespace) is passed as a string

        # same as
        <+foo: bar>

  - if the identifier is followed by a plus or minus (with optional intervening whitespace), it's a set of character classes

        # same thing
        <+foo + baz - bar>    # also the same

  - anything else following whitespace is a regex to be passed

        # same as
        <+foo bar>    # same as <+foo(/bar/)>
        # same as

Pm
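The four proposed rules amount to a small dispatch on what follows the identifier. Here is a rough sketch of that dispatch in Python. The function name `classify_assertion` and the kind labels are made up for illustration, the input is assumed to be the text between `<+` and `>`, and none of this reflects how S05 or any Perl 6 implementation actually parses assertions.

```python
import re

# Hypothetical helper illustrating the proposed rules; not taken from S05.
_IDENT = re.compile(r"[A-Za-z_]\w*")

def classify_assertion(body: str):
    """Classify the text between '<+' and '>' as (kind, identifier, rest)."""
    body = body.lstrip()
    m = _IDENT.match(body)
    if m is None:
        raise ValueError("expected an identifier")
    name, rest = m.group(0), body[m.end():]
    stripped = rest.lstrip()
    if stripped == "":
        # Nothing but the identifier: plain <+foo>.
        return ("bare", name, "")
    if rest[0] == "(":
        # Rule 1: left paren directly after the identifier is a call.
        return ("call", name, rest)
    if rest[0] == ":":
        # Rule 2: colon means the remaining text is passed as a string.
        return ("string", name, rest[1:].lstrip())
    if stripped[0] in "+-":
        # Rule 3: plus or minus, with optional whitespace before it,
        # means a set of character classes.
        return ("charclass-set", name, stripped)
    if rest[0].isspace():
        # Rule 4: anything else after whitespace is a regex argument.
        return ("regex", name, stripped)
    raise ValueError("unrecognized text after identifier")
```

With these rules the whitespace after the identifier no longer decides the outcome on its own: `classify_assertion("foo + baz - bar")` and `classify_assertion("foo+baz-bar")` both land in the character-class branch, while `classify_assertion("foo bar")` is treated as a regex argument.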
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
[EMAIL PROTECTED] writes:
> -A leading C<[> or C<+> indicates an enumerated character class. Ranges
> +A leading C<[> indicates an enumerated character class. Ranges
> in enumerated character classes are indicated with "C<..>" rather than
> "C<->".
>
> / <[a..z_]>* /
> - / <+[a..z_]>* /
> - / <+[ a..z _ ]>* /
> - / <+ [ a .. z _ ] >* /
>
> Whitespace is ignored within square brackets and after the initial C<+>.

Did you mean to remove "and after the initial C<+>" as well?

> + / <[ a..z _ ]>* /
> +

Simon
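To make the "whitespace is ignored within square brackets" and "`..` instead of `-`" points concrete, here is a minimal Python sketch that rewrites an enumerated class such as `<[ a..z _ ]>` into an ordinary Python character class. The function name `enumerated_class_to_python` is hypothetical, and the sketch deliberately ignores backslash escapes and everything else S05 allows inside the brackets.

```python
import re

def enumerated_class_to_python(src: str) -> str:
    """Translate text like '<[ a..z _ ]>' into a Python character class.

    Simplified illustration only: handles literal characters and
    'x..y' ranges, and drops all whitespace inside the brackets.
    """
    m = re.fullmatch(r"<\[(.*)\]>", src.strip(), flags=re.S)
    if m is None:
        raise ValueError("expected text of the form <[...]>")
    inner = re.sub(r"\s+", "", m.group(1))  # whitespace is not significant
    out = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if inner[i + 1:i + 3] == ".." and i + 3 < len(inner):
            # 'a..z' becomes 'a-z' in Python's notation.
            out.append(re.escape(ch) + "-" + re.escape(inner[i + 3]))
            i += 4
        else:
            out.append(re.escape(ch))
            i += 1
    return "[" + "".join(out) + "]"
```

Under this sketch `<[a..z_]>`, `<[ a..z _ ]>` and `<[ a .. z _ ]>` all translate to the same class, which is the behaviour the retained sentence about square brackets describes.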
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
Some other minor notes about the S05.pod update:

> +In particular, also matches the null string, and always fails.

Perhaps these should be quoted with "C<< ... >>" so that it's clear that "" and "" are the tokens? When looking at the .pod file I had to think about it a couple of times to make sure that it wasn't intending C and C.

> +Any atom that is quantified with a minimally match (using the C modifier).

s/minimally/minimal/

> +Greedy quantifiers and characters classes do not terminate a token pattern.

s/characters/character/

Thanks,

Pm
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
On Fri, Sep 07, 2007 at 02:45:52AM -0500, Patrick R. Michaud wrote:
: On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
: > Log:
: > old is now <+foo> to suppress capture
: > new now is zero-width like
:
: I really like the change from to <+foo>, but I think there's
: a conflict (or at least some confusion) in the way the new spec is
: worded, especially as it relates to character class sets.

I'm actually still of two minds whether it's proper to overload <+foo> like that, and what we end up with may well depend on revisions to the binding syntax. But it can be <+foo> for now, assuming we can deal with the ambiguities you point out. 'Course, by the time we're done with that, we might well decide <+foo> is a bad plan...

: Both old and new versions of S05 say:
:
:     If the first character after the identifier is whitespace, the
:     subsequent text (following any whitespace) is passed as a regex,
:     so is more or less equivalent to .
:
: In the previous version of S05, the non-capturing form of
: would be . Here, the whitespace after "foo" indicated
: that "bar" was to be parsed and passed to foo as a regex.
:
: In the new version of S05, the non-capturing form of
: would seem to be <+foo bar>. Okay, I can handle that. However,
: S05 also says that " can be written as <+ foo + bar - baz> ".
: Presumably this second form would also allow "<+foo + bar - baz>",
: which seems to conflict slightly with the notion that <+foo bar>
: is the non-capturing form of . In other words, the
: whitespace character following "<+foo" doesn't seem to be
: sufficient to indicate how the remainder is to be processed --
: we have to look beyond the whitespace for a leading plus or minus.

If we stick with +, one approach might be to simply disallow whitespace in composite character classes.

: Perhaps S05 is addressing this when it says
:
:     An initial identifier is taken as a character class, so the
:     first character after the identifier doesn't matter in this
:     case, and you can use whitespace however you like.
:
: Here I find this wording very unclear -- it doesn't tell me
: what is distinguishing the "doesn't matter in this case" part
: between <+foo + bar> and <+foo bar>.

What, me unclear? How could that happen? :-) [Don't answer that...]

: Since the S05 spec has changed so that all punctuation is meta,
: I'm thinking we may be able to simplify the spec altogether.
: Previously the "whitespace following the identifier" was
: used to distinguish from , or
: from . Since it's now effectively impossible for
: a regex to begin with a bare plus or minus character, we may be
: able to alter the "whitespace following identifier" wording such
: that and are identical. Perhaps
: something like:
:
:   - if the character following the identifier is a left paren,
:     it's a call
:
:         <+foo('bar')>
:
:   - if the character following the identifier is a colon, the rest
:     of the text (following any whitespace) is passed as a string
:
:         # same as
:         <+foo: bar>
:
:   - if the identifier is followed by a plus or minus (with optional
:     intervening whitespace), it's a set of character classes
:
:         # same thing
:         <+foo + baz - bar>    # also the same
:
:   - anything else following whitespace is a regex to be passed
:
:         # same as
:         <+foo bar>    # same as <+foo(/bar/)>
:         # same as

That's assuming we don't define any metasyntax that starts with + or - in the future, such as bare +[ a..z ], or +[ ...] as a variant of [...]+. And while we could resolve the ambiguity of the second + by fiat, it would probably be better if the ambiguity didn't arise in the first place. If <+foo ...> is going to change the parsing of ... at all, then it should probably do so consistently, which means <+foo> is really a bad plan. (Also, there are already too many +'s in patterns.)

So while it's cute to generalize <+foo> to "establish the initial universal set of matches", I suspect it's likely to change to something else. Possibilities I've been mulling:

    <~ws>    # "I just want to match as a string"
    <\ws>    # "Don't do the normal thing with the following"
    <.ws>    # "Just call the ws method"
    <=ws>    # "Bind to nothing", assuming binds $

Damian points out that it's a little strange for = to enable binding in the case but disable it in the <=ws> case. It would be possible to make <=ws> mean and not capture at all. Offhand I'd say that would be bad huffmanization, but I need to look at STD some more. It also depends on any post-binding syntax resembling:

    -> $foo {...}

and whether that is deemed preferable to or $foo= or whatever. (One nice thing about the post syntax is that we could know for sure that we're creating a new
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
> Other available chars:
>
> <`ws>
> <^ws>
> <&ws>
> <*ws>
> <-ws>
> <|ws>
> <:ws>
> <;ws>
>

I'd vote for <:ws> which is vaguely reminiscent of the former non-capturing parens (?:).

It (<:ws>) also bears little similarity to any other regex construct - although it looks a bit like a Perl 6 pair.

Paul
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
On Fri, Sep 07, 2007 at 04:05:55PM -0600, Paul Seamons wrote:
: > Other available chars:
: >
: > <`ws>
: > <^ws>
: > <&ws>
: > <*ws>
: > <-ws>

I forgot we're using - already, so scratch that one...

: > <|ws>
: > <:ws>
: > <;ws>
: >
:
: I'd vote for <:ws> which is vaguely reminiscent of the former non-capturing
: parens (?:).

I'm not sure a resemblance to P5 syntax is really a recommendation... :)

: It (<:ws>) also bears little similarity to any other regex construct -
: although it looks a bit like a Perl 6 pair.

Which might be a good argument for reserving the syntax for real pairs somehow. Also, pairs have special arguments, and people would wonder what <:foo(...)>, <:foo[...]>, <:foo{...}> and <:foo<...>> mean. Not to mention <:!foo>.

I should have pointed out that I think all the candidates from the last list are long shots for various reasons. looks like a closing tag. <*ws> is visually confusing with other * usages, and while <^ws> implies some kind of negation culturally, it's a form of negation we're trying to get away from, in favor of consistently using !.

The first list is the ones I'm really considering, and of those, <.ws> is the easiest to type and gets out of the way of identifier visually. It also looks like a method call, which in fact it is. <~ws> is hard to type, and <\ws> can be confused with \w. The problem with <=foo> I already mentioned. The only strangeness about <.foo> I see is that arguments would presumably continue to parse like ordinary assertions: <.foo bar> and <.foo: bar> might be misread.

I dunno, maybe <\ws> isn't so bad...

Larry
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
On Fri, Sep 07, 2007 at 03:50:09PM -0700, Larry Wall wrote:
> The first list is the ones I'm really considering, and of those, <.ws>
> is the easiest to type and gets out of the way of identifier visually.
> It also looks like a method call, which in fact it is. <~ws> is hard
> to type, and <\ws> can be confused with \w. The problem with <=foo>
> I already mentioned. The only strangeness about <.foo> I see is that
> arguments would presumably continue to parse like ordinary
> assertions: <.foo bar> and <.foo: bar> might be misread.
>
> I dunno, maybe <\ws> isn't so bad...

But as soon as I saw it I thought the same as you say in the paragraph above - in the context of a regexp (or string) \ makes me think that one character is being back-whacked, rather than it applying to the entire token.

I suspect my brain will think of rules like regexps. (But I could be wrong, and unlike quite a few people on this list, I've not written any yet, so my opinion might be of little value)

Nicholas Clark
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
On Sat, Sep 08, 2007 at 12:12:10AM +0100, Nicholas Clark wrote:
: On Fri, Sep 07, 2007 at 03:50:09PM -0700, Larry Wall wrote:
: > I dunno, maybe <\ws> isn't so bad...
:
: But as soon as I saw it I thought the same as you say in the paragraph above -
: in the context of a regexp (or string) \ makes me think that one character is
: being back-whacked, rather than it applying to the entire token.
:
: I suspect my brain will think of rules like regexps. (But I could be wrong,
: and unlike quite a few people on this list, I've not written any yet, so my
: opinion might be of little value)

Well, we could go off in a TeXish direction and say that \foo is a non-capturing , and \w, \d, etc. are just , , etc. Then your whitespace is just \ws, and your word boundary is just \wb. That would simplify how you define your own \w sequences as well.

\xfe gets a little problematic under that view though, unless we require all rules starting with x to be called . Or require people to use \x[fe], which also kinda sux.

Larry
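The \xfe problem can be seen in a toy reader for the "TeXish" idea, sketched here in Python. Everything in it is an assumption made for illustration: the name `read_backslash`, the rule that a backslash followed by an identifier names a rule, and the decision to recognize only the bracketed `\x[..]` form as a hex escape.

```python
import re

# Hypothetical reader for the "backslash + identifier names a rule" idea.
_HEX_BRACKET = re.compile(r"\\x\[([0-9A-Fa-f]+)\]")
_BACKSLASH_NAME = re.compile(r"\\([A-Za-z_]\w*)")

def read_backslash(pattern: str, pos: int = 0):
    """Return ('char', c) for \\x[..] or ('rule', name) for \\name."""
    m = _HEX_BRACKET.match(pattern, pos)
    if m:
        # The bracketed form is unambiguous: it can only be a code point.
        return ("char", chr(int(m.group(1), 16)))
    m = _BACKSLASH_NAME.match(pattern, pos)
    if m:
        # Any identifier after the backslash is read as a rule name,
        # so a bare hex escape is swallowed as the rule 'xfe'.
        return ("rule", m.group(1))
    raise ValueError("not a backslash sequence")
```

In this sketch `\ws` and `\wb` come out as rule names as intended, but so does `\xfe`, which is why the message suggests either restricting rule names that start with x or requiring `\x[fe]`.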
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
On Fri, Sep 07, 2007, Larry Wall writes:
> If we stick with +, one approach might be to simply disallow whitespace
> in composite character classes.

Of the choices presented thus far, I like this one the best. Although I did like being able to stick whitespace in the character classes for readability, such that losing the whitespace in <+foo - [Jj] > would be a disappointment -- I still like <+foo> as much as the other alternatives.

Even if we decide that <+foo> isn't the official non-capturing syntax, we still have the case that <+foo> is effectively a non-capturing form of . I sorta liked that we were reducing two syntaxes for the same thing ( and <+foo> ) down to one, so adding one back in feels funny.

I do agree that we may be getting a few too many +'s in our patterns. However, having just converted several grammars in Parrot languages to use the new <+foo> syntax, I was surprised at how few there actually were. And many of the existing cases where I had previously used didn't really change (or need to change), because they were already zero-width things such as , , , etc., and I felt it made more sense to keep the syntax anyway.

Of the non-<+foo> options given thus far, I like <~foo> and <.foo> (in that order). I don't find ~ all that hard to type -- after all, we use the tilde quite frequently in things like Unix's "~username" syntax, in Perl 5's =~ operator, and even in Perl 6 with the ~~ smart match operator. Perhaps I would feel differently about tilde if I were on a non-US keyboard.

I agree that <:foo> should probably be reserved for something having to do with pairs or adverbs.

I'm not at all a fan of <\ws>.

Anyway, those are my reactions, for whatever they're worth.

Pm
Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn
On Fri, Sep 07, 2007 at 04:05:55PM -0600, Paul Seamons wrote:
> I'd vote for <:ws> which is vaguely reminiscent of the former non-capturing
> parens (?:).
>
> It (<:ws>) also bears little similarity to any other regex construct -
> although it looks a bit like a Perl 6 pair.

For completeness it may be worth pointing out that :i, :s, and :Perl5 are in fact valid regex constructs. :-)

Pm