Jeff 'japhy' Pinyan wrote:
On May 25, Mark A. Biggar said:
Jonathan Scott Duff wrote:
On Tue, May 24, 2005 at 11:24:50PM -0400, Jeff 'japhy' Pinyan wrote:
I wish <!prop X> was allowed. I don't see why <!...> has to be
confined to zero-width assertions.
I don't either, actually. One thing that occurred to me while responding
to your original email was that <!foo> might have slightly wrong
huffmanization. Is zero-width the common case? If not, we could use
character doubling for emphasis: <!foo> consumes, while <!!foo> is
zero-width.
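The difference between a zero-width <!foo> and a consuming one can be modeled with Python's re module. This is an analogy only, not proposed Perl 6 syntax. A negative lookahead (?!ab) behaves like the zero-width form. (?:(?!ab).) is a consuming negation: it eats exactly one character that does not start an "ab". The helper match_len is hypothetical and exists only for this illustration.

```python
import re

# Zero-width negation: asserts that "ab" does not start here, consumes nothing.
zero_width = re.compile(r"(?!ab)")
# Consuming negation: matches one character, provided "ab" does not start here.
consuming = re.compile(r"(?:(?!ab).)")

def match_len(pattern, text):
    """Hypothetical helper: length of the match at position 0, or None."""
    m = pattern.match(text)
    return None if m is None else m.end()

print(match_len(zero_width, "xyz"))  # 0: succeeds without consuming anything
print(match_len(consuming, "xyz"))   # 1: consumes "x"
print(match_len(zero_width, "abc"))  # None: "ab" starts here
print(match_len(consuming, "abc"))   # None
```

For a single-character class such as a Unicode property, the consuming form amounts to "one character not in the class", which is what <!prop X> was being asked to mean.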
Now <prop X> is a character class just like <+digit>, and so under the new
character class syntax it would probably be written <+prop X>, or, if the
white space is a problem, maybe <+prop:X> (or <+prop(X)>, as Larry gets the
colon :-), though that is a pretty adverbial case, so ':' may be okay), with
the complemented case being <-prop:X>. Actually, the 'prop' may be
unnecessary altogether: we know we're in the character class sub-language
because we saw the '<+', '<-' or '<[', so we could just define the various
Unicode character property codes (i.e., Lu, Ll, Zs, etc.) as predefined
character class names, just like 'digit' or 'letter'.
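Here is a minimal sketch of treating Unicode general-category codes directly as class names. It is modeled in Python with unicodedata.category, so it is not Perl 6. The helpers positive and negative are hypothetical stand-ins for <+Lu> and <-Lu>. Prefix matching is an assumption made so that one-letter codes such as 'L' (any letter) also work.

```python
import unicodedata

def in_category(ch: str, code: str) -> bool:
    # Unicode categories are two letters ('Lu', 'Ll', 'Zs', ...); prefix matching
    # lets a one-letter code like 'L' cover every letter category.
    return unicodedata.category(ch).startswith(code)

def positive(code: str):
    # Hypothetical analogue of <+Lu>: accept a character that has the property.
    return lambda ch: in_category(ch, code)

def negative(code: str):
    # Hypothetical analogue of <-Lu>: accept a character that lacks the property.
    return lambda ch: not in_category(ch, code)

upper = positive("Lu")
not_upper = negative("Lu")
print([c for c in "aB c" if upper(c)])      # ['B']
print([c for c in "aB c" if not_upper(c)])  # ['a', ' ', 'c']
```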
Yeah, that was going to be my next step, except that the unknowing
person might make a sub-rule of their own called, say, "Zs", and then
which would take precedence? Perhaps <prop:X> is a good way of writing it.
Well, we have the same problem with someone redefining 'digit'. But
character classes are their own sub-language, and we may need to
distinguish between Rule::digit and CharClass::digit in the syntax. Of
course, we could hack it and say that a rule that consists of nothing but
a single character class item is usable in other character classes by
its name, but that could lead to subtle bugs where someone modifies that
special rule to add stuff to it and breaks all usage of it as a
character class everywhere else. Now, since a grammar is just a special
kind of class that contains special kinds of methods called rules, maybe
we need another special kind of method in a grammar that just defines a
named character class for later use? In any case, as usual with methods,
a user-defined character class should override a predefined one of the
same name.
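One way to picture the "user-defined overrides predefined" rule is ordinary method resolution. The sketch below is in Python and rests on a hypothetical assumption: predefined classes live in a base class, PredefinedClasses, and a grammar subclasses it. Normal lookup then lets MyGrammar's digit shadow the built-in one while Zs is still inherited. All names are illustrative only.

```python
import unicodedata

class PredefinedClasses:
    """Hypothetical built-in character classes, looked up by name like methods."""

    def digit(self, ch: str) -> bool:
        return unicodedata.category(ch) == "Nd"  # any Unicode decimal digit

    def Zs(self, ch: str) -> bool:
        return unicodedata.category(ch) == "Zs"  # space separators

class MyGrammar(PredefinedClasses):
    """Hypothetical user grammar that redefines 'digit' as ASCII digits only."""

    def digit(self, ch: str) -> bool:
        return "0" <= ch <= "9"

def char_class(grammar, name: str):
    # Plain attribute lookup: the most-derived definition of `name` wins,
    # so a user-defined class shadows a predefined one of the same name.
    return getattr(grammar, name)

g = MyGrammar()
print(char_class(g, "digit")("\u0663"))                    # False: not ASCII
print(char_class(PredefinedClasses(), "digit")("\u0663"))  # True: Nd digit
print(char_class(g, "Zs")(" "))                            # True: inherited
```

The cost of this design is the one raised above: a grammar author can silently change what 'digit' means everywhere that grammar's classes are used.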