On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
> This has no direct bearing on p6l, since performance is a p6i issue.
> But perhaps in the interests of performance as well as hackery we
> should explicitly provide some sort of variant regex behavior:
>
>   /a./ :bytes
>   /a./ :graphemes
>
> where the first would recognize 0x61 followed by any single byte, while
> the second would recognize 'a' followed by any number of bytes
> composing a single grapheme.
Isn't that what :u0, :u1, :u2, and :u3 are for?

    :u0  # use bytes       (. is byte)
    :u1  # level 1 support (. is codepoint)
    :u2  # level 2 support (. is grapheme)
    :u3  # level 3 support (. is language dependent)

These modifiers say nothing about the state of the data, but in general
internal Perl data will already be in Normalization Form C, so even
under :u1, the precomposed characters will usually do the right thing.
Note that these modifiers are for overriding the default support level,
which was probably set by pragma at the top of the file.

Or was that to imply that a literal "a" in the RE would be interpreted
as a "grapheme a" when :u2 is active?

-Scott
-- 
Jonathan Scott Duff                Division of Nearshore Research
[EMAIL PROTECTED]                  Senior Systems Analyst II
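
To make the byte / codepoint / grapheme distinction concrete, here is a rough sketch in Python rather than Perl 6. It is not how any Perl 6 implementation works. The helper names `units` and `matches_a_dot` are made up for illustration, the byte level assumes UTF-8 encoded data, and the grapheme level is only approximated as "a base character plus any combining marks that follow it", which is simpler than the full Unicode segmentation rules.

```python
import unicodedata


def units(text, level):
    """Split text into the units that '.' would consume at each level.

    Hypothetical helper; the levels loosely mirror :u0, :u1 and :u2.
    """
    if level == "bytes":
        # Assumption: the data is UTF-8 encoded.
        return [bytes([b]) for b in text.encode("utf-8")]
    if level == "codepoints":
        return list(text)
    if level == "graphemes":
        # Approximation: attach each combining mark to the preceding unit.
        clusters = []
        for ch in text:
            if clusters and unicodedata.combining(ch):
                clusters[-1] += ch
            else:
                clusters.append(ch)
        return clusters
    raise ValueError(f"unknown level: {level}")


def matches_a_dot(text, level):
    """Emulate an anchored /a./ match: 'a' followed by exactly one unit."""
    parts = units(text, level)
    first = b"a" if level == "bytes" else "a"
    return len(parts) == 2 and parts[0] == first


decomposed = "ae\u0301"   # 'a', 'e', COMBINING ACUTE ACCENT
precomposed = "a\u00e9"   # 'a', precomposed e-acute (the NFC form)

for level in ("bytes", "codepoints", "graphemes"):
    print(level, matches_a_dot(decomposed, level), matches_a_dot(precomposed, level))
```

With the decomposed string only the grapheme level sees "a" plus one unit, while the precomposed (NFC) string already matches at the codepoint level, which is the point made above about :u1 usually doing the right thing on normalized data.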