On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
> This has no direct bearing on p6l, since performance is a p6i issue.
> But perhaps in the interests of performance as well as hackery we
> should explicitly provide some sort of variant regex behavior:
> 
>     /a./ :bytes
>     /a./ :graphemes
> 
> where the first would recognize 0x61 followed by any single byte, while
> the second would recognize 'a' followed by any number of bytes
> composing a single grapheme.

Isn't that what :u0, :u1, :u2, and :u3 are for?

            :u0         # use bytes       (. is byte)
            :u1         # level 1 support (. is codepoint)
            :u2         # level 2 support (. is grapheme)
            :u3         # level 3 support (. is language dependent)

        These modifiers say nothing about the state of the data, but in
        general internal Perl data will already be in Normalization Form
        C, so even under :u1, the precomposed characters will usually do
        the right thing. Note that these modifiers are for overriding
        the default support level, which was probably set by pragma at
        the top of the file.
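
To make the difference between the levels concrete, here is a rough
Python sketch (not Perl 6, and not how any real implementation does it)
of what "." would consume under :u0, :u1 and :u2. The helper names
units() and matches_a_dot() are made up for illustration, UTF-8 is
assumed for the byte level, and a grapheme is approximated as a base
codepoint plus any trailing combining marks, which is cruder than the
full Unicode segmentation rules. :u3 is left out since it depends on
the language.

```python
import unicodedata


def units(text, level):
    """Split text into the units '.' would match at a given :uN level.

    Hypothetical helper: level 0 is bytes (UTF-8 assumed), level 1 is
    codepoints, level 2 is an approximation of graphemes.
    """
    if level == 0:
        return [bytes([b]) for b in text.encode("utf-8")]
    if level == 1:
        return list(text)
    if level == 2:
        clusters = []
        for ch in text:
            # A nonzero combining class means ch attaches to the previous base.
            if clusters and unicodedata.combining(ch):
                clusters[-1] += ch
            else:
                clusters.append(ch)
        return clusters
    raise ValueError("only levels 0, 1 and 2 are sketched here")


def matches_a_dot(text, level):
    """Return True if the whole text is 'a' followed by exactly one unit."""
    parts = units(text, level)
    first = b"a" if level == 0 else "a"
    return len(parts) == 2 and parts[0] == first


decomposed = "ae\u0301"  # 'a', 'e', COMBINING ACUTE ACCENT
precomposed = unicodedata.normalize("NFC", decomposed)  # 'a', 'é'

for level in (0, 1, 2):
    print(level, matches_a_dot(decomposed, level), matches_a_dot(precomposed, level))
# prints:
# 0 False False
# 1 False True
# 2 True True
```

The middle row is the point of the Normalization Form C remark above:
once the data is precomposed, matching by codepoint already gives the
answer that matching by grapheme would.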

Or was that to imply that a literal "a" in the RE would be
interpreted as a "grapheme a" when :u2 is active?

-Scott
-- 
Jonathan Scott Duff                     Division of Nearshore Research
[EMAIL PROTECTED]               Senior Systems Analyst II
