On 10/13/2019 2:54 PM, Richard Wordingham via Unicode wrote:
> Besides invalidating complexity metrics, the issue was what \p{Lu} should match. For example, with PCRE syntax, GNU grep Version 2.25 \p{Lu} matches U+0100 but not <A, U+0300>. When I'm respecting canonical equivalence, I want both to match [:Lu:], and that's what I do. [:Lu:] can then match a sequence of up to 4 NFD characters.

Formally, wouldn't that be rewriting \p{Lu} to match \p{Lu}\p{Mn}*; instead of formally handling NFD, you could extend the syntax to handle "inherited" properties across combining sequences.

Am I missing anything?

A./
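To make the two readings concrete, here is a minimal Python sketch of the difference between matching \p{Lu} on a single code point and matching the proposed \p{Lu}\p{Mn}* rewrite on the NFD form. The function names `plain_lu` and `lu_with_marks` are hypothetical, and the standard `unicodedata` module stands in for a regex engine (Python's built-in `re` has no \p{...} support). It only illustrates the rewrite as stated above; whether that rewrite fully covers canonical equivalence is the open question.

```python
import unicodedata


def plain_lu(s: str) -> bool:
    """Code-point-level \\p{Lu}: exactly one character whose category is Lu."""
    return len(s) == 1 and unicodedata.category(s) == "Lu"


def lu_with_marks(s: str) -> bool:
    """Sketch of \\p{Lu}\\p{Mn}* applied to the NFD form of s.

    Assumption: the whole string must be one Lu base character followed
    by zero or more nonspacing marks (Mn). Other mark categories (Mc, Me)
    are deliberately not handled here.
    """
    nfd = unicodedata.normalize("NFD", s)
    if not nfd or unicodedata.category(nfd[0]) != "Lu":
        return False
    return all(unicodedata.category(c) == "Mn" for c in nfd[1:])


if __name__ == "__main__":
    samples = {
        "U+0100": "\u0100",          # LATIN CAPITAL LETTER A WITH MACRON
        "<A, U+0300>": "A\u0300",    # A followed by COMBINING GRAVE ACCENT
    }
    for label, text in samples.items():
        print(label, plain_lu(text), lu_with_marks(text))
```

With this sketch, `plain_lu` accepts U+0100 but rejects <A, U+0300>, which is the behaviour reported for grep, while `lu_with_marks` accepts both.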