Re: Pure Regular Expression Engines and Literal Clusters

Asmus Freytag via Unicode Sun, 13 Oct 2019 17:17:10 -0700

On 10/13/2019 2:54 PM, Richard Wordingham via Unicode wrote:

> Besides invalidating complexity metrics, the issue was what \p{Lu}
> should match.  For example, with PCRE syntax, GNU grep Version 2.25
> \p{Lu} matches U+0100 but not <A, U+0300>.  When I'm respecting
> canonical equivalence, I want both to match [:Lu:], and that's what I
> do. [:Lu:] can then match a sequence of up to 4 NFD characters.

Formally, wouldn't that amount to rewriting \p{Lu} to match \p{Lu}\p{Mn}*? Instead of formally handling NFD, you could extend the syntax to handle "inherited" properties across combining sequences.

Am I missing anything?
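
To make the suggested rewrite concrete, here is a minimal Python sketch. It is only an illustration, not how any particular regex engine works: the function names are hypothetical, and it uses the standard unicodedata module rather than a regex library. It treats \p{Lu} as "one Lu character followed by zero or more Mn characters", applied to NFD text.

```python
import unicodedata


def match_lu_cluster(text, pos=0):
    # Hypothetical helper: emulate the rewritten pattern \p{Lu}\p{Mn}* at
    # position `pos`. Returns the end index of the match, or None.
    if pos >= len(text) or unicodedata.category(text[pos]) != "Lu":
        return None
    end = pos + 1
    # Greedily absorb nonspacing marks (General_Category = Mn).
    while end < len(text) and unicodedata.category(text[end]) == "Mn":
        end += 1
    return end


def matches_lu(text):
    # Assumption for this sketch: normalize to NFD first, so that a
    # precomposed letter and its decomposed spelling are handled alike.
    nfd = unicodedata.normalize("NFD", text)
    return match_lu_cluster(nfd) == len(nfd)


print(matches_lu("\u0100"))    # U+0100, precomposed
print(matches_lu("A\u0300"))   # <A, U+0300>, decomposed
print(matches_lu("a\u0300"))   # lowercase base, should not match
```

With this reading, both U+0100 and <A, U+0300> are accepted, while a lowercase base or a lone combining mark is not. The sketch only covers Mn, as in the pattern above; whether other combining marks should also inherit the property is left open.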

A./
