On Fri, 26 Sep 2014, dar...@chaosreigns.com wrote:
I wrote a script that takes a list of words with UTF-8 characters, and
generates rules matching them:
http://chaosreigns.com/code/dl/sawordrule.pl
For example:
$ echo "análisis" | perl ./sawordrule.pl SPANISH_
body SPANISH_ANALISIS /\ban[\x{C1}\x{E1}]lisis\b/i # análisis
How do you get a one-byte match for a two-byte UTF-8-encoded accented
character? Shouldn't it generate this:
/\ban[\xc3][\xa1]lisis\b/i
I didn't think normalization had been implemented yet.
Your rule doesn't hit in my test environment (though I just pasted that
word into an existing message to test...)
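
To illustrate the byte-versus-code-point question, here is a minimal
sketch in Python. It is not SpamAssassin's Perl regex engine, and the
variable names are made up for the example. It assumes the message body
is matched as raw, undecoded UTF-8 bytes, which is what I'd expect
without normalization. The byte-sequence pattern also accepts 0x81 after
0xC3 so that the uppercase form matches.

```python
import re

word = "análisis"
raw = word.encode("utf-8")  # the bytes an undecoded UTF-8 body would contain

# Code-point style class, analogous to [\x{C1}\x{E1}], applied to raw bytes.
codepoint_style = re.compile(rb"\ban[\xc1\xe1]lisis\b", re.IGNORECASE)

# Byte-sequence style: 0xC3 followed by 0x81 (uppercase) or 0xA1 (lowercase).
byte_style = re.compile(rb"\ban\xc3[\x81\xa1]lisis\b", re.IGNORECASE)

# The same code-point class applied to decoded (character) text.
text_style = re.compile(r"\ban[\xc1\xe1]lisis\b", re.IGNORECASE)


def matches(pattern, data):
    """Return True if the pattern is found anywhere in data."""
    return pattern.search(data) is not None


print(raw)                            # b'an\xc3\xa1lisis'
print(matches(codepoint_style, raw))  # False: 0xC3 is not in the class
print(matches(byte_style, raw))       # True: matches the two-byte sequence
print(matches(text_style, word))      # True: decoded text has one character
```

So a single-byte class can only hit if the text has been decoded to
characters (or is Latin-1) before the rule runs.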
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
How do you argue with people to whom math is an opinion? -- Unknown
-----------------------------------------------------------------------
848 days since the first successful private support mission to ISS (SpaceX)