On Tue, May 18, 2021 at 03:04:12PM +0200, Marco wrote:
>
> Hello Henrik,
> 
>   thank you for the hints. I didn't realize that SA doesn't support UTF-8
> regex. Well. As you suggest, I would like to write encoding-independent
> rules in order to avoid surprises. I tried, but it doesn't work...
> 
> I have normalize_charset 1.
> My text body is "Ciao, è proprio eccoci là si fa\nciao"
> 
> With
> ([\d\S\x{00E0}\x{c3a0}\x{00E8}\x{c3a8}\x{00EC}\x{c3ac}\x{00F2}\x{c3b2}\x{00F9}\x{c3b9}\x{00C0}\x{c380}\x{00C8}\x{c388}\x{00CC}\x{c38c}\x{00D2}\x{c392}\x{00D9}\x{c399}]+)

This is still UTF8/Unicode format: \x{xxxx}

https://www.fileformat.info/info/unicode/char/00e0/index.htm

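Instead of \x{00E0}, you need to use \xC3\xA0, as you are matching _separate_
raw bytes: C3 A0 is the two-byte UTF-8 encoding of U+00E0, as shown at the URL
above. Note that \x{c3a0} is not the same thing either; it names the single
code point U+C3A0, not the bytes C3 and A0. (Untested, but assuming so from
the URL; too busy to test.)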
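
To make the byte vs. code point difference concrete, here is a rough
sketch in Python (not SpamAssassin or Perl, just an illustration). It
assumes the body reaches the regex as raw UTF-8 bytes; utf8_bytes,
accented and matches are made-up names for this example only.

```python
import re

def utf8_bytes(ch: str) -> bytes:
    """Return the raw UTF-8 bytes for a single character."""
    return ch.encode("utf-8")

# Body text from the original message, encoded to raw UTF-8 bytes
# (assumption: this is what the rule engine actually sees).
body = "Ciao, \u00e8 proprio eccoci l\u00e0 si fa\nciao".encode("utf-8")

# Byte-level pattern: \xC3 followed by the second UTF-8 byte of
# a-grave, e-grave, i-grave, o-grave, u-grave (lower and upper case).
accented = re.compile(rb"\xc3[\xa0\xa8\xac\xb2\xb9\x80\x88\x8c\x92\x99]")

matches = accented.findall(body)

print(utf8_bytes("\u00e0"))        # U+00E0 as two separate bytes
print(utf8_bytes(chr(0xC3A0)))     # U+C3A0 is a different character entirely
print(matches)
```

The second print shows why \x{c3a0} cannot stand in for \xC3\xA0: the
code point U+C3A0 encodes to three unrelated bytes.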
