On Tue, May 18, 2021 at 03:04:12PM +0200, Marco wrote:
>
> Hello Henrik,
>
> thank you for the hints. I didn't realize that SA doesn't support UTF8
> regex. Well. As you suggest, I would like to write encoding-independent
> rules in order to avoid surprises. I tried, it doesn't work...
>
> I have normalize_charset 1.
> My text body is "Ciao, è proprio eccoci là si fa\nciao"
>
> With
> ([\d\S\x{00E0}\x{c3a0}\x{00E8}\x{c3a8}\x{00EC}\x{c3ac}\x{00F2}\x{c3b2}\x{00F9}\x{c3b9}\x{00C0}\x{c380}\x{00C8}\x{c388}\x{00CC}\x{c38c}\x{00D2}\x{c392}\x{00D9}\x{c399}]+)

\x{xxxx} is still the UTF8/Unicode code point notation:

https://www.fileformat.info/info/unicode/char/00e0/index.htm

Instead of \x{00E0}, you need to use \xC3\xA0, because you are matching
_separate_ raw bytes. (Untested, but I'm assuming so from the URL above;
too busy to test.)
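
To illustrate the byte vs. code point distinction, here is a minimal sketch in Python rather than in SpamAssassin's Perl engine, so treat it as an analogy and not as a tested SA rule. It assumes the body reaches the regex as raw UTF-8 bytes. The names `body`, `raw`, `accented`, `byte_hits` and `char_hits` are made up for this example. Note that it uses an alternation of two-byte sequences: in a byte-oriented match a character class like [\xC3\xA0] would match either single byte, not the pair.

```python
import re

# The sample body from the quoted message.
body = "Ciao, è proprio eccoci là si fa\nciao"

# Assumption: a byte-oriented regex engine sees the UTF-8 encoded bytes.
raw = body.encode("utf-8")

# U+00E0 (à) is the two bytes C3 A0 in UTF-8; U+00E8 (è) is C3 A8.
assert "à".encode("utf-8") == b"\xc3\xa0"
assert "è".encode("utf-8") == b"\xc3\xa8"

# Byte-level pattern: each accented letter is spelled out as its
# two-byte UTF-8 sequence (à è ì ò ù), joined by alternation.
accented = rb"(?:\xc3\xa0|\xc3\xa8|\xc3\xac|\xc3\xb2|\xc3\xb9)"
byte_hits = re.findall(accented, raw)

# The code point value used as a single byte (E0) never occurs in the
# UTF-8 bytes of this body, so a pattern built that way finds nothing.
single_byte_hit = re.search(rb"\xe0", raw)

# For comparison: matching code points on the decoded text.
char_hits = re.findall("[\u00e0\u00e8\u00ec\u00f2\u00f9]", body)

print(byte_hits, single_byte_hit, char_hits)
```

Both approaches find the same two letters (è, then à), but only when the pattern notation agrees with what is being matched: byte sequences against bytes, code points against decoded characters.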