RE: Stripping out Unicode combining characters (diacritics)

2008-05-06 Thread Doran, Michael D
Hi Leif,

> This is what I do. You can try that.
> See if it helps:
>
> Encode::_utf8_on($str); # <<<
> $str =~ s/\pM*//g;

That works! I will gladly buy the beers, Leif, should we ever meet in person.

> I mean - have you for instance tried running your cgi scripts
> in tainted mode (-T)?

No,
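For readers following along: the Encode documentation describes _utf8_on as an internal function that sets the UTF-8 flag without validating the data, so a commonly recommended alternative is to decode the incoming bytes explicitly. The following is only a sketch under assumptions: the helper name strip_marks_from_bytes is hypothetical, and it assumes the input really is UTF-8 encoded octets (for example, a raw CGI parameter). It also uses \pM+ instead of \pM*, which avoids matching the empty string at every position.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the thread): takes UTF-8 encoded octets,
# returns a Perl character string with all combining marks removed.
sub strip_marks_from_bytes {
    my ($bytes) = @_;

    # Decode octets into characters. Unlike Encode::_utf8_on, this checks
    # the input; with the default CHECK, malformed sequences are replaced
    # by U+FFFD instead of being trusted blindly.
    my $str = decode('UTF-8', $bytes);

    # Remove every run of combining marks (Unicode general category M).
    $str =~ s/\pM+//g;

    return $str;
}
```

Note that this only removes marks that are already separate code points; a precomposed character such as U+00E9 passes through unchanged unless the string is decomposed first.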

Re: Stripping out Unicode combining characters (diacritics)

2008-05-06 Thread Leif Andersson
I've been doing it like Mike R suggested for quite a while.
But some characters do not map nicely into this scheme.
So you may want to manually take care of stuff like the German eszett, the oe ligature, etc.

s/\x{00df}/ss/g;
s/\x{0152}/Oe/g;
s/\x{0153}/oe/g;

...to be continued...

Leif
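Put together, the manual substitutions and the mark stripping might look like the sketch below. It assumes that the scheme referred to above is canonical decomposition (NFD from Unicode::Normalize) followed by removal of combining marks; that message is not shown here, so treat this as an assumption. The function name strip_diacritics is hypothetical, and the input is assumed to be an already decoded character string.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from the thread): expects a decoded character
# string and returns it with diacritics removed.
sub strip_diacritics {
    my ($str) = @_;

    # Characters with no canonical decomposition must be mapped by hand;
    # NFD leaves them untouched.
    $str =~ s/\x{00df}/ss/g;   # LATIN SMALL LETTER SHARP S
    $str =~ s/\x{0152}/Oe/g;   # LATIN CAPITAL LIGATURE OE
    $str =~ s/\x{0153}/oe/g;   # LATIN SMALL LIGATURE OE

    # Split precomposed characters into base letter + combining marks ...
    $str = NFD($str);

    # ... then drop the combining marks (general category M).
    $str =~ s/\pM+//g;

    return $str;
}
```

The list of hand-mapped characters is deliberately as short as in the message above; other letters without a decomposition would need their own substitution lines.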