I'm trying to strip out combining diacritics from some form input using this code:
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<form action="test.cgi" accept-charset="UTF-8" method="get">
<input type="text" name="text" value="" size="10">
<input type="submit" value="submit">
</form>
</body>
</html>

#!/usr/local/bin/perl
use CGI;
$query = CGI::new();
$search_term = $query->param('text');
$sans_diacritics = $search_term;
$sans_diacritics =~ s/\p{M}*//g;
#$sans_diacritics =~ s/o//g;
print qq(Content-type: text/plain; charset=utf-8

$sans_diacritics
);
exit(0);

In the form, I'm inputting the string "Bartók" with the accented character being a base character (small Latin letter "o") followed by a combining acute accent.

However, when I print (to the web) $sans_diacritics, I get my input with no change: the combining diacritic is still there. I know that my input is not a precomposed accented character, because I can strip out the base "o" and the combining accent either stands alone or jumps to another character [2].

The "\p{M}" is a Unicode class name for the character class of Unicode 'marks', for example accent marks [1].

I've tried these variations (and many others) and none seem to be doing what I want:

$sans_diacritics =~ s#[\p{Mark}]*##g;
$sans_diacritics =~ tr#[\p{InCombiningDiacriticalMarks}]##;
$sans_diacritics =~ tr#[\p{M}]##;
$sans_diacritics =~ s/\p{M}*//g;
$sans_diacritics =~ s#[\p{M}]##g;
$sans_diacritics =~ s#\x{0301}##g;
$sans_diacritics =~ s#\x{006F}\x{0301}##g;
$sans_diacritics =~ s#[\x{0300}-\x{036F}]*##g;

I'm pulling my hair out on this... so any help would be appreciated. If there's any other info I can provide, let me know. My Perl version is 5.8.8 and the script is running on a server running Solaris 9.

-- Michael

[1] per http://perldoc.perl.org/perlretut.html and other documentation
[2] using $sans_diacritics =~ s/o//g;

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/
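
One possible explanation, offered as an assumption and not a confirmed diagnosis: the value returned by param() may be a string of raw UTF-8 bytes, not decoded characters. In that case the combining acute accent arrives as the two bytes 0xCC 0x81, and \p{M} is tested against each byte separately, neither of which is a mark. The sketch below assumes that is the cause and decodes the bytes with the core Encode module before substituting. The helper name strip_marks is hypothetical and is not part of the script above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not in the original script): takes the raw UTF-8
# bytes of a form value and returns UTF-8 bytes with all marks removed.
sub strip_marks {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> Perl character string
    $chars =~ s/\p{M}+//g;                 # U+0301 is now one character, so \p{M} can match it
    return encode('UTF-8', $chars);        # characters -> bytes for output
}

# "Barto" + COMBINING ACUTE ACCENT (bytes 0xCC 0x81) + "k",
# standing in for what $query->param('text') is assumed to return.
my $input = "Barto\xCC\x81k";
print strip_marks($input), "\n";           # prints "Bartok"
```

In the CGI script this would amount to decoding $search_term before the s/\p{M}*//g line and encoding $sans_diacritics again before printing. A precomposed "ó" (U+00F3) is a letter, not a mark, so it passes through this helper unchanged.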