Gunnar Hjalmarsson wrote:
Stanisław T. Findeisen wrote:
Hi, how do I write regular expressions that match against Unicode (e.g., UTF-8) strings?

For instance, in my regexp:

qr/^([.<>@ \w])*$/

Decode the UTF-8 encoded strings before applying the regex on them.

$ perl -MEncode -le '
$utf8_encoded = "smörgåsbord";
$s = decode "UTF-8", $utf8_encoded;
print "Match" if $s =~ /^\w+$/;
'
Match
$
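
A minimal sketch of the same advice applied to the pattern from the question. The UTF-8 bytes are written as escapes so the result does not depend on how the script file is saved; the variable names are illustrative only, not from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The pattern from the original question.
my $re = qr/^([.<>@ \w])*$/;

# The UTF-8 encoded bytes of "smörgåsbord", spelled out as escapes.
my $utf8_encoded = "sm\xC3\xB6rg\xC3\xA5sbord";

# Turn the byte string into a character string.
my $decoded = decode("UTF-8", $utf8_encoded);

# Each of the two non-ASCII letters takes two bytes before decoding.
my $len_bytes = length $utf8_encoded;
my $len_chars = length $decoded;

# The raw bytes fail the pattern; the decoded characters pass it.
my $raw_matches     = $utf8_encoded =~ $re ? 1 : 0;
my $decoded_matches = $decoded      =~ $re ? 1 : 0;

print "bytes: $len_bytes, characters: $len_chars\n";
print "raw matches: $raw_matches, decoded matches: $decoded_matches\n";
```

The raw string fails because the bytes 0xB6 and 0xA5 are not word characters on their own; once decoded, ö and å are single characters that \w accepts.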

Thanks, decode helped with this. But can I ask one more question? What assumptions does Perl make about the encoding of its input file (i.e., the program/script file itself)?

Is it the case that string literals in Perl are in fact byte arrays, so that what you type is what you get?
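
To make the question concrete, here is a small sketch that compares the same literal with and without the utf8 pragma. It assumes the script file itself is saved as UTF-8; the variable names are illustrative. Without the pragma, perl reads the literal byte by byte, which matches the behaviour described above; with it, the literal is read as characters.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8.

# Without the pragma the literal is taken byte by byte.
my $without_pragma = do { no utf8; "smörgåsbord" };

# With the pragma the same literal is read as characters.
my $with_pragma = do { use utf8; "smörgåsbord" };

my $len_without = length $without_pragma;
my $len_with    = length $with_pragma;

print "without use utf8: length $len_without\n";
print "with use utf8:    length $len_with\n";
```

The pragma is lexically scoped, which is why the two do blocks can sit in one file.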

STF

=======================================================================
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA  D63F DBF5 8AA8 3B31 FE8A
=======================================================================
