Tom Allison wrote:
> I want to break up every email based on a token defined as
> /(\w\w\w+)/g; This will give me every "word" of three or more letters.
Alternative:
/(\w{3,})/g
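
A quick check that the two patterns pick out the same tokens. The sample line is made up for illustration; any ASCII text would do:

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from a real mail.
my $line = "To be or not to be, that is the question";

# Original pattern: two word characters followed by one or more.
my @long_form  = $line =~ /(\w\w\w+)/g;

# Alternative: a quantifier meaning "three or more word characters".
my @short_form = $line =~ /(\w{3,})/g;

print "@short_form\n";
```

Both arrays should hold the same list; the quantifier form is only easier to read and to adjust.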
> But when I'm getting mail that is in UTF-8 format this doesn't work
> the way I want it to, as I can't see an umlaut (or similar) as
> matching a '\w'.
First read perlunitut:
http://juerd.nl/perlunitut.html
(which of course has a SEE ALSO section at the end)
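
The short version of what perlunitut recommends is: decode incoming bytes into Perl characters before matching, and encode again on output. A minimal sketch, assuming the body really is UTF-8 (in a real mail the charset should come from the MIME headers, and the sample bytes below are invented):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: raw UTF-8 bytes as they might arrive in a mail body.
my $bytes = "Gr\xC3\xBC\xC3\x9Fe aus K\xC3\xB6ln an die B\xC3\xA4ren";

# Turn the bytes into a character string. Without this step the regex
# sees individual bytes rather than letters such as u-umlaut.
my $text = decode('UTF-8', $bytes);

# Same tokenizer as before, now applied to characters.
my @tokens = $text =~ /(\w{3,})/g;

# Encode on the way out so the terminal gets valid UTF-8 again.
binmode(STDOUT, ':encoding(UTF-8)');
print join(' ', @tokens), "\n";
```

With the decode step in place, words containing umlauts or sharp s come out as single tokens instead of being split at the non-ASCII bytes.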
More fun with \w etc.:
http://www.xs4all.nl/~rvtol/perl/unicount.pl
http://www.xs4all.nl/~rvtol/perl/unicount-WL.pl
--
Affijn, Ruud
"Gewoon is een tijger."