Tom Allison wrote:

> I want to break up every email based on a token defined as
> /(\w\w\w+)/g; This will give me every "word" of three or more letters.

Alternative:

  /(\w{3,})/g
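
A quick sketch to show that both patterns capture the same tokens. The
sample sentence and variable names are made up for illustration. Note
that \w also matches digits and the underscore, so "word" here is a bit
wider than "letters".

```perl
use strict;
use warnings;

# Hypothetical sample text, plain ASCII only
my $line = "To be or not to be, that is the question";

# Original pattern: two word characters followed by one or more
my @long_form  = $line =~ /(\w\w\w+)/g;

# Alternative: a quantifier meaning "three or more word characters"
my @short_form = $line =~ /(\w{3,})/g;

print "@short_form\n";
print "both patterns agree\n" if "@long_form" eq "@short_form";
```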

> But when I'm getting mail that is in UTF-8 format this doesn't work
> the way I want it to as I can't see an umlaut (or similar) as
> matching a '\w'.

First read perlunitut:
http://juerd.nl/perlunitut.html
(which of course has a SEE ALSO section at the end)
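
A minimal sketch of the approach perlunitut describes: decode the
incoming bytes into a character string before matching, and encode again
on output. It assumes the text arrives as raw UTF-8 bytes; the sample
string and variable names are made up for illustration. In a real mail
you would take the charset from the Content-Type header and undo any
transfer encoding first.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 encoded bytes of "Grüße über Käse"
my $bytes = "Gr\xC3\xBC\xC3\x9Fe \xC3\xBCber K\xC3\xA4se";

# Decode the bytes into a Perl character string; after this,
# \w uses Unicode rules and matches letters with umlauts
my $text = decode('UTF-8', $bytes);

my @words = $text =~ /(\w{3,})/g;

# Encode on the way out so the terminal receives UTF-8 again
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @words;
```

Matching against $bytes directly would treat every byte as a separate
character, so the multi-byte letters would split the words.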

More fun with \w etc.:
http://www.xs4all.nl/~rvtol/perl/unicount.pl
http://www.xs4all.nl/~rvtol/perl/unicount-WL.pl

-- 
Affijn, Ruud

"Gewoon is een tijger."

