On Wed, 30 May 2012 08:26:44 -0700
jdow <j...@earthlink.net> wrote:

> I'm idly wondering what effect this would have on the time to scan a
> single email.

Actually, converting from the original encoding to UTF-8 is very fast.
Internally, Perl uses pretty fast C code to convert between character
encodings.
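
A minimal sketch of the conversion step, written in Python rather than Perl purely for illustration. In Perl this is usually done with the Encode module's decode and encode functions. The helper name to_utf8 is hypothetical, and it assumes the message's declared charset is already known and names a codec that Python recognises:

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Hypothetical helper: decode bytes in the declared charset, re-encode as UTF-8.

    Both steps run in the interpreter's built-in C codecs, so the cost is
    roughly linear in the size of the message body.
    """
    # errors="replace" keeps going on malformed input instead of raising
    text = raw.decode(charset, errors="replace")
    return text.encode("utf-8")


# Usage: a Shift_JIS body becomes UTF-8 bytes
print(to_utf8("日本語".encode("shift_jis"), "shift_jis").decode("utf-8"))
```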

As for Unicode regexes, I think they're pretty efficient in Perl.  We
added UTF-8 support to our Bayes tokenizer and we use some pretty
hairy regexes to pick out tokens (handling CJK glyphs is interesting).
Performance seems decent enough.
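
CJK text has no spaces between words, so a tokenizer cannot simply split on whitespace. The sketch below is a hypothetical illustration in Python, not our actual Bayes tokenizer: it emits runs of letters as ordinary tokens and each Hiragana, Katakana, Han or Hangul character as a token of its own. Perl regexes can use properties such as \p{Han} directly; Python's re module lacks them, so the code assumes a few explicit Unicode block ranges instead:

```python
import re

# Assumed ranges: Hiragana/Katakana, CJK Ext. A, CJK Unified Ideographs, Hangul syllables
CJK = "\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"

# Either one CJK character, or a run of letters
# (word characters minus digits, underscore and the CJK ranges)
TOKEN_RE = re.compile(rf"[{CJK}]|[^\W\d_{CJK}]+")


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: letter runs plus single CJK characters."""
    return TOKEN_RE.findall(text)


print(tokenize("Buy cheap 日本語 now!"))
```

Emitting single characters keeps the pattern simple; a real filter might pair adjacent CJK characters into bigrams to get more informative tokens.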

Regards,

David.
