On Jun 17, 2007, at 6:14 AM, Dr.Ruud wrote:
Tom Allison wrote:
I'm trying to run some regular expressions against strings in email.
They could be in some encoding, but I can't tell which because I don't
have a UTF-8 Unicode xterm window that will show me anything.
There are simpler ways to find out; see charnames and perlunitut.
http://search.cpan.org/perldoc?charnames
http://search.cpan.org/perldoc?perlunitut
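A minimal sketch of the charnames approach, for inspecting a string on a terminal that cannot display Unicode. It assumes the string has already been decoded into Perl characters; `describe_chars` is a hypothetical helper name, not something from either document above.

```perl
use strict;
use warnings;
use charnames ();   # gives access to charnames::viacode()

# Describe each character of a decoded string using ASCII only.
sub describe_chars {
    my ($text) = @_;
    return map {
        my $cp = ord;   # code point of the current character
        sprintf 'U+%04X %s', $cp, charnames::viacode($cp) || '(unnamed)';
    } split //, $text;
}

print "$_\n" for describe_chars("Hi \x{263a}");
```

Each output line is a code point plus its Unicode name, so nothing has to render as a glyph.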
I would first convert to a common base, like UTF-8, before trying to
match strings. Are you talking about raw mail messages? Consider
SpamAssassin and custom rules.
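As a rough illustration of converting to a common base first, the sketch below decodes bytes from two different charsets into Perl character strings with the core Encode module and then matches against the result. `to_chars` is a hypothetical helper, and the charset names are assumed to be known already.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: raw bytes in a known charset -> Perl character string.
sub to_chars {
    my ($charset, $bytes) = @_;
    return decode($charset, $bytes);
}

# The same word arriving in two different encodings.
my $from_latin1 = to_chars('iso-8859-1', "caf\xE9");
my $from_utf8   = to_chars('UTF-8',      "caf\xC3\xA9");

print "same text\n" if $from_latin1 eq $from_utf8;
print "matched\n"   if $from_utf8 =~ /caf\x{e9}/;
```

Once both inputs are character strings, a single pattern works regardless of how the bytes arrived.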
I don't need to compare actual characters; matching on a code point
such as \x{263a} is sufficient.
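Matching by code point could look roughly like this, assuming the text has already been decoded to characters (`smiley_offsets` is a made-up name for illustration):

```perl
use strict;
use warnings;

# Return the character offsets of every U+263A in a decoded string.
sub smiley_offsets {
    my ($text) = @_;
    my @offsets;
    # $-[0] holds the start offset of the most recent successful match.
    push @offsets, $-[0] while $text =~ /\x{263a}/g;
    return @offsets;
}

my @hits = smiley_offsets("hi \x{263a} there \x{263a}");
print "U+263A at offsets: @hits\n";
```

The pattern never needs the literal glyph, so the script itself stays plain ASCII.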
And it's rather difficult to determine from raw email which charset
to use for each string. I find that email sometimes mixes multiple
encodings in one message, which makes it harder to pick apart.
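Header fields are one place where mixed encodings show up, because each RFC 2047 encoded word names its own charset. A small sketch using the 'MIME-Header' encoding that ships with Encode follows; the sample header value is invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Invented sample: one header value with two encoded words in different charsets.
my $raw = '=?ISO-8859-1?Q?caf=E9?= and =?UTF-8?B?4pi6?=';

# 'MIME-Header' decodes each encoded word according to its own declared charset.
my $subject = decode('MIME-Header', $raw);

print "contains U+00E9\n" if $subject =~ /\x{e9}/;
print "contains U+263A\n" if $subject =~ /\x{263a}/;
```

Message bodies are a separate problem: there the charset comes from each part's Content-Type rather than from the text itself.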
I'm starting from the output of MIME::Parser, which does a good job
of parsing out messages, but I'm not sure how to manage the decoding
in every case. It's hard to find good examples sometimes.
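One possible shape for the per-part decoding step, as a sketch only: it assumes the charset name and the transfer-decoded body bytes have already been pulled out of each parsed part, and it uses only core Encode. `decode_part` and the ISO-8859-1 fallback are illustrative choices, not anything MIME::Parser provides.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: decode one part's body with its declared charset.
# $charset may be undef or a name Encode does not recognise.
sub decode_part {
    my ($charset, $bytes) = @_;
    my $enc = defined $charset ? find_encoding($charset) : undef;
    # Assumption: fall back to ISO-8859-1, which maps every byte to a character.
    $enc ||= find_encoding('iso-8859-1');
    return $enc->decode($bytes);
}

my $text = decode_part('UTF-8', "\xE2\x98\xBA");
printf "first code point: U+%04X\n", ord $text;
```

With a fallback in place, a missing or bogus charset label still produces a string that regular expressions can run against, although the characters may not be the ones the sender intended.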
As for SpamAssassin, I'm trying to stay away from it because it's
very large and, from a development perspective, badly documented in
the code. Basically, SpamAssassin is capable at what it does, but
that isn't exactly what I want to do. Similar, yes, but not exactly.