Re: character encoding & regex

2007-06-17 Thread Dr.Ruud
Tom Allison wrote: > I don't require actual character comparison, comparison of \{263a} is > sufficient. A Perl string contains characters (not octets). The codepoint U+263A is represented by the character "\x{263a}". Whether that takes 1 or 2 or 3 or even more octets in the string, shouldn't m
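
To make the character/octet distinction above concrete, here is a minimal sketch, assuming a stock Perl with the core Encode module. The variable names are only illustrative and do not come from the thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character: U+263A WHITE SMILING FACE.
my $char = "\x{263a}";

# length() on a character string counts characters, not octets.
my $char_count = length($char);

# Encoding turns the character string into an octet string.
# How many octets that takes depends on the encoding chosen.
my $utf8_octets  = encode('UTF-8',    $char);   # e2 98 ba
my $utf16_octets = encode('UTF-16BE', $char);   # 26 3a

# The regex matches the character, however it would be stored.
my $matched = $char =~ /\x{263a}/ ? 1 : 0;

printf "chars=%d utf8=%d utf16=%d matched=%d\n",
    $char_count, length($utf8_octets), length($utf16_octets), $matched;
# prints: chars=1 utf8=3 utf16=2 matched=1
```

The match is written against the code point, so it does not change when the same text arrives in a different encoding, provided the input has been decoded first.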

Re: character encoding & regex

2007-06-17 Thread Tom Allison
I got somewhere with this: From: =?Big5?B?obS2Uq/5r3Wk6KtLLLKjpmGqvbBl?= <> translates to From: \\{a1}\\{b4}\\{b6}R\\{af}\\{f9}\\{af}u\\{a4}\\{e8}\\{ab}K,\\{b2}\\{a3}\\{a6}a\\{aa}\\{bd}\\{b0}e <> which still means nothing to me. But at least I can pick it apart, I think. I want to match ev
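
The escaped values above are the raw Big5 octets left after the base64 step. They have not yet been decoded into characters. One possible way to go the rest of the way is sketched below, assuming the core Encode module with its 'MIME-Header' decoder and the bundled Big5 tables. This is an illustration, not necessarily what the poster ended up using, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The raw header line quoted in the message above.
my $raw = 'From: =?Big5?B?obS2Uq/5r3Wk6KtLLLKjpmGqvbBl?= <>';

# 'MIME-Header' undoes the base64 step and the Big5 step in one call
# and returns a Perl character string.
my $decoded = decode('MIME-Header', $raw);

# Print non-ASCII characters as \x{....} escapes, so the result can be
# inspected without a UTF-8 capable terminal.
(my $visible = $decoded) =~ s/([^\x20-\x7e])/sprintf('\\x{%04x}', ord $1)/ge;
print "$visible\n";

# Match on characters rather than octets: does the header contain any
# Han ideograph?
my $has_han = $decoded =~ /\p{Han}/ ? 1 : 0;
print "contains Han characters: $has_han\n";
```

Once the header is a character string, a single property class such as \p{Han} covers what would otherwise need a pattern per byte encoding.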

Re: character encoding & regex

2007-06-17 Thread Tom Allison
On Jun 17, 2007, at 6:14 AM, Dr.Ruud wrote: Tom Allison wrote: I'm trying to do some regular expression on strings in email. They could be encoded to something. But I can't tell because I don't have a utf8 unicode xterm window that will show me anything. There are simpler ways to fi

Re: character encoding & regex

2007-06-17 Thread Dr.Ruud
Tom Allison wrote: > I'm trying to do some regular expression on strings in email. They > could be encoded to something. But I can't tell because I don't have > a utf8 unicode xterm window that will show me anything. There are simpler ways to find out, see charnames and perlunitut. http://

Re: character encoding & regex

2007-06-16 Thread Mumia W.
On 06/16/2007 05:01 PM, Tom Allison wrote: Mumia W. wrote: On 06/16/2007 02:29 PM, Tom Allison wrote: I'm trying to do some regular expression on strings in email. [...] And with unicode and locales and bytes it all gets extremely ugly. I found something that SpamAssassin uses to convert all

Re: character encoding & regex

2007-06-16 Thread Tom Allison
On Jun 16, 2007, at 6:05 PM, Tom Phoenix wrote: On 6/16/07, Tom Allison <[EMAIL PROTECTED]> wrote: I'm trying to do some regular expression on strings in email. They could be encoded to something. But I can't tell because I don't have a utf8 unicode xterm window that will show me anythin

Re: character encoding & regex

2007-06-16 Thread Tom Phoenix
On 6/16/07, Tom Allison <[EMAIL PROTECTED]> wrote: I'm trying to do some regular expression on strings in email. They could be encoded to something. But I can't tell because I don't have a utf8 unicode xterm window that will show me anything. At best I get ?a?? and other trash like that.

Re: character encoding & regex

2007-06-16 Thread Tom Allison
Mumia W. wrote: On 06/16/2007 02:29 PM, Tom Allison wrote: I'm trying to do some regular expression on strings in email. They could be encoded to something. But I can't tell because I don't have a utf8 unicode xterm window that will show me anything. At best I get ?a?? and other trash

Re: character encoding & regex

2007-06-16 Thread Mumia W.
On 06/16/2007 02:29 PM, Tom Allison wrote: I'm trying to do some regular expression on strings in email. They could be encoded to something. But I can't tell because I don't have a utf8 unicode xterm window that will show me anything. At best I get ?a?? and other trash like that. I thin

character encoding & regex

2007-06-16 Thread Tom Allison
I'm trying to do some regular expression on strings in email. They could be encoded to something. But I can't tell because I don't have a utf8 unicode xterm window that will show me anything. At best I get ?a?? and other trash like that. I think this is typical for ascii text renderings