RE: still working with utf8

2007-06-22 Thread Bob McConnell
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Friday, June 22, 2007 8:36 AM > To: [EMAIL PROTECTED]; beginners@perl.org; > Mumia W.; Beginners List > Subject: Re: still working with utf8 > > > >Yes, be prepared for

Re: still working with utf8

2007-06-22 Thread tom
On 6/22/2007, "Tom Phoenix" <[EMAIL PROTECTED]> wrote: > >On 6/21/07, Tom Allison <[EMAIL PROTECTED]> wrote: > >> I guess my question is, for CJK languages, should I expect the notion >> of using a regex like \w+ to pick up entire strings of text instead >> of discrete words like Latin bas

Re: still working with utf8

2007-06-22 Thread tom
>Yes, be prepared for the fact that not all foreign languages will >support the concept of spaces between words. I don't know anything about >Japanese, but I do vaguely remember from high school that, for Chinese >texts, there are often no spaces between words and the reader's >knowledge of th

Re: still working with utf8

2007-06-22 Thread Dr.Ruud
Tom Allison wrote: > I have a string: > =?iso-2022-jp?B?Rlc6IBskQjxkJDckNSRHJE8kSiQvJEYzWiQ3JF8kPyQkGyhC?= > That is a MIME::Base64 encoded string of iso-2022-jp characters. > > After I decode_base64 them and decode($text,'iso-2022-jp','utf8') them > I can print out something that looks exactly l
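
For reference, the decoding steps described in the quoted post can be sketched roughly as follows. This is only a minimal sketch, not the poster's actual code: the variable names and the regex that splits the encoded-word are assumptions made for illustration, and it handles only a single Base64 ("B") encoded-word. Note that Encode's decode() takes the encoding name first and the octets second, and returns a Perl character string; encoding to UTF-8 is a separate step done on output.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode encode);

# The encoded-word quoted in the thread.
my $header = '=?iso-2022-jp?B?Rlc6IBskQjxkJDckNSRHJE8kSiQvJEYzWiQ3JF8kPyQkGyhC?=';

# Hypothetical helper regex: pull the charset and the Base64 payload
# out of a single "B" encoded-word.
my ($charset, $payload) = $header =~ /^=\?([^?]+)\?B\?([^?]*)\?=$/i
    or die "not a Base64 encoded-word\n";

# Step 1: Base64 -> raw iso-2022-jp octets.
my $bytes = decode_base64($payload);

# Step 2: octets -> Perl character string (encoding name comes first).
my $text = decode($charset, $bytes);

# Step 3: encode explicitly on the way out, here as UTF-8.
print encode('UTF-8', $text), "\n";
```

Once $text holds characters rather than octets, regex classes such as \w operate on characters instead of on individual bytes.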

Re: still working with utf8

2007-06-21 Thread Mumia W.
On 06/21/2007 09:42 PM, Tom Allison wrote: OK, I sorted out what the deal is with charsets, Encode, utf8 and other goodies. Now I have something I'm just not sure exactly how it is supposed to operate. I have a string: =?iso-2022-jp?B?Rlc6IBskQjxkJDckNSRHJE8kSiQvJEYzWiQ3JF8kPyQkGyhC?= That i

Re: still working with utf8

2007-06-21 Thread Tom Phoenix
On 6/21/07, Tom Allison <[EMAIL PROTECTED]> wrote: I guess my question is, for CJK languages, should I expect the notion of using a regex like \w+ to pick up entire strings of text instead of discrete words like latin based languages? Once you've enabled what the perlunicode manpage calls "Cha
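
To illustrate the \w+ question, here is a small sketch comparing space-separated Latin text with a run of Japanese characters. The sample strings are assumptions chosen for illustration (the Japanese one is written with \x{...} escapes so the script does not depend on the source file's encoding). Because kanji and kana count as word characters and the text contains no spaces, \w+ picks up the whole run as a single match rather than discrete words.

```perl
use strict;
use warnings;

# Space-separated Latin text: \w+ yields one match per word.
my $latin = "still working with utf8";

# Hypothetical sample: eight kanji/kana characters with no spaces.
my $cjk = "\x{65E5}\x{672C}\x{8A9E}\x{306E}\x{30C6}\x{30AD}\x{30B9}\x{30C8}";

my @latin_words = $latin =~ /(\w+)/g;
my @cjk_words   = $cjk   =~ /(\w+)/g;

printf "Latin matches: %d\n", scalar @latin_words;   # 4
printf "CJK matches:   %d\n", scalar @cjk_words;     # 1
printf "CJK match length in characters: %d\n", length $cjk_words[0];
```

So splitting such text into words takes more than \w+; the regex only finds runs of word characters.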

still working with utf8

2007-06-21 Thread Tom Allison
OK, I sorted out what the deal is with charsets, Encode, utf8 and other goodies. Now I have something I'm just not sure exactly how it is supposed to operate. I have a string: =?iso-2022-jp?B?Rlc6IBskQjxkJDckNSRHJE8kSiQvJEYzWiQ3JF8kPyQkGyhC?= That is a MIME::Base64 encoded string of iso-202