Re: regex & utf8

2007-05-12 Dr.Ruud
Tom Allison wrote:
> Ruud:
>> Tom:
>>> Under perl version 5.8, does /(\w+)/ match UTF-8 characters without
>>> calling any special pragma?
>>
>> Yes, but only if your data is proper. Mind that any ASCII character
>> is a UTF-8 character too (U+0000 .. U+007F).
>
>>> So I'm trying to see if I ca

Re: regex & utf8

2007-05-12 Tom Allison
Rather than going through the somewhat buggy process of trying to determine which of the many character sets the data is in, is there some way I can just universally convert everything into UTF-8? I can open a file with a :utf8 declaration when creating the filehandle. But do I need to do
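A minimal sketch of the layer-based conversion Tom describes, assuming the source encoding is already known (ISO-8859-1 here is only an example; Perl cannot reliably work out an arbitrary input's character set on its own). Input is read through an :encoding(...) layer, which turns bytes into characters, and written through an :encoding(UTF-8) layer, which turns them back into UTF-8 bytes. In-memory filehandles stand in for real files, and the helper name to_utf8 is hypothetical. On output the bare :utf8 layer would also produce UTF-8; on input, though, :utf8 only marks the bytes as UTF-8 without validating them.

```perl
use strict;
use warnings;

# Hypothetical helper: re-encode text from a known legacy encoding to UTF-8
# using PerlIO layers. Assumes $from names an encoding Encode understands.
sub to_utf8 {
    my ($bytes, $from) = @_;
    open my $in, "<:encoding($from)", \$bytes or die "open in: $!";
    my $utf8 = '';
    open my $out, '>:encoding(UTF-8)', \$utf8 or die "open out: $!";
    while (defined(my $line = <$in>)) {  # $line holds decoded characters
        print {$out} $line;              # the output layer encodes them as UTF-8
    }
    close $in;
    close $out or die "close out: $!";   # flush the encoding layer
    return $utf8;
}

my $latin1 = "caf\xE9\n";                # "café" in ISO-8859-1: e-acute is one byte
my $utf8   = to_utf8($latin1, 'ISO-8859-1');
printf "%d bytes in, %d bytes out\n", length $latin1, length $utf8;
```

The output grows by one byte because é (U+00E9) takes two bytes in UTF-8.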

Re: regex & utf8

2007-05-12 Dr.Ruud
Tom Allison wrote:
> Under perl version 5.8, does /(\w+)/ match UTF-8 characters without
> calling any special pragma?

Yes, but only if your data is proper. Mind that any ASCII character
is a UTF-8 character too (U+0000 .. U+007F).

> So I'm trying to see if I can just use /(\w+)/ without worr
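To illustrate what "proper" data means here, a minimal sketch assuming the input arrives as raw UTF-8 bytes (the Cyrillic sample word is made up for the example): once the bytes are decoded into a Perl character string with Encode's decode, a plain /(\w+)/ matches the whole word as a run of word characters, with no extra pragma. Matching against the undecoded byte string would instead operate on individual bytes, and the result would follow Perl's byte rules rather than Unicode word characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes as they might arrive from a file or socket:
# the Russian word "slovo" (in Cyrillic) followed by " 42".
my $bytes = "\xD1\x81\xD0\xBB\xD0\xBE\xD0\xB2\xD0\xBE 42";

# Decode into a character string first; \w then uses Unicode rules.
my $text = decode('UTF-8', $bytes);

my @words = $text =~ /(\w+)/g;
printf "%d words; first is %d characters long\n", scalar @words, length $words[0];
```

This prints "2 words; first is 5 characters long": the Cyrillic word counts as five characters even though it is ten bytes of UTF-8.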

Re: regex & utf8

2007-05-11 Chas Owens
On 5/11/07, Tom Allison wrote:
snip
> So if I open a filehandle with a :utf8 layer then /(\w+)/ will match
> just fine. But /([EMAIL PROTECTED])/ is going to be rather ugly? Would
> /([\w])/ simply match on the first byte?
snip

Beats me, I haven't had the occasion to actually use
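Tom's question about /([\w])/ can be checked directly. A minimal sketch, assuming the data is read through a decoding layer (an in-memory filehandle and a made-up Cyrillic sample stand in for a real file): with :encoding(UTF-8) the string holds characters rather than bytes, so a single [\w] captures one whole character, even though that character is two bytes long in UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# UTF-8 bytes for the Russian word "slovo" (in Cyrillic), hypothetical sample input.
my $raw = "\xD1\x81\xD0\xBB\xD0\xBE\xD0\xB2\xD0\xBE\n";

# Read through a decoding layer, as with a :utf8 / :encoding(UTF-8) filehandle.
open my $fh, '<:encoding(UTF-8)', \$raw or die "open: $!";
my $line = <$fh>;
close $fh;

my ($first) = $line =~ /([\w])/;   # one character class matches one character
my ($word)  = $line =~ /(\w+)/;

printf "first: %d char, %d UTF-8 bytes; word: %d chars\n",
    length $first, length(encode('UTF-8', $first)), length $word;
```

This prints "first: 1 char, 2 UTF-8 bytes; word: 5 chars", so on decoded data [\w] does not stop at the first byte.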

Re: regex & utf8

2007-05-11 Tom Allison
Chas Owens wrote:
> On 5/11/07, Tom Allison wrote:
>> OK, I'm reading through different Unicode-related perldocs and have a
>> rather simple question. Under perl version 5.8, does /(\w+)/ match
>> UTF-8 characters without calling any special pragma? I'm having a hard
>> time finding some

Re: regex & utf8

2007-05-11 Chas Owens
On 5/11/07, Tom Allison wrote:
> OK, I'm reading through different Unicode-related perldocs and have a
> rather simple question. Under perl version 5.8, does /(\w+)/ match
> UTF-8 characters without calling any special pragma? I'm having a hard
> time finding something that makes th