Re: \w regular expressions unicode

2009-04-24 Jay Savage
On Fri, Apr 24, 2009 at 3:53 PM, Chas. Owens wrote:
> 2009/4/24 Jay Savage:
> snip
>>> Hmm, I don't think it would reparse the whole file, but
>>> it does run in a BEGIN block...hmm, I must test it.
>>>
>>
>> It runs in a BEGIN block, but it is still lexically scoped. Pragmata
>> are very special
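Jay's point is that a pragma is loaded at compile time (an implicit BEGIN block) but still takes effect only in a lexical scope. A core pragma other than utf8 shows this clearly. The minimal sketch below uses `use integer` purely as a stand-in; it is not the pragma discussed in the thread. Its effect stops at the closing brace of the block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Outside the block, / is ordinary floating-point division.
my $outside_before = 10 / 3;

my $inside;
{
    # 'use integer' is processed at compile time (like a BEGIN block),
    # but it only changes how code inside this enclosing block is compiled.
    use integer;
    $inside = 10 / 3;
}

# After the block closes, floating-point division is back.
my $outside_after = 10 / 3;

print "before: $outside_before\n";
print "inside: $inside\n";    # 3
print "after:  $outside_after\n";
```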

Re: \w regular expressions unicode

2009-04-24 Chas. Owens
On Fri, Apr 24, 2009 at 15:53, Chas. Owens wrote:
snip
> All of this is good information, but for one thing: not all pragmas
> are lexically scoped. Hence the need to test and/or read the docs.
> For instance, the re pragma[1] is only partially lexical:
>
> #!/usr/bin/perl
>
> use strict;
> use w
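Chas's own example, using the re pragma, is cut off in this excerpt and is not reconstructed here. The sketch below uses a different pragma to illustrate the same general point, that some pragmas are not confined to a block. `use lib` prepends entries to the global `@INC` array at compile time, so the change remains visible after the block closes. The directory name is hypothetical and does not need to exist.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # 'use lib' unshifts onto the global @INC at compile time.
    # '/tmp/hypothetical-lib' is a made-up path used only as a marker.
    use lib '/tmp/hypothetical-lib';
}

# The block has ended, but @INC is a global, so the entry is still there.
my $still_there = grep { $_ eq '/tmp/hypothetical-lib' } @INC;
print $still_there ? "still in \@INC\n" : "gone from \@INC\n";
```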

Re: \w regular expressions unicode

2009-04-24 Chas. Owens
2009/4/24 Jay Savage:
snip
>> Hmm, I don't think it would reparse the whole file, but
>> it does run in a BEGIN block...hmm, I must test it.
>>
>
> It runs in a BEGIN block, but it is still lexically scoped. Pragmata
> are very special cases of modules that provide modifications of
> compile-time
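Because a pragma changes how code is compiled and stays lexically scoped, the familiar `no strict 'refs'` idiom can be confined to a single subroutine. The minimal sketch below turns strict refs off only inside a helper. The same symbolic dereference performed outside that helper dies at run time, and `eval` traps the error. The subroutine and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $greeting = 'hello';    # package variable, reachable by name

# Hypothetical helper: fetch a package scalar by its name.
sub lookup_by_name {
    my ($name) = @_;
    no strict 'refs';            # relaxed only until the end of this sub
    return ${"main::$name"};     # symbolic reference, allowed here
}

my $found = lookup_by_name('greeting');

# Out here 'use strict' is fully in force again, so the same kind of
# symbolic dereference dies at run time; eval catches the error.
my $name   = 'greeting';
my $direct = eval { ${"main::$name"} };
my $error  = $@;

print "via helper: $found\n";
print defined $direct ? "direct: $direct\n" : "direct: refused ($error)";
```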

Re: \w regular expressions unicode

2009-04-24 Jay Savage
On Wed, Apr 22, 2009 at 6:12 PM, Chas. Owens wrote:
> On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson wrote:
>> Chas. Owens wrote:
>>>
>>> On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson wrote:
>>> snip
>>> The utf8 pragma affects the whole file,
>>
>> Well, only the part of the file th
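Gunnar's reply begins to qualify "the whole file". Whatever the end of that sentence was, the documented behaviour is that `use utf8` is lexically scoped, so it only affects source text that comes after it in the enclosing block or file. A quick check is to place two identical literals on either side of the pragma. This sketch assumes the script file is saved as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8, so "ö" is stored as the bytes C3 B6.
my $before = "ö";    # parsed before 'use utf8': kept as two bytes

use utf8;            # in effect from here to the end of the file

my $after = "ö";     # parsed after 'use utf8': decoded to one character

print "before: ", length($before), "\n";    # 2
print "after:  ", length($after),  "\n";    # 1
```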

Re: \w regular expressions unicode

2009-04-22 Chas. Owens
On Wed, Apr 22, 2009 at 18:12, Chas. Owens wrote:
> On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson wrote:
>> Chas. Owens wrote:
>>>
>>> On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson wrote:
>>> snip
> > Yeah, it looks so. With "use utf8" (http://perldoc.perl.org/utf8.html)

Re: \w regular expressions unicode

2009-04-22 Chas. Owens
On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson wrote:
> Chas. Owens wrote:
>>
>> On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson wrote:
>> snip
Yeah, it looks so. With "use utf8" (http://perldoc.perl.org/utf8.html) one can, however, have them parsed (decoded) (provided t
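Stanisław claims that `use utf8` causes non-ASCII source text to be decoded. One way to test this is to compare a literal parsed under the pragma with the same word explicitly decoded from its UTF-8 bytes using `decode` from the core Encode module. This is a minimal sketch. It assumes the file is saved as UTF-8, and it spells the bytes out with escapes so that the explicit side does not depend on the file encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# "smörgåsbord" as raw UTF-8 bytes, independent of the file's encoding.
my $bytes   = "sm\xc3\xb6rg\xc3\xa5sbord";
my $decoded = decode('UTF-8', $bytes);

# The same word as a literal, parsed with 'use utf8' in effect
# (assumes this file is saved as UTF-8).
my $literal = do { use utf8; "smörgåsbord" };

print $decoded eq $literal ? "same string\n" : "different strings\n";    # same string
print length($decoded), " characters\n";                                   # 11 characters
```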

Re: \w regular expressions unicode

2009-04-22 Gunnar Hjalmarsson
Chas. Owens wrote:
On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson wrote:
snip
Yeah, it looks so. With "use utf8" (http://perldoc.perl.org/utf8.html) one can, however, have them parsed (decoded) (provided they are valid UTF-8).
No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g
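The identifier side of Gunnar's description can be sketched with a variable and a subroutine whose names contain non-ASCII letters. The names are made up, and the file is assumed to be saved as UTF-8. The string literal passed to the subroutine is decoded by the same pragma, which is the behaviour the other messages in the thread debate. The `binmode` call encodes characters back to UTF-8 on output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # program text, identifiers included, is UTF-8
binmode STDOUT, ':encoding(UTF-8)';     # encode characters on output

# Hypothetical names with non-ASCII letters, accepted under 'use utf8'.
my $smörgåsbord = 'buffet';

sub größe {
    my ($text) = @_;
    return length $text;
}

my $n = größe('smörgåsbord');    # the literal is decoded too: 11 characters
print "$smörgåsbord: $n\n";      # buffet: 11
```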

Re: \w regular expressions unicode

2009-04-22 Gunnar Hjalmarsson
Stanisław T. Findeisen wrote:
I mean this:
    #!/usr/bin/perl
    use warnings;
    use strict;
    # use utf8;
    use Encode;
    my $utf8_encoded = "smörgåsbord";
    print('is_utf8: ' . (Encode::is_utf8($utf8_encoded) ? 'TRUE' : 'FALSE') . "\n");
This outputs "FALSE" here, but uncomment "use utf8" and it gets "TR

Re: \w regular expressions unicode

2009-04-22 Chas. Owens
On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson wrote:
snip
>> Yeah, it looks so. With "use utf8" (http://perldoc.perl.org/utf8.html) one
>> can, however, have them parsed (decoded) (provided they are valid UTF-8).
>
> No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable
>

Re: \w regular expressions unicode

2009-04-22 Stanisław T. Findeisen
Gunnar Hjalmarsson wrote:
Or did you possibly mean the utf8::decode() function?
I mean this:
    #!/usr/bin/perl
    use warnings;
    use strict;
    # use utf8;
    use Encode;
    my $utf8_encoded = "smörgåsbord";
    print('is_utf8: ' . (Encode::is_utf8($utf8_encoded) ? 'TRUE' : 'FALSE') . "\n");
This outputs "F
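Stanisław's test can be run in both variants from a single script by confining `use utf8` to one block. The sketch also prints the length of each string. Length is arguably the clearer check, because `Encode::is_utf8` reports Perl's internal UTF8 flag rather than whether the data is "UTF-8". This assumes the file is saved as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Assumes this file is saved as UTF-8.
my $raw     = "smörgåsbord";                     # no 'use utf8': 13 bytes
my $decoded = do { use utf8; "smörgåsbord" };    # 'use utf8' only in this block

for my $pair (['without use utf8', $raw], ['with use utf8', $decoded]) {
    my ($label, $s) = @$pair;
    printf "%-16s is_utf8=%s length=%d\n",
        $label, (Encode::is_utf8($s) ? 'TRUE' : 'FALSE'), length $s;
}
```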

Re: \w regular expressions unicode

2009-04-22 Gunnar Hjalmarsson
Gunnar Hjalmarsson wrote:
Stanisław T. Findeisen wrote:
With "use utf8" (http://perldoc.perl.org/utf8.html) one can, however, have them parsed (decoded) (provided they are valid UTF-8).
No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable names or subroutine names.
Or
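The `utf8::decode()` function that Gunnar mentions is built into Perl, so it does not need `use utf8`. It decodes a string of UTF-8 bytes in place and returns false if the bytes are not well-formed UTF-8, in which case the string is left unchanged. A minimal sketch follows. The byte strings are written with escapes so that the result does not depend on how the file is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $good = "sm\xc3\xb6rg\xc3\xa5sbord";    # valid UTF-8 bytes
my $bad  = "sm\xf6rg\xe5sbord";            # Latin-1 bytes, not valid UTF-8

my $good_ok = utf8::decode($good);    # decodes $good in place
my $bad_ok  = utf8::decode($bad);     # fails and leaves $bad as it was

printf "good: ok=%d length=%d\n", ($good_ok ? 1 : 0), length $good;    # ok=1 length=11
printf "bad:  ok=%d length=%d\n", ($bad_ok  ? 1 : 0), length $bad;     # ok=0 length=11
```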

Re: \w regular expressions unicode

2009-04-22 Gunnar Hjalmarsson
Stanisław T. Findeisen wrote:
Gunnar Hjalmarsson wrote:
What assumptions does Perl make regarding the input file (i.e., the program/script file) encoding?
AFAIK, it just converts the bytes into Perl's internal format, but it does not assume anything (at least not by default) with respect to the
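Gunnar's description of the default behaviour can be checked by dumping a literal in hex. Without `use utf8`, a non-ASCII literal holds exactly the bytes the editor wrote, one character per byte. This is a sketch, assuming the file is saved as UTF-8 with precomposed "ö" and "å".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No 'use utf8' here, so the parser keeps the literal's bytes unchanged.
# Assumes the file is saved as UTF-8, where "ö" is C3 B6 and "å" is C3 A5.
my $word = "smörgåsbord";

print length($word), "\n";          # 13: one character per byte
print unpack('H*', $word), "\n";    # 736dc3b67267c3a573626f7264
```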

Re: \w regular expressions unicode

2009-04-22 Stanisław T. Findeisen
Gunnar Hjalmarsson wrote:
What assumptions does Perl make regarding the input file (i.e., the program/script file) encoding?
AFAIK, it just converts the bytes into Perl's internal format, but it does not assume anything (at least not by default) with respect to the character encoding.
Is it so

Re: \w regular expressions unicode

2009-04-22 Gunnar Hjalmarsson
Stanisław T. Findeisen wrote:
Gunnar Hjalmarsson wrote:
Stanisław T. Findeisen wrote:
Hi, how do I write regular expressions that match against Unicode (e.g., UTF-8) strings? For instance, in my regexp:
qr/^([.<>@ \w])*$/
Decode the UTF-8 encoded strings before applying the regex to them.
    $ per
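When the strings come from a file, an alternative to calling decode on each string is to let an `:encoding(UTF-8)` I/O layer decode the data as it is read. This is a sketch, not the approach from the thread. To keep it self-contained, it reads from an in-memory filehandle that stands in for a real input file, and it applies the pattern from the question unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

my $valid = qr/^([.<>@ \w])*$/;    # the pattern from the question

# In-memory "file" of UTF-8 bytes, standing in for a real input file.
my $file_bytes = "sm\xc3\xb6rg\xc3\xa5sbord\n";

# The :encoding(UTF-8) layer decodes each line as it is read.
open my $fh, '<:encoding(UTF-8)', \$file_bytes
    or die "Cannot open in-memory file: $!";

my @matched;
while (my $line = <$fh>) {
    chomp $line;
    my $ok = $line =~ $valid;
    push @matched, $line if $ok;
    print "$line: ", ($ok ? "matches" : "does not match"), "\n";
}
close $fh;
```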

Re: \w regular expressions unicode

2009-04-22 Stanisław T. Findeisen
Gunnar Hjalmarsson wrote:
Stanisław T. Findeisen wrote:
Hi, how do I write regular expressions that match against Unicode (e.g., UTF-8) strings? For instance, in my regexp:
qr/^([.<>@ \w])*$/
Decode the UTF-8 encoded strings before applying the regex to them.
    $ perl -MEncode -le '
    $utf8_encoded

Re: \w regular expressions unicode

2009-04-18 Gunnar Hjalmarsson
Stanisław T. Findeisen wrote:
Hi, how do I write regular expressions that match against Unicode (e.g., UTF-8) strings? For instance, in my regexp:
qr/^([.<>@ \w])*$/
Decode the UTF-8 encoded strings before applying the regex to them.
    $ perl -MEncode -le '
    $utf8_encoded = "smörgåsbord";
    $s = decod
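Gunnar's one-liner is cut off in this excerpt. The sketch below expresses the same idea as a script, using `decode('UTF-8', ...)` from the core Encode module, and does not attempt to reproduce his exact code. The undecoded byte string fails the pattern and the decoded string matches. The bytes are written with escapes so the example does not depend on the file's encoding. The "no match" result for raw bytes assumes Perl's default matching rules, that is, no `use locale` and no `unicode_strings` feature.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

my $valid = qr/^([.<>@ \w])*$/;    # the pattern from the question

my $utf8_encoded = "sm\xc3\xb6rg\xc3\xa5sbord";    # "smörgåsbord" as UTF-8 bytes
my $s = decode('UTF-8', $utf8_encoded);            # now 11 characters

# Under default rules, \w does not match the high bytes of an undecoded string.
my $bytes_match   = $utf8_encoded =~ $valid ? 1 : 0;
my $decoded_match = $s =~ $valid ? 1 : 0;

print "undecoded bytes: ", ($bytes_match   ? "match" : "no match"), "\n";    # no match
print "decoded string:  ", ($decoded_match ? "match" : "no match"), "\n";    # match
```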