Tom Allison wrote:

> Under perl version 5.8, does /(\w+)/ match UTF-8 characters without
> calling any special pragma?

Yes, but only if your data is proper. Mind that any ASCII character is
also a UTF-8 character (U+0000 .. U+007F).

> So I'm trying to see if I can just use /(\w+)/ without worrying about
> all this character encoding?

Only if your data is proper. A file is just a string of bytes. If you
use the proper I/O layer while reading in the file, then you'll end up
with proper data (a string of characters, not of bytes) to work with.

A UTF-8 encoded file can't tell you that it is UTF-8 encoded. For
example, a UTF-8 BOM at the start (as Windows Notepad writes) is not
proof. So you need to know the encoding beforehand.

-- 
Affijn, Ruud

"Gewoon is een tijger."
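
P.S. A minimal sketch of the difference the I/O layer makes, assuming
you already know the input is UTF-8. The sample text and the in-memory
filehandle are made up for illustration only; with a real file you
would pass its name to open() instead of a scalar reference.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption for illustration: the bytes a UTF-8 encoded file would
# contain for the text "café naïve".
my $bytes = encode('UTF-8', "caf\x{e9} na\x{ef}ve\n");

# Read the bytes without decoding: the result is a string of bytes.
open my $raw, '<:raw', \$bytes or die "open (raw): $!";
my $undecoded = <$raw>;
close $raw;

# Read the same bytes through a decoding layer: the result is a
# string of characters.
open my $utf8, '<:encoding(UTF-8)', \$bytes or die "open (utf8): $!";
my $decoded = <$utf8>;
close $utf8;

my @from_bytes = $undecoded =~ /(\w+)/g;   # words split at non-ASCII bytes
my @from_chars = $decoded   =~ /(\w+)/g;   # accented letters count as \w

printf "raw layer:      %d matches\n", scalar @from_bytes;
printf "encoding layer: %d matches\n", scalar @from_chars;
```

On the undecoded string the accented letters are seen as pairs of
non-word bytes, so /(\w+)/ breaks each word apart; on the decoded
string it matches the two words whole. No pragma is involved, only
the layer given to open().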