Hi,

Thanks for the answers to my last question. Since then I have dug a bit further into the UTF-8-related error message I got, and after some reading I have a few questions regarding UTF-8 handling in Perl:
(Please bear in mind that I am not an IT guy.)

1a) My use statements are the following:

    use warnings;
    use strict;
    use utf8;
    use open ':encoding(utf8)';

Now, if I understand it correctly, there are two ways of encoding UTF-8 in Perl: a liberal one (utf8) and a strict one (UTF-8). For my purpose, I need correctly encoded UTF-8 files. However, I cannot be sure whether the files I start with are properly encoded in UTF-8. So is it possible to open a file using the liberal interpretation and write to a new file using the strict interpretation? Are there any issues with this, such as characters that might not be re-encoded properly?

1b) How can I check whether a file is properly encoded UTF-8?

2a) As I understand it, Windows has a somewhat limited ability to display certain Unicode characters, although some fonts can display more of them. The characters do exist in the file even if Windows can't display them (other than as a square). Is this correct? If not, does that affect Perl's ability to handle Unicode?

2b) Do scripts themselves have to be encoded in UTF-8 to be able to process UTF-8 files? If not, when should you encode the scripts in UTF-8 and when not? Most of my scripts add text to UTF-8 encoded text files. I've noticed that this sometimes seems to change the encoding or give error messages when, for example, accented characters are involved. Am I right in assuming that only scripts that remove text or extract certain parts do not need to be encoded in UTF-8?

2c) Not really a Perl question: does anyone know of a monospaced font for Windows that handles most Unicode characters gracefully? I would like one for use in Notepad++, to make it easier to write scripts containing special characters not normally displayable in Windows.

3) As far as I can tell, Windows programs tend to write UTF-8 with a BOM, while Unix and Unix-likes use UTF-8 without a BOM. A particular script of mine prepends a piece of text to UTF-8 encoded text files created with MS Word on Windows (saved as .txt with UTF-8 encoding).
Unfortunately, this appears to break the encoding, which changes from "UTF-8 with BOM" to "UTF-8 without BOM", probably because the text is inserted *before* the BOM at the start of the file. How do I prevent this? How can my script recognize the BOM at the start of the file?

Thanks for reading.

Regards,
Thomas
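
For concreteness, here is a minimal sketch of the kind of thing questions 1b and 3 are asking about. It is not taken from the script in question and is only one possible approach: the sub names `is_strict_utf8` and `prepend_after_bom` are hypothetical, and it assumes the file contents have been read as raw bytes rather than through an `:encoding(...)` layer. The first sub tries a strict `UTF-8` decode with the core `Encode` module and reports whether it succeeded; the second looks for the three BOM bytes (EF BB BF) and inserts new text after them instead of before.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# The UTF-8 byte order mark, as raw bytes.
my $BOM = "\xEF\xBB\xBF";

# Return 1 if $bytes decodes cleanly under the strict "UTF-8" encoding, else 0.
sub is_strict_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops decode() from modifying $bytes in place.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}

# Insert $text (a character string) at the start of $bytes,
# but after a leading BOM if one is present, so the BOM stays first.
sub prepend_after_bom {
    my ($bytes, $text) = @_;
    my $has_bom = substr($bytes, 0, 3) eq $BOM;
    my $body    = $has_bom ? substr($bytes, 3) : $bytes;
    return ($has_bom ? $BOM : '') . encode('UTF-8', $text) . $body;
}
```

To use these on a real file, the contents would be read and written with an explicit raw layer, for example `open my $in, '<:raw', $path`, so that the bytes reach the subs untouched.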