On Tue, 24 Sep 2013 10:42:22 +0100, J. Bagg wrote:

> I'm having trouble with the BOM that is now prepended to codecs files.
> The files have to be read by java servlets which expect a clean file
> without any BOM.
>
> Is there a way to stop the BOM being written?

Of course there is :-) but first we need to know how you are writing it
in the first place.

If you are dealing with existing files which already contain a BOM, you
may need to open the files and re-save them without the BOM.

If you are dealing with temporary files you're creating programmatically,
it depends on how you're creating them. My guess is that you're doing
something like this:

    f = open("some file", "w", encoding="UTF-16")  # or UTF-32
    f.write(data)
    f.close()

or similar. Both the UTF-16 and UTF-32 codecs write BOMs. To avoid that,
you should use UTF-16-BE or UTF-16-LE (Big Endian or Little Endian), as
appropriate to your platform.

If you're getting a UTF-8 BOM, that's seriously weird. The standard UTF-8
codec doesn't write a BOM. (Strictly speaking, it's not a Byte Order Mark,
but a Signature.) Unless you're using encoding='UTF-8-sig', I can't guess
how you're getting a UTF-8 BOM.

If you're doing something else, well, you'll have to explain what you're
doing before we can tell you how to stop doing it :-)

> I'm working on Linux with a locale of en_GB.UTF8

The locale sets the default encoding used by the OS. Python's own default
string encoding (sys.getdefaultencoding()) does not come from the locale:
it is ASCII in Python 2 and UTF-8 in Python 3. Note, though, that in
Python 3 open() called without an encoding argument falls back to the
locale's preferred encoding, so pass encoding= explicitly if the choice
matters.

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list
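
For reference, here is a minimal Python 3 sketch of the points made in the reply above: which codecs emit a BOM, and one way an existing BOM could be stripped from raw bytes before re-saving. The helper name strip_bom and the sample text are made up for illustration; they are not from the original post. The codecs.BOM_* constants are part of the standard library.

```python
import codecs

# BOMs to look for. The UTF-32 marks come first because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def strip_bom(data):
    """Return data (bytes) without a leading BOM or UTF-8 signature.

    Hypothetical helper for illustration. It only looks at the first few
    bytes and does not otherwise validate or decode the content.
    """
    for bom in _BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


text = "caf\u00e9"  # sample text, chosen arbitrarily

with_bom = text.encode("utf-16")        # BOM, then platform byte order
without_bom = text.encode("utf-16-le")  # explicit byte order, no BOM
utf8_sig = text.encode("utf-8-sig")     # UTF-8 with a leading signature
utf8_plain = text.encode("utf-8")       # plain UTF-8, no signature

print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
print(without_bom.startswith(codecs.BOM_UTF16_LE))
print(strip_bom(utf8_sig) == utf8_plain)
```

One caveat with this approach: a UTF-16-LE file whose first character after the BOM is U+0000 would be mistaken for UTF-32-LE, so if the encoding of the files is already known, it is safer to check only for that encoding's BOM.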