I'm using:

    outputfile = codecs.open(fn, 'w+', 'utf-8', errors='strict')

to write the output, as I know that the files are Unicode compliant. I run the raw files as delivered through a Python script that checks the Unicode and reports problem characters, which are then edited. The files cover a whole variety of languages, from Sanskrit to Cyrillic, and more obscure ones too.
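For illustration, a minimal sketch of that kind of check (not the actual script). The function name find_problem_characters and the choice of what counts as a "problem" are assumptions made for this example: it flags bytes that do not decode as UTF-8, unassigned code points, and control characters other than tab, newline and carriage return. Which code points count as unassigned depends on the Unicode database shipped with the Python version in use.

```python
import unicodedata


def find_problem_characters(raw, encoding="utf-8"):
    """Return a list of (index, description) pairs for suspicious characters.

    `raw` is the file content as bytes. The name and the rules used here
    are hypothetical; adjust them to whatever the real check needs.
    """
    problems = []
    # errors="replace" substitutes U+FFFD for anything that cannot be decoded,
    # so decoding never raises and every problem can be reported in one pass.
    text = raw.decode(encoding, errors="replace")
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        if ch == "\ufffd":
            # Either an undecodable byte or a literal U+FFFD in the input.
            problems.append((index, "undecodable byte or replacement character"))
        elif category == "Cn":
            problems.append((index, "unassigned code point U+%04X" % ord(ch)))
        elif category == "Cc" and ch not in "\t\n\r":
            problems.append((index, "control character U+%04X" % ord(ch)))
    return problems


if __name__ == "__main__":
    sample = b"good text\n" + b"bad byte: \xff\n"
    for index, description in find_problem_characters(sample):
        print("character %d: %s" % (index, description))
```

Once a file passes a check like this, writing it with errors='strict' should not raise, since every character can be encoded as UTF-8.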
I'll probably have to remove it in the servlet, as we have standardised on UTF-8. This was done some years ago, when UTF-16 was rare (apart from Macs).
J