removing BOM prepended by codecs?

2013-09-24 Thread J. Bagg
I'm having trouble with the BOM that is now prepended to files written with codecs. The files have to be read by Java servlets, which expect a clean file without any BOM. Is there a way to stop the BOM being written? It is seriously messing up my work, as the servlets do not expect it to be there. I could d
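For reference, a UTF-8 BOM only appears when the chosen codec writes one. This minimal Python 3 sketch compares the plain 'utf-8' codec with 'utf-8-sig'. The sample record text is made up for illustration.

```python
import codecs

# Hypothetical sample record; any text would do.
text = "%A Smith, J.\n"

plain = text.encode("utf-8")       # plain UTF-8: no signature bytes
signed = text.encode("utf-8-sig")  # 'utf-8-sig' prepends EF BB BF

print(plain[:3])
print(signed[:3])
print(signed.startswith(codecs.BOM_UTF8))
```

As far as the codec goes, a file written with 'utf-8' should not gain a BOM. 'utf-8-sig' and 'utf-16' are the encodings that write one.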

2013-09-24 Thread J. Bagg
I'm using: outputfile = codecs.open (fn, 'w+', 'utf-8', errors='strict') to write, as I know that the files are Unicode-compliant. I run the raw files, as delivered, through a Python script that checks the Unicode and reports problem characters, which are then edited. The files use a whole vari
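The checking step described above can be sketched as a strict UTF-8 decode that records the byte offset of every undecodable sequence. This is an assumed reconstruction, not the poster's actual script. The function name `report_bad_bytes` and the sample data are hypothetical.

```python
def report_bad_bytes(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, bad_bytes) for each invalid UTF-8 sequence in data."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises at the first invalid sequence.
            data[pos:].decode("utf-8", errors="strict")
            break
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice; shift to file offsets.
            start, end = pos + exc.start, pos + exc.end
            problems.append((start, data[start:end]))
            pos = end  # resume just past the bad bytes
    return problems


if __name__ == "__main__":
    sample = b"%A Caf\xc3\xa9\n%T Bad \xff byte\n"
    for offset, bad in report_bad_bytes(sample):
        print(f"offset {offset}: {bad!r}")
```

Working on raw bytes means the report gives file offsets, which can be matched against `od` output when editing.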

2013-09-24 Thread J. Bagg
I've checked the original files using od and they don't have BOMs. I'll remove them in the servlet. The overhead is probably small enough unless somebody is doing a massive search. We have a limit anyway to prevent somebody stealing the entire set of data. I started writing the Python search
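This is a Python counterpart to the `od` check. It inspects the first bytes for a known BOM signature. When reading, it lets the 'utf-8-sig' codec drop a UTF-8 BOM if one is present. The helper names are hypothetical, and the servlet side would need the equivalent logic in Java.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
KNOWN_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return the name of the BOM that data starts with, or None."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return None


def decode_without_utf8_bom(data: bytes) -> str:
    # 'utf-8-sig' drops a leading UTF-8 BOM if there is one,
    # and otherwise decodes exactly like plain 'utf-8'.
    return data.decode("utf-8-sig")
```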

2013-09-24 Thread J. Bagg
My editor is JEdit. I use it on a Win 7 machine but have everything set up for *nix files, as that is the machine I'm normally working on. The files are mailed to me as updates. The library where the indexers work does use MS computers, but this is restricted to EndNote with an exporter into the o

2013-09-25 Thread J. Bagg
So it is just a random sequence of "junk". It will be a matter of finding the real start of the record (in this case a %) and throwing the "junk" away. I was misled by the note in the codecs class that BOMs were being prepended. Should have looked more carefully. Mea culpa. -- https://mail.
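Here is a minimal sketch of that clean-up. It assumes, as in the post, that each record starts with `%`, and that the leading junk never contains a `%` byte itself. It works on raw bytes so that junk which is not valid UTF-8 cannot break decoding. The function name `drop_leading_junk` and the sample data are hypothetical.

```python
def drop_leading_junk(data: bytes, marker: bytes = b"%") -> bytes:
    """Discard everything before the first record marker."""
    idx = data.find(marker)
    if idx == -1:
        # No record start at all: fail loudly rather than return junk.
        raise ValueError("no record marker found")
    return data[idx:]


if __name__ == "__main__":
    raw = b"\xef\xbb\xbf\x00junk%A Smith, J.\n%T Title\n"
    clean = drop_leading_junk(raw)
    print(clean.decode("utf-8"))
```

A stricter version might also check that the marker is followed by a tag letter. Without that check, a stray `%` inside the junk would be taken as the start of the record.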