I'm having trouble with the BOM that is now being prepended to files written with codecs.
The files have to be read by Java servlets, which expect a clean file
without any BOM.
Is there a way to stop the BOM from being written?
It is seriously messing up my work, as the servlets do not expect it to
be there. I could d
I'm using:
outputfile = codecs.open(fn, 'w+', 'utf-8', errors='strict')
to write, as I know that the files are valid Unicode. I run the raw
files, as delivered, through a Python script that checks the Unicode
and reports problem characters, which are then edited. The files use a
whole vari
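A quick way to see whether the codec itself adds a BOM is to write a small file and look at the raw bytes. As far as I understand the codecs module, the plain 'utf-8' codec never writes a BOM; only 'utf-8-sig' (and the 'utf-16'/'utf-32' codecs) do. This is a sketch, not a test of your actual files: `head_after_writing` is a hypothetical helper, and the sample line and temporary file are made up for illustration.

```python
import codecs
import os
import tempfile


def head_after_writing(encoding, text="%A Smith\n", n=3):
    """Write `text` with codecs.open using `encoding`; return the first n raw bytes.

    Hypothetical helper: it uses a throwaway temporary file, not real data.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with codecs.open(path, "w", encoding, errors="strict") as out:
            out.write(text)
        with open(path, "rb") as raw:
            return raw.read(n)
    finally:
        os.remove(path)


plain_head = head_after_writing("utf-8")     # no BOM expected
sig_head = head_after_writing("utf-8-sig")   # BOM (EF BB BF) expected
print(plain_head)
print(sig_head == codecs.BOM_UTF8)
```

If 'utf-8' gives back the start of the text rather than EF BB BF, the codec is not the source of the leading bytes.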
I've checked the original files using od and they don't have BOMs.
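The same check that od does (a UTF-8 BOM shows up as 357 273 277 in `od -c` output) can be scripted so it runs over every delivered file. A minimal sketch, assuming you only care about the standard BOMs exposed as constants in the codecs module; `detect_bom` and `_demo` are hypothetical names, and the sample bytes are made up.

```python
import codecs
import os
import tempfile

# Known byte-order marks. The UTF-32 entries come first because the
# UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(path):
    """Return the name of the BOM at the start of the file, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def _demo(content):
    # Write raw bytes to a temporary file and report what detect_bom sees.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    try:
        return detect_bom(path)
    finally:
        os.remove(path)


clean = _demo(b"%A Smith\n")
marked = _demo(codecs.BOM_UTF8 + b"%A Smith\n")
print(clean, marked)
```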
I'll remove them in the servlet. The overhead is probably small enough to
ignore unless somebody is doing a massive search, and we have a limit anyway
to prevent anyone from pulling out the entire data set.
I started writing the Python search
My editor is jEdit. I use it on a Windows 7 machine but have everything set
up for *nix files, as that is the machine I normally work on.
The files are mailed to me as updates. The library where the indexers
work does use Microsoft computers, but only for EndNote, with an
exporter into the o
So it is just a random sequence of "junk".
It will be a matter of finding the real start of the record (in this
case a %) and throwing the "junk" away. I was misled by the note in the
codecs module documentation about BOMs being prepended. I should have
looked more carefully.
Mea culpa.
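Skipping to the real start of the record is safest on the raw bytes, before decoding, since the junk may not be valid UTF-8 and a strict decode would fail on it. In UTF-8 the byte 0x25 ('%') never occurs inside a multi-byte sequence, so searching for it in the bytes is reliable. This is a sketch under the assumption that the junk itself never contains a '%'; `clean_record` is a hypothetical helper and the sample record (including the leading junk bytes and the %A/%T lines) is invented for illustration.

```python
def clean_record(raw: bytes, marker: bytes = b"%") -> str:
    """Drop any leading junk before the first marker byte, then decode as UTF-8.

    Assumption: the junk never contains the marker byte itself.
    """
    start = raw.find(marker)
    if start == -1:
        raise ValueError("no record start marker found")
    # Decoding strictly after the slice still reports genuine problem characters.
    return raw[start:].decode("utf-8", errors="strict")


sample = b"\x8f\x02\xfe%A Smith, J.\n%T A title\n"  # hypothetical junk + record
cleaned = clean_record(sample)
print(cleaned.splitlines()[0])
```

Raising on a missing marker, rather than returning the text unchanged, is a deliberate choice here so that a file with no recognizable record start gets reported instead of silently passed through.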
--
https://mail.