Channeling Unicode text experts and XML people: I have an XML entity whose initial bytes are ff fe ff fe, which the file command says is UTF-16, little-endian text.
I agree, but what should be done about the additional BOM? A test output made many years ago seems to keep the extra BOM.

The XML context is the file 014.xml:

    <!DOCTYPE doc [
    <!ELEMENT doc (#PCDATA)>
    <!ENTITY e SYSTEM "014.ent">
    ]>
    <doc>&e;</doc>

The entity file 014.ent is bombomdata, i.e. two BOMs followed by "data":

    b'\xff\xfe\xff\xfed\x00a\x00t\x00a\x00'

The old saved test output of processing is

    b'<doc>\xef\xbb\xbfdata</doc>'

which seems to imply that the extra BOM in the entity has been kept and converted into a different BOM, the one meaning UTF-8.

I think the test file is wrong and that multiple BOM chars in the entity should have been removed. Am I right?

--
Robin Becker
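For reference, here is a minimal sketch of how the saved output could have come about, assuming the processor decodes the entity much as Python's "utf-16" codec does and then writes UTF-8. It only reproduces the bytes quoted above; it does not settle whether an XML processor ought to drop the second U+FEFF. The helper strip_leading_bom_chars is a hypothetical name used for illustration, not part of any library.

```python
# Entity bytes quoted in the post: two UTF-16-LE BOMs followed by "data".
raw = b'\xff\xfe\xff\xfed\x00a\x00t\x00a\x00'

# Python's "utf-16" codec consumes only the first BOM to pick the byte
# order; the second one survives as the character U+FEFF.
text = raw.decode('utf-16')
print(repr(text))   # '\ufeffdata'

# Re-encoding that string as UTF-8 turns U+FEFF into EF BB BF,
# which matches the bytes inside <doc>...</doc> in the old test output.
utf8 = text.encode('utf-8')
print(utf8)         # b'\xef\xbb\xbfdata'


def strip_leading_bom_chars(s: str) -> str:
    """Hypothetical helper: drop every leading U+FEFF, not just the first."""
    return s.lstrip('\ufeff')


print(strip_leading_bom_chars(text))  # data
```

So the old output is consistent with "strip one BOM, keep the rest as character data", while the behaviour I am proposing would correspond to the helper above.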