On Tue, 20 Dec 2016 10:53 pm, Sayth Renshaw wrote:

> content.read().encode('utf-8'), parser=utf8_parser)
>
> However doing it in such a fashion returns this error:
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0:
> invalid start byte

That tells you that the XML file you have is not actually UTF-8.

You have a file that begins with the byte 0xFF. That is invalid UTF-8: no valid UTF-8 string contains the byte 0xFF.

https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

So you need to consider:

- Are you sure that the input file is intended to be UTF-8? How was it created?

- Is the second byte 0xFE? If so, that suggests that you actually have UTF-16 (little-endian) with a byte-order mark.

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
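
P.S. One way to answer the second question is to open the file in binary mode and look at its first few bytes. The following is only a rough sketch: the helper names guess_encoding_from_bom and sniff_file are made up for illustration, and a byte-order mark is a hint about the encoding, not proof of it.

```python
import codecs

# Known byte-order marks. The UTF-32 entries come first because the
# UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(head):
    """Return a codec name if the bytes in `head` start with a known BOM, else None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Return the first four bytes of the file and the codec they suggest."""
    # Binary mode: nothing is decoded before we get to look at the bytes.
    with open(path, "rb") as f:
        head = f.read(4)
    return head, guess_encoding_from_bom(head)
```

If sniff_file reports "utf-16" for your file, then opening it with open(path, encoding="utf-16") should decode it and drop the byte-order mark. If it reports None, the file has no BOM and you will need to find out how it was created.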