On Jul 4, 2:36 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > The parser is failing on this line:
> >
> > <mrcb245-c>Heinrich Kèufner, Norbert Nedopil, Heinz Schèoch (Hrsg.).</
> > mrcb245-c>
>
> If it is literally this line, it's no surprise: there must not be a line
> break between the slash and the closing element name.
>
> However, since you are getting the error in a different column, it's
> indeed more likely that there is a problem with the encoding.
>
> Given that the Python UTF-8 codec refuses the data, most likely, the
> data is *not* encoded in UTF-8 (but perhaps in Latin-1). If so, you
> need to prefix the XML document with a proper XML declaration, such
> as
>
> <?xml version="1.0" encoding="iso-8859-1"?>
>
> Alternatively, make sure that the file is really encoded in UTF-8.
>
> Regards,
> Martin

There is no line break in the XML file; that was just a formatting
issue on this forum. However, you were right about the encoding not
being UTF-8. The XML file is autogenerated by a different script, so
that's probably where it is going wrong. The parser works fine if I
change the first line to

<?xml version="1.0" encoding="iso-8859-1"?>

Thank you very much
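
For anyone who cannot fix the generating script, the same workaround can be applied at load time. The following is only a rough sketch, assuming Python 3 and the standard xml.etree.ElementTree module; the function name parse_with_fallback and its fallback parameter are made up for illustration and are not part of any library. It checks whether the bytes decode as UTF-8 and, if they do not, swaps in a declaration naming ISO-8859-1 before parsing. Note that Latin-1 accepts every byte sequence, so a successful parse does not prove the data really is Latin-1.

```python
import xml.etree.ElementTree as ET


def parse_with_fallback(data, fallback="iso-8859-1"):
    """Parse XML bytes, declaring a fallback encoding if they are not UTF-8.

    Hypothetical helper: the name and the default fallback are assumptions.
    """
    try:
        # Only a validity check; the decoded text itself is not needed.
        data.decode("utf-8")
    except UnicodeDecodeError:
        body = data.lstrip()
        # Drop an existing XML declaration, which evidently names the wrong
        # encoding (or none at all), and keep the rest of the document.
        if body.startswith(b"<?xml "):
            body = body[body.index(b"?>") + 2:]
        header = '<?xml version="1.0" encoding="%s"?>' % fallback
        data = header.encode("ascii") + body
    return ET.fromstring(data)
```

Usage would be something like parse_with_fallback(open("records.xml", "rb").read()), where records.xml is a placeholder file name. Reading in binary mode matters here, because the parser has to see the raw bytes together with the declaration that describes them.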