Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote:
>
> > I think I ran into a bug in the XML SAX parser.
> >
> > Part of my program consists of reading a rather large XML file (about
> > 10 MB) containing a few thousand elements.
> > I have the following problem: sometimes the SAX parser misreads a
> > line.
> > Let me explain: the XML file contains a few thousand lines like this:
> > "
> > <TargetRef>WINOSSPI:Storage@@n91c90a.cmc.com</TargetRef>
> > "
> > where 'n91c90a.cmc.com' is the name of a system and thus changes per
> > system.
> > In a few cases, the SAX parser misreads the line. The parser sometimes
> > splits the characters of the line into:
> > "WINOSSPI:Storage@@n" and "91c90a.cmc.com".
> > I put a 'print characters' line in the 'characters' method of the
> > parser; that is how I found out.
> > It only happens for a few of the thousand lines, but you can imagine
> > that is very annoying.
> >
> > I checked for errors in the XML file but the file seems ok.
> >
> > Is this a bug or am I doing something wrong?
>
> it's not a bug; the parser is free to split up character runs (due to
> buffering, entities or character references, etc). it's up to you to
> merge character runs into strings.
>
> </F>

Thanks for the feedback,
but how do I detect that the parser has split up the characters? I guess I need to detect it in order to reconstruct the complete string.
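
For what it's worth, the usual approach is not to detect the split at all: start an empty buffer in startElement, append every piece passed to characters, and join the pieces in endElement. Below is a minimal sketch of that idea, assuming Python 3 and the standard xml.sax module. The class name TargetRefHandler and the helper parse_targets are made up for illustration; only the element name TargetRef comes from the post above.

```python
import xml.sax


class TargetRefHandler(xml.sax.ContentHandler):
    """Collects the complete text of every <TargetRef> element."""

    def __init__(self):
        super().__init__()
        self._chunks = None   # list of text pieces while inside <TargetRef>
        self.targets = []     # one joined string per finished element

    def startElement(self, name, attrs):
        if name == "TargetRef":
            self._chunks = []             # start a fresh buffer

    def characters(self, content):
        # May be called several times for one text run; just collect.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "TargetRef" and self._chunks is not None:
            self.targets.append("".join(self._chunks))
            self._chunks = None           # ignore text outside the element


def parse_targets(xml_bytes):
    """Hypothetical helper: return the text of all <TargetRef> elements."""
    handler = TargetRefHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.targets


sample = b"<Targets><TargetRef>WINOSSPI:Storage@@n91c90a.cmc.com</TargetRef></Targets>"
print(parse_targets(sample))  # ['WINOSSPI:Storage@@n91c90a.cmc.com']
```

With this pattern it does not matter whether the parser delivers the text in one call or in several; the string is only considered complete when the closing tag arrives.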