Re: XML SAX parser bug?

mitsura Thu, 19 Jan 2006 11:10:45 -0800

Fredrik Lundh wrote:

> [EMAIL PROTECTED] wrote:
>
> > I think I ran into a bug in the XML SAX parser.
> >
> > Part of my program consists of reading a rather large XML file (about
> > 10 MB) containing a few thousand elements.
> > I have the following problem. Sometimes the SAX parser misreads a
> > line.
> > Let me explain: the XML file contains a few thousand lines like this:
> > "
> > <TargetRef>WINOSSPI:Storage@@n91c90a.cmc.com</TargetRef>
> > "
> > where 'n91c90a.cmc.com' is the name of a system and thus changes per
> > system.
> > In a few cases, the SAX parser misreads the line. The parser sometimes
> > splits the line's characters into:
> > "WINOSSPI:Storage@@n" and "91c90a.cmc.com".
> > I put a 'print characters' line in the 'characters' method of my
> > content handler; that is how I found out.
> > It only happens for a few of the thousands of lines, but you can imagine
> > that it is very annoying.
> >
> > I checked for errors in the XML file but the file seems ok.
> >
> > Is this a bug or am I doing something wrong?
>
> it's not a bug; the parser is free to split up character runs (due to
> buffering, entities or character references, etc.).  it's up to you to
> merge character runs into strings.
>
> </F>
Thanks for the feedback,


but how do I detect that the parser has split up the characters? I guess
I need to detect it in order to reconstruct the complete string.
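Following the advice quoted above, one common approach is not to detect the split at all: treat every `characters()` call as a fragment, append it to a buffer, and only join the buffer in `endElement()`, when the text is known to be complete. Below is a minimal sketch under that assumption, in Python 3 using only the standard library `xml.sax` module; `TargetRefHandler`, `extract_targets` and the sample document are hypothetical names chosen for illustration. The demo feeds the document in two chunks at the same point as the split reported in the original post, and also uses a character reference (`&#57;` is the digit 9), both of which are situations in which the parser may deliver one text node in several pieces.

```python
import xml.sax


class TargetRefHandler(xml.sax.ContentHandler):
    """Collect the complete text of every <TargetRef> element (hypothetical example).

    characters() may be called several times for a single text node, so the
    fragments are buffered and only joined when the element ends.
    """

    def __init__(self):
        super().__init__()
        self.targets = []    # completed TargetRef strings, in document order
        self._buffer = None  # list of fragments while inside <TargetRef>, else None

    def startElement(self, name, attrs):
        if name == "TargetRef":
            self._buffer = []

    def characters(self, content):
        # Never treat a single call as the full value; just collect the piece.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "TargetRef" and self._buffer is not None:
            self.targets.append("".join(self._buffer))
            self._buffer = None


def extract_targets(xml_bytes):
    """Parse a complete document held in memory and return the TargetRef texts."""
    handler = TargetRefHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.targets


if __name__ == "__main__":
    # Feed the document in two chunks, splitting the text node where the
    # original post saw the split; the handler still sees one joined string.
    handler = TargetRefHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed(b"<Targets><TargetRef>WINOSSPI:Storage@@n")
    parser.feed(b"91c90a.cmc.com</TargetRef></Targets>")
    parser.close()
    print(handler.targets)

    # A character reference inside the text is another possible split point.
    print(extract_targets(
        b"<Targets><TargetRef>WINOSSPI:Storage@@n&#57;1c90a.cmc.com</TargetRef></Targets>"
    ))
```

The buffer is reset to `None` outside `<TargetRef>` so that whitespace between elements is ignored. If text from several different element types is needed, the same idea generalizes to a stack of buffers, one per open element.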

-- 
http://mail.python.org/mailman/listinfo/python-list
