On Wed, Sep 12, 2012 at 10:13:23PM -0400, Daniel Richard G. wrote:
> On Wed, 12 Sep 2012, Daniel Veillard wrote:
> 
> >>The value of INPUT_CHUNK in include/libxml/parserInternals.h: Is
> >>it valid/legal to crank this value up? Like, say, from 250 to
> >>250000?
> >
> > no :-) You would require the parsers to always have 250KB of
> >readahead data in the buffer (ahead of the current parsing point).
> >This is not the I/O block read value (which is MINLEN 4000 in
> >xmlIO.c). It would also lead the parser to not shrink the read
> >buffer on a regular basis.
> > Too much read-ahead does not help, just the opposite I'm afraid.
> 
> Right; the point was not to make this change officially, but to
> change the buffer-growing behavior in a way that teases out the bug.
> I saw INPUT_CHUNK in the code, and figured frobbing that would do
> something.
> 
> What I was asking is, should everything still work correctly with
> that larger value? Not break anything, still give the same results?
> (Because if not, I'll need to find some other way of making the bug
> reproducible on FC17.)

  It may change some of the output, as you reported, but not the document
content, so if that's how you reproduced it on F17, okay.

> >>So the issue, as far as I can tell, appears to be realloc()
> >>shenanigans (or something a lot like it).
> >
> > Hum, I can try to explain what this does there: we are parsing a
> > start tag and we
> >
> > ....<name attr1="...>
> >
> >cur counts the number of characters from the beginning of the
> >input buffer until the 'n', base is a pointer to the beginning of
> >the input buffer. We want all the start tag to be in the input
> >buffer to provide a SAX callback without copying strings out, only
> >pointers to the buffer. So if while parsing name we notice that
> >the buffer had to be expanded (and we have good tests to check
> >that) we may need to restart the parsing phase of that start tag
> >from scratch. That's one of the trickiest parts of the 'new' parser
> >:-)
> 
> And if the buffer expands, the address might change, hence the
> pointer check...
> 
> > Now somehow you hit a problem there, it might be useful to understand
> >what the parser does at that point: does it fail parsing (if yes, with
> >which error), or does it succeed parsing but with incorrect data?
> 
> Well, you know what's going on much better than I do ^_^  Can you
> reproduce this?

  No, I don't have a precise idea of what you actually changed, nor which file
exposes the problem, nor how you reproduce it except by running the
modified runtest.c. Does that show up in xmllint?

> >Interesting, the only scenario which could break there would be if
> >xmlParseQName() were shrinking the buffer, making it impossible to
> >get back to the start of the name, and most likely leading to a
> >parsing failure.
> 
> The printf()s I alluded to earlier went into that conditional,
> showing which branch was followed: "base has changed" or "base is
> equal". With the cut-down runtest, when it succeeds, there are eight
> "equal" lines and one "changed". When it fails, there are nine
> "equal"s. So it seems like a *lack* of growing is what leads to the
> bug....

  And the big buffer might be one of the ways to force that behaviour.
Can you git diff your setup and tell me how you reproduce it?
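
  For reference, the base/cur idea discussed above can be sketched roughly
as follows. This is only an illustration, not the actual libxml2 code: the
demo_buf type and the demo_buf_append() and demo_tag_start() helpers are
hypothetical names made up for this example. The point is that growing the
buffer may move it, so a position has to be kept as an offset from base and
turned back into a pointer after each possible growth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer; NOT libxml2's real input structure. */
typedef struct {
    char  *base;  /* start of the buffer; may move when the buffer grows */
    size_t used;  /* number of bytes currently stored */
    size_t size;  /* number of bytes allocated */
} demo_buf;

/*
 * Append len bytes to the buffer, growing the allocation if needed.
 * Returns 1 if base changed, 0 if base is equal, -1 if allocation failed.
 */
static int demo_buf_append(demo_buf *b, const char *data, size_t len)
{
    /* Remember the old address as an integer so it can be compared later. */
    uintptr_t old_base;
    int moved = 0;

    assert(b != NULL && data != NULL);
    old_base = (uintptr_t) b->base;

    if (b->used + len > b->size) {
        size_t nsize = (b->used + len) * 2;
        char *nbase = realloc(b->base, nsize);

        if (nbase == NULL)
            return -1;
        b->base = nbase;  /* pointers taken from the old base are now stale */
        b->size = nsize;
        moved = ((uintptr_t) b->base != old_base);
    }
    memcpy(b->base + b->used, data, len);
    b->used += len;
    return moved;
}

/* Turn the saved offset back into a pointer; valid even if base moved. */
static const char *demo_tag_start(const demo_buf *b, size_t cur)
{
    assert(b != NULL && cur <= b->used);
    return b->base + cur;
}
```

  A caller would save cur when it reaches the 'n' of the name, append more
input, and then call demo_tag_start() again instead of reusing an old
pointer. Whether a given append reports "changed" or "equal" depends on the
allocator, which fits with the bug only showing up on some setups.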

 thanks,

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
