On Wed, Sep 12, 2012 at 10:13:23PM -0400, Daniel Richard G. wrote:
> On Wed, 12 Sep 2012, Daniel Veillard wrote:
>
> >>The value of INPUT_CHUNK in include/libxml/parserInternals.h: Is
> >>it valid/legal to crank this value up? Like, say, from 250 to
> >>250000?
> >
> >  no :-) You would require the parsers to always have 250KB of
> >readahead data in the buffer (ahead of the current parsing point).
> >this is not the I/O block read value (which is MINLEN 4000 in
> >xmlIO.c). It would also lead the parser to not shrink the read
> >buffer on a regular basis.
> >  Too much read-ahead does not help, just the opposite I'm afraid.
>
> Right; the point was not to make this change officially, but to
> change the buffer-growing behavior in a way that teases out the bug.
> I saw INPUT_CHUNK in the code, and figured frobbing that would do
> something.
>
> What I was asking is, should everything still work correctly with
> that larger value? Not break anything, still give the same results?
> (Because if not, I'll need to find some other way of making the bug
> reproducible on FC17.)

  It may change some of the output as you reported, but not the
document content, so if that's the way you reproduced it on F17, okay.

> >>So the issue, as far as I can tell, appears to be realloc()
> >>shenanigans (or something a lot like it).
> >
> >  Hum, I can try to explain what this does there: we are parsing a
> >start tag and we
> >
> >  ....<name attr1="...>
> >
> >cur counts the number of characters from the beginning of the
> >input buffer until the 'n', base is a pointer to the beginning of
> >the input buffer. We want all the start tag to be in the input
> >buffer to provide a SAX callback without copying strings out, only
> >pointers to the buffer. So if while parsing name we notice that
> >the buffer had to be expanded (and we have good tests to check
> >that) we may need to restart the parsing phase of that start tag
> >from scratch. That's one of the most tricky parts of the 'new' parser
> >:-)
>
> And if the buffer expands, the address might change, hence the
> pointer check...
>
> >  Now somehow you hit a problem there, it might be useful to understand
> >what the parser does at that point, does it fail parsing (if yes which
> >error) does it succeed parsing but with incorrect data ?
>
> Well, you know what's going on much better than I do ^_^  Can you
> reproduce this?

  No, I don't have a precise idea of what you actually changed, nor
which file exposes the problem, nor how you reproduce it except by
running the modified runtest.c. Does that show up in xmllint?

> >Interesting, the only scenario which could break there would be if
> >xmlParseQName() were shrinking the buffer making it impossible to
> >get back to the start of the name, and most likely leading to a
> >parsing failure.
>
> The printf()s I alluded to earlier went into that conditional,
> showing which branch was followed: "base has changed" or "base is
> equal". With the cut-down runtest, when it succeeds, there are eight
> "equal" lines and one "changed".
> When it fails, there are nine "equal"s. So it seems like a *lack*
> of growing is what leads to the bug....

  And the big buffer might be one of the ways to force that behaviour.
Can you git diff your setup and tell me how you reproduce?

  thanks,

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
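
For readers following the thread, here is a minimal sketch of the "base has
changed" / "base is equal" check discussed above. It is not libxml2 code:
demo_input, demo_append() and demo_parse_start_tag() are hypothetical names
made up for illustration, and the buffer always moves when it grows (a fresh
malloc() instead of realloc()) so the behaviour is deterministic. The idea is
only that the parser keeps an offset (cur) rather than a raw pointer, and
re-derives the name pointer after any read that may have moved the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parser input buffer (NOT xmlParserInput). */
typedef struct {
    char  *base;  /* start of the buffer; may move when the buffer grows */
    size_t len;   /* bytes currently stored */
    size_t cap;   /* bytes allocated */
} demo_input;

/* Append n bytes, growing the buffer if needed.
 * Returns 0 on success, -1 on allocation failure. */
static int demo_append(demo_input *in, const char *data, size_t n)
{
    if (in->len + n > in->cap) {
        size_t ncap = in->cap ? in->cap : 16;
        while (ncap < in->len + n)
            ncap *= 2;
        /* Assumption for the demo: allocate a new block while the old one
         * is still live, so the new address always differs from the old. */
        char *nbase = malloc(ncap);
        if (nbase == NULL)
            return -1;
        if (in->len > 0)
            memcpy(nbase, in->base, in->len);
        free(in->base);
        in->base = nbase;
        in->cap = ncap;
    }
    memcpy(in->base + in->len, data, n);
    in->len += n;
    return 0;
}

/* Simulate parsing a start tag whose remaining text arrives mid-parse.
 * cur is the offset of the tag name from the start of the buffer.
 * Returns 1 if base changed (a real parser would restart the start tag),
 * 0 if base is equal, -1 on allocation failure. *name_out always ends up
 * pointing into the current buffer. */
static int demo_parse_start_tag(demo_input *in, size_t cur,
                                const char *more, size_t n,
                                const char **name_out)
{
    assert(cur <= in->len);
    /* Remember the old address as an integer, so we never touch a
     * possibly freed pointer value after the buffer grows. */
    uintptr_t old_base = (uintptr_t)in->base;

    if (demo_append(in, more, n) != 0)
        return -1;

    /* Re-derive the name pointer from the offset, never from a pointer
     * saved before the read. */
    *name_out = in->base + cur;
    return ((uintptr_t)in->base != old_base) ? 1 : 0;
}
```

Feeding this a short chunk that fits in the current allocation takes the
"equal" branch; a chunk that overflows it takes the "changed" branch, which
is where a real parser would have to restart the start tag from scratch.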