On Wed, Sep 12, 2012 at 05:12:43PM -0400, Daniel Richard G. wrote:
> On Wed, 12 Sep 2012, Daniel Veillard wrote:
>
> > I could try to put Ubuntu on a VM too and see what is going on.
> > Did you manage to isolate what specific test is failing, doing the
> > same through xmllint command line test might be easier to debug,
>
> I did some more digging on this, this time using GCC's
> -finstrument-functions in conjunction with Michal Ludvig's
> handy-dandy CygProfiler suite
> (http://www.logix.cz/michal/devel/CygProfiler/), and have obtained
> some interesting results. But first, a question...
>
> The value of INPUT_CHUNK in include/libxml/parserInternals.h: Is it
> valid/legal to crank this value up? Like, say, from 250 to 250000?

  no :-)
You would require the parsers to always have 250KB of read-ahead data in
the buffer (ahead of the current parsing point). This is not the I/O block
read value (which is MINLEN 4000 in xmlIO.c). It would also lead the parser
to not shrink the read buffer on a regular basis. Too much read-ahead does
not help, just the opposite I'm afraid. And there are some

> That change causes the (unmodified) runtest program to do this on FC17:
>
> $ ./runtest
> ## XML regression tests
> ## XML regression tests on memory
> ## XML entity subst regression tests
> ## XML Namespaces regression tests
> ## Error cases regression tests
> Error for ./test/errors/attr1.xml failed
> File ./test/errors/attr1.xml generated an error
> Error for ./test/errors/attr2.xml failed
> File ./test/errors/attr2.xml generated an error
> Error for ./test/errors/name2.xml failed
> File ./test/errors/name2.xml generated an error

  I would assume this changes the output of the error messages; why and
how, I don't know.

> ## Error cases stream regression tests
> ## Reader regression tests
> ## Reader entities substitution regression tests
> ## Reader on memory regression tests
> ## Walker regression tests
> ## SAX1 callbacks regression tests
> Got a difference for ./test/rdf2
> File ./test/rdf2 generated an error
> ## SAX2 callbacks regression tests
> Got a difference for ./test/rdf2
> File ./test/rdf2 generated an error

  Could be due to two consecutive characters() callbacks not being split
at the same point, because of the change in buffering.

> ## XML push regression tests
[...]
>
> (All the other "make check" tests pass.)
>
> I was finally able to do a proper execution-trace diff with the
> CygProfiler output, which showed that the good versus bad runs
> diverged in xmlParseStartTag2(). Further GDB and printf() action
> seemed to point to line 9213:
>
>     if (ctxt->input->base != base) goto base_changed;
>
> So the issue, as far as I can tell, appears to be realloc()
> shenanigans (or something a lot like it).
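
  For what it's worth, the realloc() side of this can be shown in
isolation. The following is only a rough sketch in plain C with made-up
names (InBuf, inbuf_append and offset_survives_growth are hypothetical,
this is not the libxml2 code): when a growable buffer moves, a saved
pointer to its start no longer matches, while an offset counted from the
start still designates the same character. The buffer is always moved on
growth here to make the effect deterministic; a real realloc() may or may
not move the block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer (NOT the libxml2 structures). */
typedef struct {
    char  *base;  /* start of the buffer; may move when it grows */
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} InBuf;

/* Append n bytes, growing the buffer if needed.
 * Returns 0 on success, -1 on allocation failure.
 * Growth uses malloc + copy + free so the block always moves. */
static int inbuf_append(InBuf *b, const char *data, size_t n)
{
    assert(b != NULL && data != NULL);
    if (b->len + n > b->cap) {
        size_t ncap = (b->len + n) * 2;
        char *nbase = malloc(ncap);
        if (nbase == NULL)
            return -1;
        if (b->len > 0)
            memcpy(nbase, b->base, b->len);
        free(b->base);
        b->base = nbase;
        b->cap = ncap;
    }
    memcpy(b->base + b->len, data, n);
    b->len += n;
    return 0;
}

/* Returns 1 if the buffer moved while more input was appended AND the
 * saved offset still points at the 'n' of "<name"; 0 otherwise. */
static int offset_survives_growth(void)
{
    InBuf b = { NULL, 0, 0 };
    const char *head = "<name";
    const char *tail = " attr1=\"...\">";

    if (inbuf_append(&b, head, strlen(head)) != 0)
        return 0;

    /* Remember the start address as an integer, and the offset of 'n'. */
    uintptr_t old_base = (uintptr_t)b.base;
    size_t cur = 1;

    /* Pull in the rest of the tag; this forces the buffer to grow. */
    if (inbuf_append(&b, tail, strlen(tail)) != 0) {
        free(b.base);
        return 0;
    }

    /* Same idea as the "input->base != base" test quoted above. */
    int moved = ((uintptr_t)b.base != old_base);
    /* The offset is still valid relative to the new base. */
    int ok = moved && b.base[cur] == 'n';

    free(b.base);
    return ok;
}
```

  In this sketch a caller that kept raw pointers into the old block would
have to throw them away and recompute them from the new base plus the
saved offset once it sees that the base has changed.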

  Hum, I can try to explain what this does there:
we are parsing a start tag and we
     ....<name attr1="...>
cur counts the number of characters from the beginning of the input buffer
until the 'n', base is a pointer to the beginning of the input buffer.
We want the whole start tag to be in the input buffer so that we can
provide a SAX callback without copying strings out, only pointers into the
buffer. So if, while parsing the name, we notice that the buffer had to be
expanded (and we have good tests to check that), we may need to restart the
parsing phase of that start tag from scratch. That's one of the trickiest
parts of the 'new' parser :-)

  Now somehow you hit a problem there. It might be useful to understand
what the parser does at that point: does it fail parsing (if yes, with
which error), or does it succeed parsing but with incorrect data?

  Interesting, the only scenario which could break there would be if
xmlParseQName() were shrinking the buffer, making it impossible to get
back to the start of the name, and most likely leading to a parsing
failure.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml