On Wed, Sep 12, 2012 at 05:12:43PM -0400, Daniel Richard G. wrote:
> On Wed, 12 Sep 2012, Daniel Veillard wrote:
> 
> > I could try to put Ubuntu on a VM too and see what is going on.
> >Did you manage to isolate what specific test is failing, doing the
> >same through xmllint command line test might be easier to debug,
> 
> I did some more digging on this, this time using GCC's
> -finstrument-functions in conjunction with Michal Ludvig's
> handy-dandy CygProfiler suite
> (http://www.logix.cz/michal/devel/CygProfiler/), and have obtained
> some interesting results. But first, a question...
> 
> The value of INPUT_CHUNK in include/libxml/parserInternals.h: Is it
> valid/legal to crank this value up? Like, say, from 250 to 250000?

  No :-) You would require the parsers to always have 250KB of read-ahead
data in the buffer (ahead of the current parsing point). This is not the
I/O block read value (which is MINLEN, 4000, in xmlIO.c). It would also
lead the parser to not shrink the read buffer on a regular basis.
  Too much read-ahead does not help, just the opposite I'm afraid.
And there are some
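
  To put rough numbers on the difference between the read-ahead
requirement and the I/O block size, here is a toy model. It is not
libxml2 code: the function names are hypothetical, and the "shrink only
once more than twice the read-ahead has been consumed" rule is an
assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical toy model, not taken from libxml2. */

/* Number of I/O reads of io_block bytes needed before at least
 * `readahead` bytes sit in the buffer ahead of the parsing point. */
size_t reads_to_satisfy(size_t readahead, size_t io_block)
{
    return (readahead + io_block - 1) / io_block;   /* ceiling division */
}

/* Assumed shrink rule: consumed input is only discarded once more than
 * twice the read-ahead amount lies behind the parsing point. */
int may_shrink(size_t consumed, size_t readahead)
{
    return consumed > 2 * readahead;
}
```

With a 4000-byte I/O block, a read-ahead of 250 is satisfied by a single
read, while 250000 needs 63 of them; and under the assumed rule a parser
that has consumed 1000 bytes may shrink with the small value but not
with the large one.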

> That change causes the (unmodified) runtest program to do this on FC17:
> 
>       $ ./runtest
>       ## XML regression tests
>       ## XML regression tests on memory
>       ## XML entity subst regression tests
>       ## XML Namespaces regression tests
>       ## Error cases regression tests
>       Error for ./test/errors/attr1.xml failed
>       File ./test/errors/attr1.xml generated an error
>       Error for ./test/errors/attr2.xml failed
>       File ./test/errors/attr2.xml generated an error
>       Error for ./test/errors/name2.xml failed
>       File ./test/errors/name2.xml generated an error

  I would assume this changes the output of the error messages; why
and how, I don't know.
>       ## Error cases stream regression tests
>       ## Reader regression tests
>       ## Reader entities substitution regression tests
>       ## Reader on memory regression tests
>       ## Walker regression tests
>       ## SAX1 callbacks regression tests
>       Got a difference for ./test/rdf2
>       File ./test/rdf2 generated an error
>       ## SAX2 callbacks regression tests
>       Got a difference for ./test/rdf2
>       File ./test/rdf2 generated an error

  Could be due to 2 consecutive characters() callbacks not being split
at the same point because of the change in buffering.
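
  For what it's worth, a SAX consumer that does not want to depend on
where the text gets split can accumulate the pieces itself. A minimal
sketch, with a handler shaped like the SAX characters() callback
(context pointer, bytes that are not NUL-terminated, length); the
text_acc type and acc_characters() are hypothetical names, not part of
libxml2:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator; not part of libxml2. */
typedef struct {
    char  *data;   /* NUL-terminated text gathered so far */
    size_t len;    /* number of bytes in data, excluding the NUL */
    int    calls;  /* how many callbacks delivered the text */
} text_acc;

/* Same shape as a SAX characters() handler: the run of bytes in `ch`
 * is `len` bytes long and is NOT NUL-terminated. */
void acc_characters(void *ctx, const unsigned char *ch, int len)
{
    text_acc *acc = (text_acc *)ctx;
    char *grown;

    assert(len >= 0);
    grown = (char *)realloc(acc->data, acc->len + (size_t)len + 1);
    if (grown == NULL)
        return;                       /* out of memory: keep what we have */
    memcpy(grown + acc->len, ch, (size_t)len);
    acc->len += (size_t)len;
    grown[acc->len] = '\0';
    acc->data = grown;
    acc->calls++;
}
```

Feeding "hello world" in one call or as "hello " followed by "world"
leaves the same string in the accumulator; only the call count differs,
which is the kind of difference a callback-level regression test would
flag.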

>       ## XML push regression tests
[...]
> 
> (All the other "make check" tests pass.)
> 
> I was finally able to do a proper execution-trace diff with the
> CygProfiler output, which showed that the good versus bad runs
> diverged in xmlParseStartTag2(). Further GDB and printf() action
> seemed to point to line 9213:
> 
>       if (ctxt->input->base != base) goto base_changed;
> 
> So the issue, as far as I can tell, appears to be realloc()
> shenanigans (or something a lot like it).

  Hum, I can try to explain what this does there: we are parsing a
  start tag and we 

  ....<name attr1="...>

cur counts the number of characters from the beginning of the input
buffer up to the 'n'; base is a pointer to the beginning of the input
buffer. We want the whole start tag to be in the input buffer so that we
can provide a SAX callback without copying strings out, only pointers
into the buffer. So if, while parsing the name, we notice that the buffer
had to be expanded (and we have good tests to check that), we may need to
restart the parsing phase of that start tag from scratch. That is one of
the trickiest parts of the 'new' parser :-)
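
  The idea behind the base check can be sketched with a toy buffer. This
is not the libxml2 code: toy_input, toy_append() and toy_parse_tag() are
hypothetical, and the buffer always grows into a fresh allocation so that
the move happens every time, whereas a real realloc() may or may not move
the block.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parser input buffer. */
typedef struct {
    char  *base;   /* start of the buffer; moves when the buffer grows */
    size_t len;    /* bytes currently stored */
    size_t cap;    /* bytes allocated */
} toy_input;

/* Append n bytes. Growing always allocates a new block, so every
 * pointer into the old block becomes stale. Returns 0 on success. */
int toy_append(toy_input *in, const char *src, size_t n)
{
    if (in->len + n > in->cap) {
        size_t cap = (in->len + n) * 2;
        char *p = (char *)malloc(cap);
        if (p == NULL)
            return -1;
        if (in->len > 0)
            memcpy(p, in->base, in->len);
        free(in->base);
        in->base = p;
        in->cap = cap;
    }
    memcpy(in->base + in->len, src, n);
    in->len += n;
    return 0;
}

/* Locate the name at offset `cur`. `more` simulates input arriving in
 * the middle of the tag. Returns a pointer valid for the final buffer. */
const char *toy_parse_tag(toy_input *in, size_t cur,
                          const char *more, size_t more_len, int *restarts)
{
    const char *base;
    const char *name;
    int fed = 0;

    *restarts = 0;
restart:
    base = in->base;
    name = in->base + cur;            /* pointer into the current buffer */
    if (!fed) {                       /* simulated refill, only once */
        fed = 1;
        if (toy_append(in, more, more_len) != 0)
            return NULL;
    }
    if (in->base != base) {           /* buffer moved: `name` is stale */
        (*restarts)++;
        goto restart;                 /* re-derive pointers from base + cur */
    }
    return name;
}
```

The offset cur survives the move because it is relative to base, while
the pointer computed before the refill does not; that is why the tag is
re-parsed from the saved offset instead of trying to patch pointers.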

  Now somehow you hit a problem there. It would be useful to understand
what the parser does at that point: does it fail parsing (if yes, with
which error), or does it succeed parsing but with incorrect data?

  Interesting, the only scenario which could break there would be if
xmlParseQName() were shrinking the buffer, making it impossible to get
back to the start of the name, and most likely leading to a parsing
failure.
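
  To illustrate why a shrink in the middle of the name would be fatal,
here is a toy version of the operation. The helpers are hypothetical and
only model the idea of discarding already consumed bytes; they do not
reproduce what libxml2 does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: drop the first `consumed` bytes of buf by
 * sliding the remaining bytes to the front. */
void toy_shrink(char *buf, size_t *len, size_t consumed)
{
    assert(consumed <= *len);
    memmove(buf, buf + consumed, *len - consumed);
    *len -= consumed;
}

/* Translate an offset saved before the shrink. Returns 0 and stores
 * the new offset if that byte survived, -1 if it was discarded. */
int toy_rebase(size_t saved, size_t consumed, size_t *out)
{
    if (saved < consumed)
        return -1;        /* the start of the name is gone */
    *out = saved - consumed;
    return 0;
}
```

If the saved offset points at the 'n' of the name (offset 5 in the
example above) and 7 bytes are dropped, there is nothing left to restart
from; an offset past the dropped region merely has to be adjusted.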

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
