Hello, authors of libxml2.
I'm using libxml2 to parse HTML and it sometimes produces the wrong result.
In some weird circumstances, when the parser sees "</script>" it won't
close the script tag, but instead it will literally add "</script>" to the
text node and continue parsing the rest of the input verbatim as if
On 22/01/2019 15:43, Tomi Belan via xml wrote:
After a lot of debugging, I determined the problem is in libxml2 and not the
other libraries in my stack, and that it only seems to happen on version
2.9.8. But I don't see any related changes in news.html for 2.9.9, nor in the
diff between them, s
I also built lxml 4.2.5 with pristine libxml2 2.9.8 (using a variation of
the above command), and got the same results. So I don't think it's a
distro-specific problem.
On 22/01/2019 19:11, Tomi Belan wrote:
I tried to reproduce it with only xmllint as you suggest, but I'm not having
much luck. It produces correct results with "--html --debug bad.html", "--html
--debug --stream bad.html", "--html --debug --push bad.html", and "--html
--debug --sax bad.html".
Thanks, that's very useful!
With a dynamically linked build of lxml, I used "ltrace" to see the calls
to libxml2. Looks like you're correct: there is only one call to
htmlParseChunk with the whole content (followed by a zero-length call to
terminate the input). But even so I still wasn't able to re