I also built lxml 4.2.5 with pristine libxml2 2.9.8 (using a variation of the above command), and got the same results. So I don't think it's a distro-specific problem.
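
For anyone who wants to confirm which libxml2 a given lxml build really uses, something like the following might help. It's only a minimal sketch: `format_version` and `report_libxml2_versions` are helper names I made up for illustration, while `LXML_VERSION`, `LIBXML_COMPILED_VERSION` and `LIBXML_VERSION` are attributes of `lxml.etree`.

```python
def format_version(version):
    """Turn a version tuple such as (2, 9, 8) into the string '2.9.8'."""
    return ".".join(str(part) for part in version)


def report_libxml2_versions():
    """Print the lxml version and the libxml2 versions it reports."""
    # Imported lazily so format_version() works even without lxml installed.
    from lxml import etree

    # LIBXML_COMPILED_VERSION: the libxml2 version lxml was compiled against.
    # LIBXML_VERSION: the libxml2 version actually in use at runtime.
    print("lxml:              ", format_version(etree.LXML_VERSION))
    print("libxml2 (compiled):", format_version(etree.LIBXML_COMPILED_VERSION))
    print("libxml2 (runtime): ", format_version(etree.LIBXML_VERSION))
```

Calling `report_libxml2_versions()` inside the affected environment should show 2.9.8 for both libxml2 lines if the build is what I think it is. Note that this only reports version numbers, so it cannot tell whether a distro applied extra patches.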

I tried to reproduce it with only xmllint as you suggest, but I'm not having much luck. It produces correct results with "--html --debug bad.html", "--html --debug --stream bad.html", "--html --debug --push bad.html", and "--html --debug --sax bad.html". Maybe I'm just not using the right flags - I don't know if lxml uses SAX mode, or streaming, etc. But at this point I wouldn't be too surprised if it depended on the size of some internal input buffer that's different in lxml vs xmllint.

I'd welcome any advice about what else I should try, or how I can find out what calls are being made from lxml to libxml2.

Other than that: It's not ideal, but could you please check if you can also reproduce the bug with the first set of commands I posted? Just to verify it's not just me.

Tomi

On Tue, Jan 22, 2019 at 5:11 PM Nick Wellnhofer <wellnho...@aevum.de> wrote:

> On 22/01/2019 15:43, Tomi Belan via xml wrote:
> > After a lot of debugging, I determined the problem is in libxml2 and
> > not the other libraries in my stack, and that it only seems to happen
> > on version 2.9.8. But I don't see any related changes in news.html for
> > 2.9.9, nor in the diff between them, so I am still worried: I don't
> > know if the bug is really fixed, or just dormant. I hope you can find
> > the root cause, and maybe add a regression test if you do.
>
> I also don't see any directly related changes in either 2.9.8 or 2.9.9.
>
> > This will download the manylinux binary build of lxml 4.2.5, which is
> > statically linked to libxml2 2.9.8.
>
> Are you sure that a pristine 2.9.8 build was used? Maybe there are
> additional patches added by a distro?
>
> > I couldn't shorten the file very much, because if I delete even a
> > single character, the bug stops triggering. (Could it be some buffer
> > boundary issue?)
>
> Yes, a buffer boundary issue seems likely.
>
> > I also built my own lxml 4.2.5 with libxml2 2.9.9 and it was not
> > affected. So I believe this is a bug in libxml2 2.9.8 specifically,
> > and not in a particular version of lxml.
>
> Did you also try your own build with the official libxml2 2.9.8 sources?
>
> > I hope you can solve the mystery. Please let me know if I can be of
> > any help.
>
> It would help if you could reproduce the issue with xmllint and no
> Python code involved. git-bisect might also be useful.
>
> Nick
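
One way to probe the buffer boundary theory from Python, without knowing exactly how lxml drives libxml2, might be to feed the file to lxml's feed parser in chunks of different sizes and compare the serialized trees. This is only a sketch under assumptions: `iter_chunks`, `parse_in_chunks` and `compare_chunk_sizes` are hypothetical helper names, the chunk sizes are arbitrary, and I am assuming (not certain) that lxml's feed interface ends up in libxml2's push parser.

```python
def iter_chunks(data, size):
    """Yield consecutive slices of `data`, each at most `size` bytes long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def parse_in_chunks(data, size):
    """Parse HTML bytes through lxml's feed interface and serialize the tree."""
    from lxml import etree  # lazy import: only needed for the actual parse

    parser = etree.HTMLParser()
    for chunk in iter_chunks(data, size):
        parser.feed(chunk)
    root = parser.close()  # close() returns the root element of the tree
    return etree.tostring(root)


def compare_chunk_sizes(path, sizes=(1, 512, 4096, 65536)):
    """Return the chunk sizes whose output differs from a single-chunk parse."""
    with open(path, "rb") as f:
        data = f.read()
    # Baseline: hand the whole file to the parser in one feed() call.
    baseline = parse_in_chunks(data, max(len(data), 1))
    return [size for size in sizes if parse_in_chunks(data, size) != baseline]
```

Running `compare_chunk_sizes("bad.html")` would list the chunk sizes that give a different tree than the single-chunk parse. An empty list does not prove much, because the baseline itself could be affected, but a non-empty one would suggest that the result depends on how the input is split.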
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml