Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8

2019-01-23 Thread Nick Wellnhofer
On 23/01/2019 01:47, Tomi Belan wrote:
> But even so I still wasn't able to reproduce it in pure C. Could it be because xmllint reads ctxt->myDoc, and lxml uses SAX2 event handlers (according to parsertarget.pxi)? AFAICT xmllint's --push and --sax options are incompatible.
ctxt->myDoc is also b

Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8

2019-01-23 Thread Tomi Belan via xml
On Wed, Jan 23, 2019 at 12:55 PM Nick Wellnhofer wrote:
> The commit obviously also affected documents that didn't need encoding
> conversion. I didn't realize that.
Aha! I noticed that the chromium link you sent mentions a >32KB string which gets converted to a >64KB string, which sounded susp

Re: [xml] HTML parser sometimes doesn't close script tags in libxml2 2.9.8

2019-01-23 Thread Nick Wellnhofer
On 23/01/2019 16:14, Tomi Belan wrote:
> I don't know too much about Python's C API, but [2] [3] suggests lxml is using a deprecated macro and giving libxml2 a multibyte buffer even though the input would fit into pure ASCII. This explains why it behaved differently than xmllint.
Right, if Pyth
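
The point about a multibyte buffer for input that would fit into pure ASCII can be illustrated with a short sketch. It only shows how the byte length of the same ASCII-only text grows under fixed-width multibyte encodings. It does not call lxml or libxml2, and the thread does not say which encoding lxml actually hands over, so the UTF-16 and UTF-32 choices here are assumptions. The sample string and the helper name `encoded_sizes` are hypothetical.

```python
# Illustration only: byte length of the same ASCII-only text in a
# one-byte encoding versus fixed-width multibyte encodings.
# The sample markup and the encodings compared are assumptions chosen
# for the example; they are not taken from lxml or libxml2.
text = "<script>var x = 1;</script>"


def encoded_sizes(s):
    """Return the byte length of s in several encodings."""
    return {
        "ascii": len(s.encode("ascii")),          # 1 byte per character
        "utf-16-le": len(s.encode("utf-16-le")),  # 2 bytes per ASCII character
        "utf-32-le": len(s.encode("utf-32-le")),  # 4 bytes per character
    }


sizes = encoded_sizes(text)
for name, size in sizes.items():
    print(f"{name}: {size} bytes for {len(text)} characters")
```

A parser that receives the wider buffer has to run an encoding conversion that the ASCII case would not need, which is one plausible reason the same document could take a different code path than it does under xmllint.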