I have looked into the libxml2 code and found the function htmlParseScript() in HTMLparser.c.
https://gitlab.gnome.org/GNOME/libxml2/blob/master/HTMLparser.c

It describes the problem with the "<" character within scripts, but it also offers a recover mode that ignores the tags. I have used

xmllint --html --htmlout --recover mypage.html

and the output keeps the last </td> tag. The PHP equivalent does not work (there is a "recover" flag on the DOMDocument class, but the output is always the same). So I will look into the DOMDocument code (if it is available).

~André

On 18.08.2018 00:33, Eric S Eberhard wrote:
> I could be way off base -- don't you have to encode the portions in the js?
> Otherwise I can see it being confused. The js looks like data and it can't
> have < or > in it.
>
> https://stackoverflow.com/questions/1398571/html-inside-xml-should-i-use-cdata-or-encode-the-html
>
> Eric
>
>
> Eric S Eberhard
> VICS (Vertical Integrated Computer Systems)
> Voice: 928 567 3529
> Cell : 928 301 7537 (not reliable except for text or if not home)
> 2933 W Middle Verde Rd
> Camp Verde, AZ 86322
>
>
> -----Original Message-----
> From: xml [mailto:xml-boun...@gnome.org] On Behalf Of André Rothe
> Sent: Friday, August 17, 2018 5:43 AM
> To: xml@gnome.org
> Subject: [xml] Error on parsing HTML with libxml
>
> Hi,
>
> I run into an HTML parser problem during PHP development. There is a class
> DOMDocument, which uses libxml2 to parse HTML and XML documents. I found out,
> that there is a problem with HTML documents, which have inline Javascript
> code, which uses HTML tags within Javascript String variables.
>
> There is a little code example, which shows the problem:
>
> https://3v4l.org/O0iEf
>
> As you can see there, the last tag <td> is lost within the output.
> Exactly the same error I will get with xmllint:
>
> xmllint --html --htmlout /tmp/page.html
>
> where page.html contains the HTML part of the example code above.
> The output is
>
> page.html:11: HTML parser error : Unexpected end tag : td
> printwin.document.writeln('</td>');
>
> and within the output, the String will be empty:
>
> printwin.document.writeln('');
>
> So I think, that the PHP error comes from the error within libxml2. I use
> libxml2 version 2.9.1.
>
> Is it possible to fix that or is it already fixed within a newer version?
>
> Best regards
> André
>
> _______________________________________________
> xml mailing list, project page http://xmlsoft.org/
> xml@gnome.org
> https://mail.gnome.org/mailman/listinfo/xml
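
For anyone who cannot rely on the recover mode: one possible workaround, in the spirit of Eric's suggestion to encode the script content, is to keep a raw "</" out of the inline script before the page reaches the parser. The sketch below is only an illustration and not something taken from libxml2 or PHP. The function name escape_end_tags_in_scripts is hypothetical, and it assumes that every "</" inside a script block sits in a JavaScript string literal, where '<\/td>' evaluates to the same string as '</td>'.

```python
import re

# Matches one <script ...> ... </script> block (hypothetical helper pattern).
SCRIPT_RE = re.compile(
    r"(<script\b[^>]*>)(.*?)(</script\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def escape_end_tags_in_scripts(html: str) -> str:
    """Rewrite '</x' as '<\\/x' inside script blocks only.

    Assumption: each '</' in the script body is part of a JavaScript
    string literal, so the extra backslash does not change the script.
    """

    def fix(match: re.Match) -> str:
        # Escape only '</' followed by a letter, i.e. what looks like an end tag.
        body = re.sub(r"</(?=[A-Za-z])", lambda _: "<\\/", match.group(2))
        return match.group(1) + body + match.group(3)

    return SCRIPT_RE.sub(fix, html)


if __name__ == "__main__":
    page = "<td><script>printwin.document.writeln('</td>');</script></td>"
    print(escape_end_tags_in_scripts(page))
```

The rewritten page could then be passed to xmllint or DOMDocument as before; markup outside the script blocks is left untouched.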