Bruce Miller wrote on 28.05.2015 at 18:37:
> On 05/28/2015 12:29 PM, Noam Postavsky wrote:
>> On Thu, May 28, 2015 at 12:13 PM, Frank Gross wrote:
>>> Are there any plans to support parsing of HTML V5 in libxml ? I tried
>>> function htmlCtxtReadMemory(), but it raises an error for HTML document
>>> containing tags introduced in HTML V5 such as : Tag header invalid.
>
> I'd love to see this happen! I'm so used to the libxml2 tools,
> and the tools built upon them, it would SO simplify my life.
>
>> I think the same question has already been asked, and answered at
>> https://mail.gnome.org/archives/xml/2013-April/msg00006.html
>
> Sorta, yes. But HTML5 is essentially _defined_ by it's parser rather than
> by it's spec. In particular the (annoying) way that it rewrites the DOM
> to turn what you wrote into what it wants. That being the case, there's
> more to adapting libxml's HTML parser than just being more forgiving about
> the unrecognized tags --- the resulting DOM might not be quite what HTML5
> specifies!

I think most people would be happy if the new tags were recognised
correctly, e.g. the self-closing ones. Does it really matter that much
whether the resulting DOM tree strictly conforms to the HTML5 parsing
rules?

> Which is all to say that it's not quite trivial; would probably amount to
> importing the "official" parser and modifying it to create libxml's internal
> structure. Sadly, Daniel doesn't have the time. Nor, alas, do I.

There's a long list of tag metadata in the HTMLparser.c file. I'm sure a
patch that adds just a couple of the new tags would be warmly appreciated.
As long as everyone just goes "*I* don't have time ATM, not even to add one
little tag", nothing's going to change here.

Stefan