Bruce Miller wrote on 28.05.2015 at 18:37:
> On 05/28/2015 12:29 PM, Noam Postavsky wrote:
>> On Thu, May 28, 2015 at 12:13 PM, Frank Gross wrote:
>>>   Are there any plans to support parsing of HTML5 in libxml? I tried the
>>> function htmlCtxtReadMemory(), but it raises an error for HTML documents
>>> containing tags introduced in HTML5, such as: Tag header invalid.
> 
> I'd love to see this happen!  I'm so used to the libxml2 tools,
> and the tools built upon them, it would SO simplify my life.
> 
>> I think the same question has already been asked, and answered at
>> https://mail.gnome.org/archives/xml/2013-April/msg00006.html
> 
> Sorta, yes. But HTML5 is essentially _defined_ by its parser rather than
> by its spec. In particular the (annoying) way that it rewrites the DOM
> to turn what you wrote into what it wants.  That being the case, there's
> more to adapting libxml's HTML parser than just being more forgiving about
> the unrecognized tags --- the resulting DOM might not be quite what HTML5
> specifies!

I think most people would be happy if the new tags were recognised
correctly, e.g. the self-closing ones. Does it really matter that much
whether the resulting DOM tree strictly conforms to the HTML5 parsing
rules?


> Which is all to say that it's not quite trivial; would probably amount to
> importing the "official" parser and modifying it to create libxml's internal
> structure.  Sadly, Daniel doesn't have the time.   Nor, alas, do I.

There's a long list of tag metadata in the HTMLparser.c file. I'm sure a
patch that adds just a couple of the new tags would be warmly appreciated.
As long as everyone just goes "*I* don't have time ATM, not even to add one
little tag", nothing's going to change here.
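
To give a rough idea of what "adding a tag" means, here is a minimal,
self-contained sketch of a tag metadata table with a lookup function. It is
NOT the actual libxml2 code: the struct elem_desc, the table elem_table and
the functions name_eq() and lookup_elem() are hypothetical names made up for
illustration, and the real table in HTMLparser.c carries more information
per tag. The point is only that a new tag is mostly one more table row.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified element descriptor (not libxml2's own type). */
typedef struct {
    const char *name;   /* tag name, lowercase */
    int is_void;        /* 1 if the element never takes an end tag */
    int is_inline;      /* 1 for inline-level, 0 for block-level */
} elem_desc;

static const elem_desc elem_table[] = {
    /* a few long-standing tags */
    { "br",      1, 1 },
    { "div",     0, 0 },
    { "p",       0, 0 },
    /* tags introduced in HTML5: one new row each */
    { "article", 0, 0 },
    { "footer",  0, 0 },
    { "header",  0, 0 },
    { "nav",     0, 0 },
    { "section", 0, 0 },
    { "wbr",     1, 1 },
};

/* Case-insensitive comparison of two ASCII tag names. */
static int name_eq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/* Return the descriptor for a tag name, or NULL if the tag is unknown. */
static const elem_desc *lookup_elem(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(elem_table) / sizeof(elem_table[0]); i++) {
        if (name_eq(elem_table[i].name, name))
            return &elem_table[i];
    }
    return NULL;
}
```

With a table like this, lookup_elem("header") returns a descriptor instead
of NULL, so a parser built on it would have no reason to report the tag as
invalid, and the is_void flag tells it that <wbr> needs no end tag.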

Stefan

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml