Re: [xml] [PATCH] less-than character and HTML parser module

Christian Schoenebeck Thu, 16 Apr 2015 05:59:26 -0700

On Thursday 16 April 2015 10:32:32 you wrote:
> > There you go; you find the updated patch attached. It now requires
> > HTML_PARSE_RECOVER option to be set for recovering from stand-alone
> > less-than characters.
> 
> That sounds fine *except* it doesn't raise an error.
> The parser knows it's a broken construct that must be pointed out.


OK, I'll see what I can do about that. ;)

>  It sounds a bit weird to handle that error case as one of the main content
> cases; I would still be tempted to go into htmlParseStartTag, get the
> error reported, but push corrective data instead in recover mode.
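To make that suggestion concrete, here is a toy sketch (not libxml2 code; `struct toy_ctxt`, `toy_parse()` and the `recover` flag are hypothetical stand-ins for the parser context and HTML_PARSE_RECOVER). A stand-alone '<' always raises an error, and only in recover mode is the '<' pushed as character data instead of aborting.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy context; not libxml2's htmlParserCtxt. */
struct toy_ctxt {
    int recover;      /* stands in for HTML_PARSE_RECOVER */
    int errors;       /* number of errors reported so far */
    char text[256];   /* character data pushed so far */
    size_t len;
};

static void push_text(struct toy_ctxt *c, char ch) {
    if (c->len + 1 < sizeof c->text) {
        c->text[c->len++] = ch;
        c->text[c->len] = '\0';
    }
}

/* Returns 0 on success, -1 if parsing had to stop. */
static int toy_parse(struct toy_ctxt *c, const char *in) {
    c->errors = 0;
    c->len = 0;
    c->text[0] = '\0';
    for (const char *p = in; *p; p++) {
        if (*p != '<') {
            push_text(c, *p);
            continue;
        }
        unsigned char next = (unsigned char)p[1];
        if (isalpha(next) || next == '/' || next == '!' || next == '?') {
            /* Looks like real markup: skip to the closing '>' (toy only). */
            const char *end = strchr(p, '>');
            if (end == NULL)
                return -1;
            p = end;
            continue;
        }
        /* Stand-alone '<': the error is always reported ... */
        c->errors++;
        if (!c->recover)
            return -1;
        /* ... but in recover mode corrective data is pushed: '<' as text. */
        push_text(c, '<');
    }
    return 0;
}
```

With `recover` set, parsing `<p>a < b</p>` yields the text `a < b` and still counts one error; without it, parsing stops at the stray '<'.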

My initial idea was to enter htmlParseElement() as before and, if
htmlParseElement() encounters an error, handle the chunk as text instead (if
the recover option is on). That would probably come closest to what most
browsers seem to do. The problem, however, is that it would require the
prototype of the public API function

        void htmlParseElement(htmlParserCtxtPtr)

to be changed to

        int htmlParseElement(htmlParserCtxtPtr)
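A toy sketch of why the return value matters (hypothetical names, not libxml2 code): `try_parse_element()` plays the role of an int-returning htmlParseElement(), and the content loop uses its status to fall back to treating '<' as text in recover mode.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an int-returning htmlParseElement():
   accepts only a start tag "<name>" and returns the number of bytes
   consumed, or -1 if the input is not a valid start tag. */
static int try_parse_element(const char *p) {
    int i = 1;
    if (p[0] != '<' || !isalpha((unsigned char)p[i]))
        return -1;
    while (isalnum((unsigned char)p[i]))
        i++;
    return p[i] == '>' ? i + 1 : -1;
}

/* Content loop: on element failure, fall back to treating '<' as text
   when recover is set. Returns the number of elements parsed, or -1 on
   failure without recovery. Assumes cap >= 1. */
static int parse_content(const char *in, int recover, char *text, size_t cap) {
    size_t len = 0;
    int elements = 0;
    text[0] = '\0';
    while (*in) {
        if (*in == '<') {
            int n = try_parse_element(in);
            if (n > 0) {
                elements++;
                in += n;
                continue;
            }
            if (!recover)
                return -1;
        }
        if (len + 1 < cap) {
            text[len++] = *in;
            text[len] = '\0';
        }
        in++;
    }
    return elements;
}
```

Without a status from the element parser, the caller would have no way to know that it should retry the chunk as text.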

To avoid that API change, one could add another internal (static) version of
htmlParseElement() that provides a return value. However, there is already an
htmlParseElementInternal(), so adding yet another variant would get messy IMO.
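For reference, the wrapper pattern described above looks roughly like this in a toy form (all names hypothetical; `demo_ctxt` stands in for htmlParserCtxt): a file-local helper returns a status, while the public function keeps its void prototype.

```c
#include <assert.h>

/* Hypothetical stand-in for htmlParserCtxt. */
typedef struct {
    int pos;      /* index of the next element to parse */
    int fail_at;  /* toy knob: element index at which parsing fails */
} demo_ctxt;

/* Internal, file-local variant reporting success (0) or failure (-1).
   Being static, it does not appear in the library's public ABI. */
static int parse_element_status(demo_ctxt *ctxt) {
    if (ctxt->pos == ctxt->fail_at)
        return -1;
    ctxt->pos++;
    return 0;
}

/* Public entry point keeps its original void prototype and discards the
   status; internal callers would use parse_element_status() instead. */
void parse_element(demo_ctxt *ctxt) {
    (void)parse_element_status(ctxt);
}
```

The cost is one more near-duplicate entry point per parsing function, which is the clutter the paragraph above is worried about.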

Best regards,
Christian Schoenebeck
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
