[xml] less-than character and HTML parser module

2015-04-13 Christian Schoenebeck
original read position if htmlParseHTMLName() failed. Currently it drops the entire supposed element. Relevant code section: HTMLparser.c -> htmlParseStartTag(). Best regards, Christian Schoenebeck ___ xml mailing list, project page http://xmlsoft.or

[xml] [PATCH] less-than character and HTML parser module

2015-04-14 Christian Schoenebeck
On Tuesday 14 April 2015 09:31:25 Alex Bligh wrote:
> On 13 Apr 2015, at 22:43, Christian Schoenebeck wrote:
> > I just encountered an issue with stand-alone less-than characters if the
> > document is parsed by libxml2's HTML parser module. Consider you have a
> > [...]

Re: [xml] [PATCH] less-than character and HTML parser module

2015-04-14 Christian Schoenebeck
> [...] because other (weak) parsers allow it is not a good plan as
> it causes divergence from the standard.
There you go; you will find the updated patch attached. It now requires the HTML_PARSE_RECOVER option to be set for recovering from stand-alone less-than characters.
Best regards, Christian Schoenebeck
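
For readers skimming the thread, the behaviour under discussion can be sketched roughly as follows. This is a toy illustration in Python, not the attached patch and not libxml2 code: the function name `tokenize`, the `recover` flag, and the simplified tag-name rule are all assumptions made for the example, and only start tags are handled. When a '<' is not followed by a valid tag name, the tokenizer either rewinds to the original read position and keeps the '<' as text (recover mode) or drops everything up to the next '>' (assumed here as a stand-in for "drops the entire supposed element").

```python
import re

# Hypothetical, simplified tag-name rule for this toy example only.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def tokenize(src, recover=False):
    """Split src into ("text", s) and ("tag", name) tokens (start tags only)."""
    tokens = []
    text = []  # pending character data
    pos = 0

    def flush():
        # Emit accumulated character data as a single text token.
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    while pos < len(src):
        ch = src[pos]
        if ch != "<":
            text.append(ch)
            pos += 1
            continue

        start = pos  # remember the original read position
        m = _NAME.match(src, pos + 1)
        end = src.find(">", m.end()) if m else -1

        if m and end != -1:
            # Looks like a start tag: emit it and continue after '>'.
            flush()
            tokens.append(("tag", m.group(0)))
            pos = end + 1
        elif recover:
            # Name parsing failed: rewind and treat '<' as ordinary text.
            text.append("<")
            pos = start + 1
        else:
            # Name parsing failed: drop up to and including the next '>'
            # (or the rest of the input if there is none).
            end = src.find(">", start)
            pos = len(src) if end == -1 else end + 1

    flush()
    return tokens
```

With `recover=True`, an input such as `"a < b <p>c"` keeps the stand-alone '<' in the text and still sees the `p` tag; without it, the text between the stray '<' and the next '>' is lost, including the real tag.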

Re: [xml] [PATCH] less-than character and HTML parser module

2015-04-16 Christian Schoenebeck
[...] htmlParseElement() providing a return value, however there is already one htmlParseElementInternal(), so adding yet another one would become nasty IMO.
Best regards, Christian Schoenebeck
___
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml

Re: [xml] [PATCH] less-than character and HTML parser module

2015-04-25 Christian Schoenebeck
On Thursday 16 April 2015 13:59:28 Christian Schoenebeck wrote:
> On Thursday 16 April 2015 10:32:32 you wrote:
> > > There you go; you find the updated patch attached. It now requires
> > > HTML_PARSE_RECOVER option to be set for recovering from stand-alone
> [...]

Re: [xml] [PATCH] less-than character and HTML parser module

2015-04-27 Christian Schoenebeck
On Sunday 26 April 2015 03:24:35 Christian Schoenebeck wrote:
> The 2nd patch (libxml-invalid-tag-as-text.patch) uses that more general way
> to resolve this overall issue. That is, instead of looking at the content
> and trying to guess ahead whether a less than character will yield in [...]