On Tue, Apr 14, 2015 at 04:50:51PM +0100, Chris Tapp wrote:
>
> > On 14 Apr 2015, at 15:24, Christian Schoenebeck <schoeneb...@crudebyte.com> wrote:
> >
> > On Tuesday 14 April 2015 09:31:25 Alex Bligh wrote:
> >> On 13 Apr 2015, at 22:43, Christian Schoenebeck <schoeneb...@crudebyte.com> wrote:
> >>> I just encountered an issue with stand-alone less-than characters if the
> >>> document is parsed by libxml2's HTML parser module. Consider you have a
> >>> text in your HTML document like:
> >>>
> >>>     a < b
> >>>
> >>> The less-than sign in this case is interpreted by the HTML parser module
> >>> as tag start, causing subsequent text (in this case "< b") to be
> >>> dropped.
> >>
> >> Isn't that correct? Shouldn't your document have
> >>
> >>     a &lt; b
> >
> > If it was a well-formed HTML document, then yes. But as said, in reality
> > there are a load of HTML documents which contain text with raw less-than
> > characters, supported by the fact that all major HTML browsers can handle
> > it. libxml's HTML parser is yet an exception here.
> >
> > Attached you find a patch, suggesting a fix for this issue.
>
> If anything like this does get put in, it should only be if it is a
> configurable option that is disabled by default - an xml parser should
> only accept a strictly-conforming document by default. Adding support
> for ‘broken’ html because other (weak) parsers allow it is not a
> good plan as it causes divergence from the standard.

  It's not the XML parser which is modified, it's the HTML 'lax' one.
The problem is that there are already way too many parser options IMHO.

Daniel

> Chris Tapp
> opensou...@keylevel.com
> www.keylevel.com
>
> ----
> You can tell you're getting older when your car insurance gets real cheap!
>
> _______________________________________________
> xml mailing list, project page http://xmlsoft.org/
> xml@gnome.org
> https://mail.gnome.org/mailman/listinfo/xml

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | virtualization library http://libvirt.org/
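
For anyone who hits the stand-alone less-than problem and cannot change the parser, one possible workaround is to escape such characters before the document reaches it. The following is only a rough sketch in Python, not the patch attached earlier in the thread (which is not reproduced here). The helper name escape_stray_lt is made up for illustration, and the rule it applies is an assumption modelled on how lenient HTML tokenizers behave: a "<" is taken as markup only when followed by a letter, "/", "!" or "?".

```python
import re

# Assumption: a "<" opens markup only when followed by a letter, "/", "!" or "?".
# Anything else (space, digit, "=", end of input) is treated as plain text.
_STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_stray_lt(html: str) -> str:
    """Replace stand-alone "<" characters with "&lt;" (hypothetical helper)."""
    return _STRAY_LT.sub("&lt;", html)


if __name__ == "__main__":
    # The example from the thread, wrapped in a paragraph element.
    print(escape_stray_lt("<p>a < b</p>"))
```

This is a blunt text-level filter: it knows nothing about comments, script elements or attribute values, so a "<" inside those would be escaped as well. It is meant to illustrate the lenient rule, not to replace a fix in the parser itself.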