On Wed, Jul 08, 2015 at 12:55:31PM +0300, Yuriy Ustushenko wrote:
> On 07/08/2015 07:10 AM, Daniel Veillard wrote:
> >that looks like a very good start, would have been better if the parser
> >context didn't need tweaking as well as xmlDtd. Also I'm not sure about
> >the way to detect HTML5:
> >
> >+    if (name != NULL && !xmlStrcasecmp(name, BAD_CAST "HTML")) {
> >+        if (ExternalID == NULL && ((SystemID == NULL) ||
> >+            !xmlStrcasecmp(SystemID, BAD_CAST "about:legacy-compat"))) {
> >+            cur->html_schema = &html5Schema;
> >
> >  seems a bit too inclusive, Looks like we would default to html5 each time
> >there is an URI for the systemID, which a lot of HTML4 do.
>
> I agree with you, but I have no good idea how to do it.

  I guess we need to infer based on the DOCTYPE, but it's rather ugly:

    http://www.w3.org/TR/html5/syntax.html#the-doctype

the problem is that

    <html>
      <body>
      ....
      </body>
    </html>

can be either, and we can only tell which one when we hit a problem.
I would be tempted to parse using html4 by default. As long as we don't
know (i.e. unless we have seen an HTML4 DOCTYPE SYSTEM or PUBLIC), if we
find a problem using the html4 schemas and it is avoided by the html5
schemas, then switch. I.e. a late detection, assuming we receive html4;
since it's mostly a subset of html5, that sounds like the safer option.
That will require tweaking for sure!

>
> Thanks for review.

  and thanks again for the patch :-)

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
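
To make the late-detection idea concrete, here is a rough sketch in plain C. It is only an illustration under assumptions, not libxml2 code: classify_doctype(), on_schema_error() and the two enums are hypothetical names, and the checks for an "HTML4" PUBLIC or SYSTEM identifier are simple string heuristics that would need tuning against real documents.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp, strncasecmp (POSIX) */

/* Hypothetical types, not part of libxml2. */
typedef enum { DOCTYPE_UNKNOWN, DOCTYPE_HTML4, DOCTYPE_HTML5 } doctype_kind;
typedef enum { SCHEMA_HTML4, SCHEMA_HTML5 } schema_kind;

/*
 * Classify a DOCTYPE from its name, PUBLIC id and SYSTEM id.
 * Any argument may be NULL (no DOCTYPE at all means name == NULL).
 */
doctype_kind
classify_doctype(const char *name, const char *public_id,
                 const char *system_id)
{
    /* Assumed prefix shared by the HTML 4.x public identifiers. */
    static const char html4_public[] = "-//W3C//DTD HTML 4";

    if (name == NULL || strcasecmp(name, "html") != 0)
        return DOCTYPE_UNKNOWN;

    if (public_id != NULL) {
        if (strncasecmp(public_id, html4_public,
                        sizeof(html4_public) - 1) == 0)
            return DOCTYPE_HTML4;
        return DOCTYPE_UNKNOWN;
    }

    /* <!DOCTYPE html> or the about:legacy-compat form. */
    if (system_id == NULL || strcmp(system_id, "about:legacy-compat") == 0)
        return DOCTYPE_HTML5;

    /* Crude heuristic (an assumption): a SYSTEM URI mentioning html4. */
    if (strstr(system_id, "html4") != NULL)
        return DOCTYPE_HTML4;

    return DOCTYPE_UNKNOWN;
}

/* Start with html5 only when the DOCTYPE says so, html4 otherwise. */
schema_kind
initial_schema(doctype_kind kind)
{
    return (kind == DOCTYPE_HTML5) ? SCHEMA_HTML5 : SCHEMA_HTML4;
}

/*
 * Called when the current schema rejects a construct. Switch to html5
 * only if we were guessing (unknown DOCTYPE) and html5 would accept it.
 */
schema_kind
on_schema_error(schema_kind current, doctype_kind kind, int html5_accepts)
{
    if (current == SCHEMA_HTML4 && kind == DOCTYPE_UNKNOWN && html5_accepts)
        return SCHEMA_HTML5;
    return current;
}
```

With this shape a document carrying an explicit HTML4 DOCTYPE never switches, while a bare <html><body> document starts as html4 and moves to html5 only on the first construct that html4 rejects and html5 allows.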