Hey,

I'm using libxml2-2.9.8.

When parsing XML with libxml2, I can use

ctxt->record_info = 1;
xmlInitNodeInfoSeq(&ctxt->node_seq);
xmlParseDocument(ctxt);

to record positions for the parsed nodes.

However, for HTML the following

ctxt->record_info = 1;
xmlInitNodeInfoSeq(&ctxt->node_seq);
htmlParseDocument(ctxt);

leads to a segmentation fault for some (not necessarily well-formed) HTML
files. A minimal example is an HTML file with the content "<label></label>",
which crashes with the following backtrace:

#0  0x0000555555695199 in xmlSAX2EndElement (ctx=0x555555975a20, name=0x55555570141e "body")
    at external/libxml2/libxml2-2.9.8/SAX2.c:1815
#1  0x000055555561412b in htmlAutoCloseOnEnd (ctxt=0x555555975a20)
    at external/libxml2/libxml2-2.9.8/HTMLparser.c:1384
#2  0x000055555561cae2 in htmlParseContentInternal (ctxt=0x555555975a20)
    at external/libxml2/libxml2-2.9.8/HTMLparser.c:4674
#3  0x000055555561d0da in htmlParseDocument (ctxt=0x555555975a20)
    at external/libxml2/libxml2-2.9.8/HTMLparser.c:4817
#4  0x000055555556f81d in ParseHTML (content="<label></label>\n", nodes=0x7fffffffd7a0, error_message=0x7fffffffd8b0)
    at parser/xml_parser.cpp:431
#5  0x00005555555711e6 in main (argc=2, argv=0x7fffffffdb08)
    at parser/xml_parser.cpp:596

Does the HTML parsing API support recording the positions of nodes? If so,
what am I doing wrong, or what can be done to prevent the segmentation
fault?

Thank you and best regards

Ben
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
