Hi

I am trying to reproduce a feature of lxml (the Python library) that returns the text after the end of an element but before the next element begins, i.e. before the next sibling of the current element. lxml calls this the element's tail. I can do this with xmlTextReader by following a pointer from the current node (when the node type is ELEMENT) to its next sibling. However, this approach does not always work:

<h1>Text before <strong>bold 1 <underline>underlined text</underline> after bold 1</strong>in between <strong>bold 2</strong>text after <strong>bold 3</strong>.</h1>
<h1><strong>bold 1</strong> no text before <strong>bold 2</strong> text after <strong>bold 3</strong>.</h1>
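For reference, here is a minimal sketch of the tail values I expect for the <strong> children in this sample. It uses the standard library's xml.etree.ElementTree rather than lxml itself, on the assumption that both follow the same text/tail model. The <root> wrapper and the helper name strong_tails are only there for illustration.

```python
import xml.etree.ElementTree as ET

# The two <h1> elements from the sample, wrapped in a hypothetical <root>
# element so that the input has a single top-level element.
SAMPLE = (
    "<root>"
    "<h1>Text before <strong>bold 1 <underline>underlined text</underline>"
    " after bold 1</strong>in between <strong>bold 2</strong>text after "
    "<strong>bold 3</strong>.</h1> "
    "<h1><strong>bold 1</strong> no text before <strong>bold 2</strong>"
    " text after <strong>bold 3</strong>.</h1>"
    "</root>"
)


def strong_tails(xml_text):
    """Return, for each <h1>, the tails of its direct <strong> children."""
    root = ET.fromstring(xml_text)
    return [
        [strong.tail for strong in h1.findall("strong")]
        for h1 in root.findall("h1")
    ]


tails = strong_tails(SAMPLE)
for number, h1_tails in enumerate(tails, start=1):
    print("h1 #%d:" % number, h1_tails)
```

The second list is the case that matters here: the first <strong> of the second <h1> should report " no text before " as its tail.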

The first <h1> element is parsed correctly, but the second one is not: the text node " no text before " is not detected as the tail of the first <strong> element. lxml, however, handles it correctly; in fact, this is how I am validating my XML parser. I am a little puzzled by this result, since lxml is a binding for libxml2, but I am not sure whether the lxml implementation uses just the xmlTextReader parser or builds the entire DOM tree. Is there a way to get the tail of an element with xmlTextReader?
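To make the question concrete, here is a rough sketch of the bookkeeping I have in mind for a streaming parser: instead of following next-sibling pointers, treat any character data that arrives after an end tag (and before the next start tag) as the tail of the element that was just closed. The sketch uses Python's standard xml.dom.pulldom as a stand-in event source; it is not xmlTextReader, and whether the same logic maps cleanly onto the reader's end-element and text node types is an assumption I have not verified. The names tails_from_events and SECOND_H1 are hypothetical.

```python
from xml.dom import pulldom

# The second <h1> from the sample, i.e. the case that fails for me.
SECOND_H1 = (
    "<h1><strong>bold 1</strong> no text before <strong>bold 2</strong>"
    " text after <strong>bold 3</strong>.</h1>"
)


def tails_from_events(xml_text):
    """Return (tag, tail) pairs in the order the elements were closed."""
    closed = []     # one [tag, tail] record per closed element
    current = None  # record of the element closed most recently, or None
                    # once a new start tag has been seen
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            # Text after a start tag is the element's own text, not a tail.
            current = None
        elif event == pulldom.END_ELEMENT:
            current = [node.tagName, ""]
            closed.append(current)
        elif event == pulldom.CHARACTERS and current is not None:
            # Character data may arrive in several chunks, so accumulate.
            current[1] += node.data
    return [(tag, tail) for tag, tail in closed]


for tag, tail in tails_from_events(SECOND_H1):
    print(tag, repr(tail))
```

Note that this sketch reports an empty string for an element with no tail, whereas lxml reports None.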

thanks
Bogdan
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
