[xml] Get element tail

Bogdan Cristea Wed, 23 Oct 2013 03:13:01 -0700

Hi

I am trying to reproduce a feature of lxml (Python) that returns the text after the end of an element but before the next element begins (i.e. the next sibling of the current element), which lxml calls the tail. I am able to do this with xmlTextReader by obtaining a pointer from the current node (when the node type is ELEMENT) to its next sibling. However, this approach does not always work:

<h1>Text before <strong>bold 1 <underline>undelined text</underline>after bold 1</strong>in between <strong>bold 2</strong>text after<strong>bold 3</strong>.</h1>
<h1><strong>bold 1</strong> no text before <strong>bold 2</strong> text after <strong>bold 3</strong>.</h1>

The first <h1> element is parsed correctly, but the second one is not: the text node " no text before " is not detected as the tail of the first <strong> element. lxml, however, handles both cases correctly, and I am actually using it to validate my XML parser. I am a little puzzled by this result, since lxml is an API for libxml2. However, I am not sure whether the lxml implementation uses just the xmlTextReader parser or builds the entire DOM tree. Is there a way to get the tail of an element with xmlTextReader?
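To make the expected result concrete, here is a minimal sketch of tail detection in a streaming parser. It uses Python's standard expat bindings rather than xmlTextReader, and the helper name collect_tails is made up for illustration. The idea is that any text arriving after an end tag and before the next start or end tag is the tail of the element that just closed. Presumably the same rule could be applied to the END_ELEMENT and TEXT node types of xmlTextReader, but that mapping is an assumption and has not been tested against libxml2.

```python
import xml.parsers.expat


def collect_tails(xml_text):
    """Return (tag, tail) pairs in the order in which the elements close."""
    results = []
    pending = []  # at most one entry: [tag, text_parts] of the last closed element

    def flush():
        # The next start or end tag terminates the tail of the pending element.
        if pending:
            tag, parts = pending.pop()
            results.append((tag, "".join(parts) or None))

    def on_start(name, attrs):
        flush()

    def on_end(name):
        flush()
        pending.append([name, []])

    def on_text(data):
        # Text right after an end tag belongs to that element's tail.
        # Text right after a start tag is the parent's own text, so it is skipped.
        if pending:
            pending[0][1].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    flush()  # the root element never has a tail
    return results


sample = (
    "<h1><strong>bold 1</strong> no text before "
    "<strong>bold 2</strong> text after <strong>bold 3</strong>.</h1>"
)
for tag, tail in collect_tails(sample):
    print(tag, repr(tail))
```

For the second <h1> above, this reports " no text before " as the tail of the first <strong>, which is the value lxml gives as well. Note that the tail is only known once the following tag has been seen, so it cannot be read at the moment the element itself is reported.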


thanks
Bogdan
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
https://mail.gnome.org/mailman/listinfo/xml
