Re: Getting elements and text with lxml

John Machin Sat, 17 May 2008 02:16:37 -0700

J. Pablo Fernández wrote:

On May 17, 2:19 am, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:

On Fri, 16 May 2008 18:53:03 -0300, J. Pablo Fernández <[EMAIL PROTECTED]> wrote:

Hello,
I have an XML file that starts with:
<vortaro>
<art mrk="$Id: a.xml,v 1.10 2007/09/11 16:30:20 revo Exp $">
<kap>
  <ofc>*</ofc>-<rad>a</rad>
</kap>
out of it, I'd like to extract something like (I'm just showing one
structure, any structure as long as all data is there is fine):
[("ofc", "*"), "-", ("rad", "a")]
How can I do it? I managed to get the content of both tags and the
text up to the first tag ("\n   "), but not the - (and in other XML
files, there's more text outside the elements).

Look for the "tail" attribute.


That gives me the last part, but not the one in the middle:

In : etree.tounicode(e)
Out: u'<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>\n'

In : e.text
Out: '\n  '

In : e.tail
Out: '\n'

You need the text and tail of your initial element's children (the "-" is the tail of <ofc>, not of <kap>), which in turn needs that of their children, and so on.


See http://effbot.org/zone/element-bits-and-pieces.htm
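
For illustration, here is a rough sketch of that recursion. It uses the standard library's xml.etree.ElementTree rather than lxml, on the assumption that the text/tail attributes behave the same way in both. The function name flatten and the choice to drop whitespace-only text are my own assumptions, not anything lxml prescribes.

```python
import xml.etree.ElementTree as ET

def flatten(elem):
    """Return elem's content as a list of strings and (tag, content) tuples."""
    parts = []
    # Text that appears before the first child.
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        inner = flatten(child)
        # Collapse a text-only child to (tag, "text"), as in the desired output.
        if len(inner) == 1 and isinstance(inner[0], str):
            inner = inner[0]
        parts.append((child.tag, inner))
        # Text that follows this child, up to the next sibling or closing tag.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts

kap = ET.fromstring('<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>')
result = flatten(kap)
print(result)
```

If the surrounding whitespace matters in your files, leave out the strip() calls and keep the raw text and tail values instead.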

HTH,
John


--
http://mail.python.org/mailman/listinfo/python-list
