I'm having some trouble inserting elements where I want them
using the lxml ElementTree (Python 2.6).  I presume I'm making
some wrong assumptions about how lxml works and I'm hoping
someone can clue me in.

I want to process an xml document as follows:

For every occurrence of a particular element, no matter where it
appears in the tree, I want to add a sibling to that element with
the same name and a different value.

Here's the smallest artificial example I've found so far that
demonstrates the problem:

    <foo>
      <whatever>
       <something/>
      </whatever>
      <bingo>Add another bingo after this</bingo>
      <bar/>
    </foo>

What I'd like to produce is this:

    <foo>
      <whatever>
       <something/>
      </whatever>
      <bingo>Add another bingo after this</bingo>
      <bingo>New bingo 1</bingo>
      <bar/>
    </foo>

Here's my program:

-------- cut here -----
from lxml import etree

xml = """<?xml version="1.0" ?>
<foo>
  <whatever>
   <something/>
  </whatever>
  <bingo>Add another bingo after this</bingo>
  <bar/>
</foo>
"""

tree = etree.fromstring(xml)

# A list of all "bingo" element objects in the unmodified original xml
# There's only one in this example
elems = tree.xpath("//bingo")

# For each one, insert a sibling after it
bingoCounter = 0
for elem in elems:
    parent = elem.getparent()
    subIter = parent.iter()
    pos = 0
    for subElem in subIter:
        # Is it one we want to create a sibling for?
        if subElem == elem:
            newElem = etree.Element("bingo")
            bingoCounter += 1
            newElem.text = "New bingo %d" % bingoCounter
            newElem.tail = "\n"
            parent.insert(pos, newElem)
            break
        pos += 1

newXml = etree.tostring(tree)
print("")
print(newXml)
-------- cut here -----

The output follows:

-------- output -----
<foo>
  <whatever>
   <something/>
  </whatever>
  <bingo>Add another bingo after this</bingo>
  <bar/>
<bingo>New bingo 1</bingo>
</foo>
-------- output -----

Setting aside the whitespace issues, the bug in the program shows
up in the positioning of the insertion.  I wanted and expected it
to appear immediately after the original "bingo" element,
and before the "bar" element, but it appeared after the "bar"
instead of before it.

Everything works if I take the "something" element out of the
original input document.  The new "bingo" appears before the
"bar".  But when I put it back in, the inserted bingo is out of
order.  Why should that be?  What am I misunderstanding?

Is there a more intelligent way to do what I'm trying to do?
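
One possible explanation, offered as a sketch rather than a
confirmed diagnosis: parent.iter() walks the parent itself and
all of its descendants in document order, so a position counted
that way includes "foo", "whatever" and "something" before it
reaches "bingo". insert(), on the other hand, takes an index
among the parent's direct children only. The example below
computes the index from the direct children instead. The helper
name add_sibling_after_each is hypothetical, and the fallback to
the standard library's ElementTree is an assumption added only so
the sketch runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which offers
    # the same calls used below (fromstring, Element, iter, insert).
    import xml.etree.ElementTree as etree


def add_sibling_after_each(root, tag="bingo"):
    """Hypothetical helper: insert a new <tag> right after each existing one."""
    counter = 0
    # Snapshot every element first so the tree is not modified
    # while it is being iterated.
    for parent in list(root.iter()):
        # list(parent) contains only the direct children of parent,
        # unlike parent.iter(), which also yields parent and all descendants.
        for child in list(parent):
            if child.tag != tag:
                continue
            counter += 1
            new_elem = etree.Element(tag)
            new_elem.text = "New bingo %d" % counter
            # Reuse the original's tail so the indentation carries over.
            new_elem.tail = child.tail
            # Index of the original among its siblings, plus one.
            pos = list(parent).index(child) + 1
            parent.insert(pos, new_elem)
    return root


xml = """<foo>
  <whatever>
   <something/>
  </whatever>
  <bingo>Add another bingo after this</bingo>
  <bar/>
</foo>
"""

root = add_sibling_after_each(etree.fromstring(xml))
print([child.tag for child in root])
# ['whatever', 'bingo', 'bingo', 'bar']
```

With this approach the nesting depth of earlier siblings such as
"whatever" no longer affects where the new element lands, because
descendants are never counted.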

Thanks.

    Alan
--
http://mail.python.org/mailman/listinfo/python-list