street.swee...@mailworks.org wrote:

> An opportunity to work in Python, and the necessity of working with some
> XML too large to visualize, got me thinking about an answer Alan Gauld had
> written to me a few years ago
> (https://mail.python.org/pipermail/tutor/2015-June/105810.html). I have
> applied that information in this script, but I have another question :)
>
> Let's say I have an xml file like this:
>
> -------------- order.xml ----------------
>
> <salesorder>
>     <customername>Bob</customername>
>     <customerlocation>321 Main St</customerlocation>
>     <saleslines>
>         <salesline>
>             <item>D20</item>
>             <quantity>4</quantity>
>         </salesline>
>         <salesline>
>             <item>CS211</item>
>             <quantity>1</quantity>
>         </salesline>
>         <salesline>
>             <item>BL5</item>
>             <quantity>7</quantity>
>         </salesline>
>         <salesline>
>             <item>AC400</item>
>             <quantity>1</quantity>
>         </salesline>
>     </saleslines>
> </salesorder>
>
> ---------- end order.xml ----------------
>
> Items CS211 and AC400 are not valid items, and I want to remove their
> <salesline> nodes. I came up with the following (python 3.6.7 on linux):
>
> ------------ xml_delete_test.py --------------------
>
> import os
> import xml.etree.ElementTree as ET
>
> hd = os.path.expanduser('~')
> inputxml = os.path.join(hd,'order.xml')
> outputxml = os.path.join(hd,'fixed_order.xml')
>
> valid_items = ['D20','BL5']
>
> tree = ET.parse(inputxml)
> root = tree.getroot()
> saleslines = root.find('saleslines').findall('salesline')
> for e in saleslines[:]:
>     if e.find('item').text not in valid_items:
>         saleslines.remove(e)
>
> tree.write(outputxml)
>
> ---------- end xml_delete_test.py ------------------
>
> The above code runs without error, but simply writes the original file to
> disk. The desired output would be:
>
> -------------- fixed_order.xml ----------------
>
> <salesorder>
>     <customername>Bob</customername>
>     <customerlocation>321 Main St</customerlocation>
>     <saleslines>
>         <salesline>
>             <item>D20</item>
>             <quantity>4</quantity>
>         </salesline>
>         <salesline>
>             <item>BL5</item>
>             <quantity>7</quantity>
>         </salesline>
>     </saleslines>
> </salesorder>
>
> ---------- end fixed_order.xml ----------------
>
> What I find particularly confusing about the problem is that after running
> xml_delete_test.py in the Idle editor, if I go over to the shell and type
> saleslines, I can see that it's now a list of two elements. I run the
> following:
>
> for i in saleslines:
>     print(i.find('item').text)
>
> and I see that it's D20 and BL5, my two valid items. Yet when I write
> tree out to the disk, it has the original four. Do I need to refresh tree
> somehow?
>
> Thanks!

First of all, thank you for this clear and complete problem description!

> saleslines = root.find('saleslines').findall('salesline')

Here findall() returns a new list of matches that is completely
independent of the element tree. Therefore

>         saleslines.remove(e)

will remove the element e from this independent list, and only from that
list. The tree itself still holds all four <salesline> elements, which is
why tree.write() produces the original file.

To remove an element from the tree you have to know its parent, and then

    parent_element.remove(child_element)

will actually modify the tree. In your case the parent is always
<saleslines>, so you can restrict yourself to its children:

    saleslines = root.find('saleslines')
    for e in saleslines.findall('salesline'):
        if e.find('item').text not in valid_items:
            saleslines.remove(e)

Note that saleslines is now the parent element, not a list, and that
iterating over the list returned by findall() makes it safe to remove
children from the parent inside the loop.
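
If you want to convince yourself of the difference, here is a minimal, self-contained sketch. It parses the order from an inline string instead of ~/order.xml, and the names ORDER_XML and remove_invalid_lines are just made up for this illustration; they are not part of your script.

```python
import xml.etree.ElementTree as ET

# Inline copy of order.xml so the sketch runs without touching the disk.
ORDER_XML = """<salesorder>
  <customername>Bob</customername>
  <customerlocation>321 Main St</customerlocation>
  <saleslines>
    <salesline><item>D20</item><quantity>4</quantity></salesline>
    <salesline><item>CS211</item><quantity>1</quantity></salesline>
    <salesline><item>BL5</item><quantity>7</quantity></salesline>
    <salesline><item>AC400</item><quantity>1</quantity></salesline>
  </saleslines>
</salesorder>"""

valid_items = ['D20', 'BL5']


def remove_invalid_lines(root, valid_items):
    """Remove every <salesline> whose <item> text is not in valid_items."""
    parent = root.find('saleslines')
    # findall() hands back a separate list, so removing children from
    # the parent while looping over that list is safe.
    for line in parent.findall('salesline'):
        if line.find('item').text not in valid_items:
            parent.remove(line)


root = ET.fromstring(ORDER_XML)
parent = root.find('saleslines')

# 1) Removing from the list returned by findall() only shrinks the list.
matches = parent.findall('salesline')
matches.remove(matches[1])
list_len = len(matches)           # 3: the list lost one entry
tree_len_before = len(parent)     # 4: the tree is unchanged

# 2) Removing through the parent element modifies the tree itself.
remove_invalid_lines(root, valid_items)
tree_len_after = len(parent)      # 2
remaining = [line.find('item').text for line in root.iter('salesline')]

print(list_len, tree_len_before, tree_len_after)
print(remaining)
```

After step 2, serializing the tree (with tree.write() in your script, or ET.tostring(root) here) only contains the D20 and BL5 lines.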