Steven Bethard <[EMAIL PROTECTED]> writes:

>> I vote for example with ElementTree (without xpath)
>> with a mention of using ElementSoup for invalid HTML.
>
> Sounds good to me.  Maybe something like::
>
>     import xml.etree.ElementTree as etree
>     dinner_recipe = '''
>     <ingredients>
>     <ing><amt><qty>24</qty><unit>slices</unit></amt><item>baguette</item></ing>
>     <ing><amt><qty>2+</qty><unit>tbsp</unit></amt><item>olive_oil</item></ing>
                                                          ^^^^^^^^^
Is that a typo here?

>     <ing><amt><qty>1</qty><unit>cup</unit></amt><item>tomatoes</item></ing>
>     <ing><amt><qty>1-2</qty><unit>tbsp</unit></amt><item>garlic</item></ing>
>     <ing><amt><qty>1/2</qty><unit>cup</unit></amt><item>Parmesan</item></ing>
>     <ing><amt><qty>1</qty><unit>jar</unit></amt><item>pesto</item></ing>
>     </ingredients>'''
>     pantry = set(['olive oil', 'pesto'])
>     tree = etree.fromstring(dinner_recipe)
>     for item_elem in tree.getiterator('item'):
>         if item_elem.text not in pantry:
>             print item_elem.text

That's a nice example. :)

> Though I wouldn't know where to put the ElementSoup link in this one...

I had regular HTML in mind, something like:

<code>
# HTML page
dinner_recipe = '''
<html><head><title>Recipe</title></head><body>
<table>
<tr><th>amt</th><th>unit</th><th>item</th></tr>
<tr><td>24</td><td>slices</td><td>baguette</td></tr>
<tr><td>2+</td><td>tbsp</td><td>olive_oil</td></tr>
<tr><td>1</td><td>cup</td><td>tomatoes</td></tr>
<tr><td>1-2</td><td>tbsp</td><td>garlic</td></tr>
<tr><td>1/2</td><td>cup</td><td>Parmesan</td></tr>
<tr><td>1</td><td>jar</td><td>pesto</td></tr>
</table>
</body></html>'''

# program
import xml.etree.ElementTree as etree
tree = etree.fromstring(dinner_recipe)

#import ElementSoup as etree     # for invalid HTML
#from cStringIO import StringIO  # use this
#tree = etree.parse(StringIO(dinner_recipe))  # wrapper for BeautifulSoup

pantry = set(['olive oil', 'pesto'])
for ingredient in tree.getiterator('tr'):
    amt, unit, item = ingredient.getchildren()
    if item.tag == "td" and item.text not in pantry:
        print "%s: %s %s" % (item.text, amt.text, unit.text)
</code>

But if that's too complicated I will not insist on this. :)
Your example is good enough.

-- 
Regards,
Rob
-- 
http://mail.python.org/mailman/listinfo/python-list
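
For anyone trying the HTML table example on a current interpreter, here is a minimal sketch of the same idea in Python 3. It is not the code from the thread: `Element.getiterator()` and `Element.getchildren()` were removed in Python 3.9, so the sketch uses `Element.iter()` and `list(element)` instead. The helper name `missing_ingredients` is made up for illustration, and the sketch assumes the item was meant to be spelled 'olive oil' (with a space) so that it matches the pantry entry. It only handles well-formed markup; the ElementSoup route for invalid HTML is not covered.

```python
import xml.etree.ElementTree as ET

# Assumption: 'olive oil' is written with a space so it matches the pantry set.
DINNER_RECIPE = """<html><head><title>Recipe</title></head><body>
<table>
<tr><th>amt</th><th>unit</th><th>item</th></tr>
<tr><td>24</td><td>slices</td><td>baguette</td></tr>
<tr><td>2+</td><td>tbsp</td><td>olive oil</td></tr>
<tr><td>1</td><td>cup</td><td>tomatoes</td></tr>
<tr><td>1-2</td><td>tbsp</td><td>garlic</td></tr>
<tr><td>1/2</td><td>cup</td><td>Parmesan</td></tr>
<tr><td>1</td><td>jar</td><td>pesto</td></tr>
</table>
</body></html>"""


def missing_ingredients(page, pantry):
    """Return (item, amt, unit) tuples for table rows whose item is not in pantry.

    Hypothetical helper, named here only for illustration.
    """
    root = ET.fromstring(page)
    missing = []
    for row in root.iter("tr"):      # iter() replaces the removed getiterator()
        cells = list(row)            # list(elem) replaces the removed getchildren()
        # Skip the header row (<th> cells) and any row without exactly three <td> cells.
        if len(cells) != 3 or any(cell.tag != "td" for cell in cells):
            continue
        amt, unit, item = cells
        if item.text not in pantry:
            missing.append((item.text, amt.text, unit.text))
    return missing


if __name__ == "__main__":
    for item, amt, unit in missing_ingredients(DINNER_RECIPE, {"olive oil", "pesto"}):
        print(f"{item}: {amt} {unit}")
```

Returning a list rather than printing inside the loop is just a choice made here so the result is easy to check; the loop body is otherwise the same test as in the example above.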