Rob Wolfe wrote:
> # HTML page
> dinner_recipe = '''
> <html><head><title>Recipe</title></head><body>
> <table>
> <tr><th>amt</th><th>unit</th><th>item</th></tr>
> <tr><td>24</td><td>slices</td><td>baguette</td></tr>
> <tr><td>2+</td><td>tbsp</td><td>olive_oil</td></tr>
> <tr><td>1</td><td>cup</td><td>tomatoes</td></tr>
> <tr><td>1-2</td><td>tbsp</td><td>garlic</td></tr>
> <tr><td>1/2</td><td>cup</td><td>Parmesan</td></tr>
> <tr><td>1</td><td>jar</td><td>pesto</td></tr>
> </table>
> </body></html>'''
>
> # program
> import xml.etree.ElementTree as etree
> tree = etree.fromstring(dinner_recipe)
>
> #import ElementSoup as etree     # for invalid HTML
> #from cStringIO import StringIO  # use this
> #tree = etree.parse(StringIO(dinner_recipe))  # wrapper for BeautifulSoup
>
> pantry = set(['olive oil', 'pesto'])
>
> for ingredient in tree.getiterator('tr'):
>     amt, unit, item = ingredient.getchildren()
>     if item.tag == "td" and item.text not in pantry:
>         print "%s: %s %s" % (item.text, amt.text, unit.text)

I posted a slight variant of this, trimmed down a bit to 21 lines.

STeVe
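
For anyone reading this on a current interpreter: the quoted program is Python 2, and `getiterator()` and `getchildren()` were removed from ElementTree in Python 3.9. Below is a minimal sketch of the same approach for Python 3. It is not the 21-line variant mentioned above. The function name `missing_ingredients` is a hypothetical helper introduced here for illustration; the recipe data and pantry are copied from the quoted post.

```python
import xml.etree.ElementTree as etree

# HTML page, copied from the quoted post (it happens to be well-formed XML)
dinner_recipe = '''
<html><head><title>Recipe</title></head><body>
<table>
<tr><th>amt</th><th>unit</th><th>item</th></tr>
<tr><td>24</td><td>slices</td><td>baguette</td></tr>
<tr><td>2+</td><td>tbsp</td><td>olive_oil</td></tr>
<tr><td>1</td><td>cup</td><td>tomatoes</td></tr>
<tr><td>1-2</td><td>tbsp</td><td>garlic</td></tr>
<tr><td>1/2</td><td>cup</td><td>Parmesan</td></tr>
<tr><td>1</td><td>jar</td><td>pesto</td></tr>
</table>
</body></html>'''


def missing_ingredients(html, pantry):
    """Return 'item: amt unit' strings for rows whose item is not in pantry.

    Hypothetical helper, not part of the original post.
    """
    tree = etree.fromstring(html)
    lines = []
    for row in tree.iter('tr'):       # iter() replaces getiterator()
        amt, unit, item = list(row)   # list(row) replaces getchildren()
        # The header row uses <th> cells, so the tag check skips it.
        if item.tag == 'td' and item.text not in pantry:
            lines.append('%s: %s %s' % (item.text, amt.text, unit.text))
    return lines


pantry = {'olive oil', 'pesto'}
for line in missing_ingredients(dinner_recipe, pantry):
    print(line)
```

Note that the pantry as quoted spells 'olive oil' with a space while the table has 'olive_oil', so that row is still reported as missing; only pesto is filtered out.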