Rob Wolfe wrote:
> Steve Howell wrote:
>
>> I suggested earlier that maybe we post multiple
>> solutions.  That makes me a little nervous, to the
>> extent that it shows that the Python community has a
>> hard time coming to consensus on tools sometimes.
>
> We agree that BeautifulSoup is the best for parsing HTML. :)
>
>> This is not a completely unfair knock on Python,
>> although I think the reason multiple solutions tend to
>> emerge for this type of thing is precisely due to the
>> simplicity and power of the language itself.
>>
>> So I don't know.  What about trying to agree on an XML
>> parsing example instead?
>>
>> Thoughts?
>
> I vote for example with ElementTree (without xpath)
> with a mention of using ElementSoup for invalid HTML.

Sounds good to me.  Maybe something like::

    import xml.etree.ElementTree as etree

    dinner_recipe = '''
    <ingredients>
    <ing><amt><qty>24</qty><unit>slices</unit></amt><item>baguette</item></ing>
    <ing><amt><qty>2+</qty><unit>tbsp</unit></amt><item>olive_oil</item></ing>
    <ing><amt><qty>1</qty><unit>cup</unit></amt><item>tomatoes</item></ing>
    <ing><amt><qty>1-2</qty><unit>tbsp</unit></amt><item>garlic</item></ing>
    <ing><amt><qty>1/2</qty><unit>cup</unit></amt><item>Parmesan</item></ing>
    <ing><amt><qty>1</qty><unit>jar</unit></amt><item>pesto</item></ing>
    </ingredients>'''

    # names must match the <item> text exactly (note the underscore)
    pantry = set(['olive_oil', 'pesto'])

    # print every ingredient we still need to buy
    tree = etree.fromstring(dinner_recipe)
    for item_elem in tree.iter('item'):
        if item_elem.text not in pantry:
            print(item_elem.text)

Though I wouldn't know where to put the ElementSoup link in this one...

STeVe
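
If we want the pantry check to be reusable, one option is to wrap it in a function. This is only a rough sketch for Python 3: the name `missing_items` and the shortened recipe are made up for illustration, and it assumes every `<item>` holds plain text that matches the pantry names exactly.

```python
import xml.etree.ElementTree as etree


def missing_items(recipe_xml, pantry):
    """Return the text of every <item> element that is not in the pantry.

    Hypothetical helper, not part of ElementTree. Items come back in
    document order because Element.iter() walks the tree in that order.
    """
    root = etree.fromstring(recipe_xml)
    return [item.text for item in root.iter('item')
            if item.text not in pantry]


# Shortened recipe, just enough structure to exercise the helper.
recipe = '''
<ingredients>
<ing><item>baguette</item></ing>
<ing><item>olive_oil</item></ing>
<ing><item>pesto</item></ing>
</ingredients>'''

print(missing_items(recipe, {'olive_oil', 'pesto'}))
```

Returning a list instead of printing keeps the XML walking separate from the output, which might make the example easier to test, at the cost of a couple of extra lines.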