[EMAIL PROTECTED] wrote:
> i have some html which looks like this where i want to scrape out the
> href stuff (the www.cnn.com part)
>
> <div class="noFood">Cheese</div>
> <div class="food">Blue</div>
> <a class="btn" href = "http://www.cnn.com">
>
>
> so i wrote this code which scrapes it perfectly:
>
> for incident in row('div', {'class':'noFood'}):
>     b = incident.findNextSibling('div', {'class': 'food'})
>     print b
>     n = b.findNextSibling('a', {'class': 'btn'})
>     print n
>     link = n['href'] + "','"
>
> problem is that sometimes the 2nd tag , the <div class="food"> tag , is
> sometimes called food, sometimes called drink.

Apparently you are using Beautiful Soup. The value in the attribute
dictionary can be a callable that receives the attribute value and
returns true for a match; try this:

def isFoodOrDrink(attr):
    return attr in ['food', 'drink']

b = incident.findNextSibling('div', {'class': isFoodOrDrink})

Alternatively, you could omit the class spec from the search and check
the tag's class yourself in code.

Kent
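
The second option (no class spec, check in code) might look roughly like
the sketch below. It assumes the current bs4 package, where the method is
spelled find_next_sibling and a tag's class comes back as a list; the
names HTML and extract_links are made up for illustration and are not
from the original post. Unlike the callable version, it only looks at
the very next <div> rather than skipping ahead to the first match.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; here the second tag is "drink".
HTML = """
<div class="noFood">Cheese</div>
<div class="drink">Blue</div>
<a class="btn" href="http://www.cnn.com"></a>
"""

def extract_links(html):
    """Collect hrefs that follow a noFood div and a food-or-drink div."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for incident in soup.find_all("div", {"class": "noFood"}):
        # No class filter here: take the next <div> and inspect it by hand.
        b = incident.find_next_sibling("div")
        if b is None or not set(b.get("class", [])) & {"food", "drink"}:
            continue
        n = b.find_next_sibling("a", {"class": "btn"})
        if n is not None:
            links.append(n["href"])
    return links

print(extract_links(HTML))
```

Checking by hand is more verbose, but it leaves room for tests that are
awkward to express as a single attribute match.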