On Feb 17, 6:55 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> I'm trying to parse out some XML nodes with namespaces using
> BeautifulSoup. I can't seem to get the syntax correct. It doesn't like
> the colon in the tag name, and I'm not sure how to refer to that tag.
>
> I'm trying to get the attributes of this tag:
>
> <yweather:forecast day="Sun" date="18 Feb 2007" low="39" high="55"
> text="Partly Cloudy/Wind" code="24">
>
> The only way I've been able to get it is by doing a findAll with
> regex. Is there a better way?
>
> ----------
>
> from BeautifulSoup import BeautifulStoneSoup
> import urllib2
>
> url = 'http://weather.yahooapis.com/forecastrss?p=33609'
> page = urllib2.urlopen(url)
> soup = BeautifulStoneSoup(page)
>
> print soup['yweather:forecast']
>
> ----------

If you are just trying to extract a single particular tag, pyparsing
can do this quite readily, and the results it returns make it very easy
to pick out the tag's attribute values by name.

-- Paul

from pyparsing import makeHTMLTags
import urllib2

url = 'http://weather.yahooapis.com/forecastrss?p=78732'
page = urllib2.urlopen(url)
html = page.read()
page.close()

# makeHTMLTags returns (openTag, closeTag); only the opening tag is needed
forecastTag = makeHTMLTags('yweather:forecast')[0]

for fc in forecastTag.searchString(html):
    print fc.asList()
    # attribute values are available as named results on each match
    print "Date: %(date)s, hi:%(high)s lo:%(low)s" % fc
    print

Prints:

['yweather:forecast', ['day', 'Sat'], ['date', '17 Feb 2007'], ['low', '34'], ['high', '67'], ['text', 'Clear'], ['code', '31'], True]
Date: 17 Feb 2007, hi:67 lo:34

['yweather:forecast', ['day', 'Sun'], ['date', '18 Feb 2007'], ['low', '42'], ['high', '65'], ['text', 'Sunny'], ['code', '32'], True]
Date: 18 Feb 2007, hi:65 lo:42
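
If adding a dependency is not an option, the standard library's xml.etree.ElementTree can also cope with the namespaced tag. This is a different technique from the pyparsing approach above, offered only as a rough sketch: SAMPLE is a hypothetical, trimmed-down stand-in for the feed, forecasts() is a helper name made up for illustration, and the namespace URI is an assumption that has to match the xmlns:yweather declaration in the real feed. It is written for Python 3, unlike the Python 2 code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded feed. The namespace URI is an
# assumption; it must match whatever the real feed declares for "yweather".
SAMPLE = """<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <item>
      <yweather:forecast day="Sat" date="17 Feb 2007" low="34" high="67" text="Clear" code="31" />
      <yweather:forecast day="Sun" date="18 Feb 2007" low="42" high="65" text="Sunny" code="32" />
    </item>
  </channel>
</rss>"""

# Maps the prefix used in the search path to the namespace URI.
NS = {"yweather": "http://xml.weather.yahoo.com/ns/rss/1.0"}


def forecasts(xml_text):
    """Return one attribute dict per <yweather:forecast> element."""
    root = ET.fromstring(xml_text)
    # The prefix in the path is resolved through NS, so the colon in the
    # tag name is not a problem here.
    return [dict(el.attrib) for el in root.iterfind(".//yweather:forecast", NS)]


for fc in forecasts(SAMPLE):
    print("Date: %(date)s, hi:%(high)s lo:%(low)s" % fc)
```

Each forecast comes back as a plain dict, so the same "%(date)s" style formatting used with the pyparsing results works unchanged. This only helps when the feed is well-formed XML; pyparsing's search is more forgiving of surrounding junk.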