This is in Python 2.3.5. I've had success using elementtree with other RSS feeds, but I can't get it to work with this format:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:fr="http://ASPRSS.com/fr.html"
         xmlns:pa="http://ASPRSS.com/pa.html"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.sample.com">
    <title>Example feed</title>
    <link>http://www.sample.com</link>
    <description>Sample News Agency - News Feed</description>
    <image rdf:resource="http://www.sample.com/img/new.gif" />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.sample.com/news/20000/news.htm" />
        <rdf:li rdf:resource="http://www.sample.com/news/20001/news.htm" />
      </rdf:Seq>
    </items>
  </channel>
  <image rdf:about="http://www.sample.com/img/about.gif">
    <title>Our News Feed</title>
    <url>http://www.sample.com/img/title.gif</url>
    <link>http://www.sample.com</link>
  </image>
  <item rdf:about="http://www.sample.com/news/20000/news.htm">
    <title>First story</title>
    <description>30 August, 2007 : - - First description including unicode characters</description>
    <link>http://www.sample.com/news/20000/news.htm</link>
  </item>
  <item rdf:about="http://www.sample.com/news/20001/news.htm">
    <title>Second story</title>
    <description>30 August, 2007 : - - Second description including unicode characters</description>
    <link>http://www.sample.com/news/20001/news.htm</link>
  </item>
</rdf:RDF>

What I want to extract is the text in the title and link tags for each item (e.g. <title>First story</title> and <link>http://www.sample.com/news/20000/news.htm</link>). Starting with the title, my test script is:

    import sys
    from urllib import urlopen

    # elementtree is installed in a private lib directory
    sys.path.append("/home/me/lib/python")
    import elementtree.ElementTree as ET

    news = urlopen("http://www.sample.com/rss/rss.xml")
    nTree = ET.parse(news)

    # print the text of every <title> element in the feed
    for item in nTree.getiterator("title"):
        print item.text

Whether I try this for title or link, nothing is printed. There are also Unicode characters in the <title> tags; I'm not sure whether that could cause output like this.
In case it did, I passed an encoding argument to ET.parse (which I'd seen in other posts), but it complained that encoding was an unexpected argument.

Printing all subelements does work:

    print nTree.getiterator()

    [<Element {http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF at 40436d2c>,
     <Element {http://purl.org/rss/1.0/}channel at 40436b2c>,
     <Element {http://purl.org/rss/1.0/}title at 40436dcc>,
     <Element {http://purl.org/rss/1.0/}link at 40436d6c>,
     <Element {http://purl.org/rss/1.0/}description at 40436e0c>,
     <Element {http://purl.org/rss/1.0/}image at 40436e6c>,
     <Element {http://purl.org/rss/1.0/}items at 40436f2c>,
     <Element {http://www.w3.org/1999/02/22-rdf-syntax-ns#}Seq at 40436f6c>,
     <Element {http://www.w3.org/1999/02/22-rdf-syntax-ns#}li at 40436f0c>,
     <Element {http://www.w3.org/1999/02/22-rdf-syntax-ns#}li at 40436fec>,
     <Element {http://purl.org/rss/1.0/}item at 4044624c>,
     <Element {http://purl.org/rss/1.0/}title at 4044626c>,
     <Element {http://purl.org/rss/1.0/}description at 4044614c>,
     <Element {http://purl.org/rss/1.0/}link at 4044630c>,
     <Element {http://purl.org/rss/1.0/}item at 404463ac>,
     <Element {http://purl.org/rss/1.0/}title at 404463cc>,
     <Element {http://purl.org/rss/1.0/}description at 404462ac>,
     <Element {http://purl.org/rss/1.0/}link at 4044640c>]

Any ideas are greatly appreciated.

-- 
http://mail.python.org/mailman/listinfo/python-list
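The element listing in the post shows that every unprefixed element carries the default namespace declared by xmlns="http://purl.org/rss/1.0/", so ElementTree names them in {uri}tag form (e.g. {http://purl.org/rss/1.0/}title) rather than plain "title". A minimal sketch of a namespace-qualified lookup follows, assuming Python 3 with the standard-library xml.etree.ElementTree and a trimmed copy of the sample feed inlined as a string instead of the real URL. RSS_NS, SAMPLE and item_titles_and_links are hypothetical names used only for illustration; with the elementtree package on Python 2.3 the equivalent iteration method is getiterator() rather than iter().

```python
import xml.etree.ElementTree as ET

# Assumption: the feed's default namespace, as declared in the sample above.
# ElementTree stores unprefixed elements under this "{uri}" prefix.
RSS_NS = "{http://purl.org/rss/1.0/}"

# Hypothetical trimmed copy of the sample feed, used instead of urlopen().
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.sample.com">
    <title>Example feed</title>
    <link>http://www.sample.com</link>
  </channel>
  <item rdf:about="http://www.sample.com/news/20000/news.htm">
    <title>First story</title>
    <link>http://www.sample.com/news/20000/news.htm</link>
  </item>
  <item rdf:about="http://www.sample.com/news/20001/news.htm">
    <title>Second story</title>
    <link>http://www.sample.com/news/20001/news.htm</link>
  </item>
</rdf:RDF>"""


def item_titles_and_links(root):
    """Return (title, link) text pairs for each namespaced <item> under root."""
    pairs = []
    for item in root.iter(RSS_NS + "item"):
        title = item.find(RSS_NS + "title")
        link = item.find(RSS_NS + "link")
        pairs.append((title.text, link.text))
    return pairs


root = ET.fromstring(SAMPLE)

# Searching for the bare tag name finds nothing, matching the empty output
# described in the post.
print(len(list(root.iter("title"))))  # 0

# Iterating over items only skips the channel's own <title> and <link>.
for title, link in item_titles_and_links(root):
    print(title, link)
```

Restricting the search to item elements is a design choice for this sketch: a bare iteration over every {uri}title would also return the channel title ("Example feed") and, in the full feed, the image title.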