On 3/3/08, Mike D <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> I'm using XML Reader (xml.sax.xmlreader.XMLReader) to create an rss
> reader.
>
> I can parse the file but am unsure how to extract the elements I require.
> For example: For each <item> element I want the title and description.
>
> I have some stub code; I want to create a list of objects which include a
> title and description.
>
> I have the following code (a bit hacked up):
>
> import sys
> from xml.sax import make_parser
> from xml.sax import handler
>
> class rssObject(object):
>     objectList=[]
>     def addObject(self,object):
>         rssObject.objectList.append(object)
>
> class rssObjectDetail(object):
>     title = ""
>     content = ""
>
>
> class SimpleHandler(handler.ContentHandler):
>     def startElement(self,name,attrs):
>         print name
>
>     def endElement(self,name):
>         print name
>
>     def characters(self,data):
>         print data
>
>
> class SimpleDTDHandler(handler.DTDHandler):
>     def notationDecl(self,name,publicid,systemid):
>         print "Notation: " , name, publicid, systemid
>
>     def unparsedEntityDecl(self,name,publicid,systemid,ndata):
>         print "UnparsedEntity: " , name, publicid, systemid, ndata
>
> p= make_parser()
> c = SimpleHandler()
> p.setContentHandler(c)
> p.setDTDHandler(SimpleDTDHandler())
> p.parse('topstories.xml')
>
> And am using this xml file:
>
> <?xml version="1.0"?>
> <rss version="2.0">
>   <channel>
>     <title>Stuff.co.nz - Top Stories</title>
>     <link>http://www.stuff.co.nz</link>
>     <description>Top Stories from Stuff.co.nz. New Zealand, world, sport,
> business &amp; entertainment news on Stuff.co.nz.
>     </description>
>     <language>en-nz</language>
>     <copyright>Fairfax New Zealand Ltd.</copyright>
>     <ttl>30</ttl>
>     <image>
>       <url>/static/images/logo.gif</url>
>       <title>Stuff News</title>
>       <link>http://www.stuff.co.nz</link>
>     </image>
>
>     <item id="4423924" count="1">
>       <title>Prince Harry 'wants to live in Africa'</title>
>       <link>http://www.stuff.co.nz/4423924a10.html?source=RSStopstories_20080303
>       </link>
>       <description>For Prince Harry it must be the ultimate dark irony: to be in
> such a privileged position and have so much opportunity, and yet be unable
> to fulfil a dream of fighting for the motherland.</description>
>       <author>EDMUND TADROS</author>
>       <guid isPermaLink="false">stuff.co.nz/4423924</guid>
>       <pubDate>Mon, 03 Mar 2008 00:44:00 GMT</pubDate>
>     </item>
>
>   </channel>
> </rss>
>
> Is there something I'm missing? I can't figure out how to correctly
> interpret the document using the SAX parser. I'm sure I'm missing something
> obvious :)
>
> Any tips or advice would be appreciated! Also advice on correctly
> implementing what I want to achieve would be appreciated as using
> objectList=[] in the ContentHandler seems like a hack.
>
> Thanks!
>

My mistake: the provided example is a SAX one, and the same document can be handled with DOM manipulation instead. I'll be able to do it now :) Oh, and I know I posted a hacked-up implementation; I understand my classes look awful!

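
For anyone who finds this thread with the same question, here is a rough sketch of how the SAX version could collect the title and description of each <item>. It is only one way to do it, and it assumes Python 3 (the code quoted above uses Python 2 print statements). The names RssItem, ItemHandler and parse_items are made up for this example and are not part of xml.sax. The idea is to remember in startElement which element is open, buffer the text in characters, and store it in endElement. The results live on the handler instance, which avoids the class-level objectList.

```python
import io
import xml.sax
from xml.sax import handler


class RssItem(object):
    """Hypothetical container for one <item> (not part of xml.sax)."""

    def __init__(self):
        self.title = ""
        self.description = ""


class ItemHandler(handler.ContentHandler):
    """Collects the title and description of every <item> element."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.items = []      # per-instance result list, not a class attribute
        self._item = None    # RssItem being built, or None outside <item>
        self._field = None   # "title" or "description" while inside one
        self._chunks = []    # text pieces of the current field

    def startElement(self, name, attrs):
        if name == "item":
            self._item = RssItem()
        elif self._item is not None and name in ("title", "description"):
            self._field = name
            self._chunks = []

    def characters(self, data):
        # The parser may deliver one text node in several calls, so buffer it.
        if self._field is not None:
            self._chunks.append(data)

    def endElement(self, name):
        if name == "item":
            self.items.append(self._item)
            self._item = None
        elif name == self._field:
            setattr(self._item, name, "".join(self._chunks).strip())
            self._field = None


def parse_items(source):
    """Parse a file name or file-like object and return a list of RssItem."""
    item_handler = ItemHandler()
    xml.sax.parse(source, item_handler)
    return item_handler.items


if __name__ == "__main__":
    # Small inline sample so the sketch runs without topstories.xml.
    SAMPLE = (b'<?xml version="1.0"?><rss version="2.0"><channel>'
              b'<title>Channel title</title>'
              b'<item><title>First story</title>'
              b'<description>Some text</description></item>'
              b'</channel></rss>')
    for item in parse_items(io.BytesIO(SAMPLE)):
        print(item.title, "|", item.description)
```

Calling parse_items('topstories.xml') on the feed above should work the same way. The channel-level <title> and the one inside <image> are skipped because the handler only records fields while an <item> is open.
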
-- http://mail.python.org/mailman/listinfo/python-list