On Fri, Sep 11, 2009 at 11:09 AM, Chuck <galois...@gmail.com> wrote:
> On Sep 11, 12:56 pm, Chuck <galois...@gmail.com> wrote:
>> On Sep 11, 10:30 am, Falcolas <garri...@gmail.com> wrote:
>> > On Sep 11, 8:20 am, Chuck <galois...@gmail.com> wrote:
>>
>> > > Hi all,
>>
>> > > I would like to code a simple podcast catcher in Python merely as an
>> > > exercise in internet programming. I am a CS student and new to
>> > > Python, but understand Java fairly well. I understand how to connect
>> > > to a server with urlopen, but then I don't understand how to download
>> > > the mp3, or whatever, podcast? Do I need to somehow parse the XML
>> > > document? I really don't know. Any ideas?
>>
>> > > Thanks!
>>
>> > > Chuck
>>
>> > You will first have to download the RSS XML file, then parse that file
>> > for the URL for the audio file itself. Something like eTree will help
>> > immensely in this part. You'll also have to keep track of what you've
>> > already downloaded.
>>
>> > I'd recommend taking a look at the RSS XML yourself, so you know what
>> > it is you have to parse out, and where to find it. From there, it
>> > should be fairly easy to come up with the proper query to pull it
>> > automatically out of the XML.
>>
>> > As a kindness to the provider, I would recommend a fairly lengthy
>> > sleep between GETs, particularly if you want to scrape their back
>> > catalog.
>>
>> > Unfortunately, I no longer have the script I created to do just such a
>> > thing in the past, but the process is rather straightforward, once you
>> > know where to look.
>
> I am not sure how eTree fits in. Is that eTree.org?
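Here is a minimal Python 3 sketch of the bookkeeping Falcolas describes: skip episodes you've already downloaded and sleep between GETs. It assumes the episode URLs have already been pulled out of the feed. `download_new_episodes` and its `fetch` callback are hypothetical names used only for illustration. `fetch` stands in for whatever performs the actual GET and save, for example `urllib.request.urlretrieve(url, filename)`. The `already_downloaded` set lives only in memory here, so a real catcher would have to save it to disk between runs.

```python
import time
from typing import Callable, Iterable, List, Set


def download_new_episodes(
    episode_urls: Iterable[str],
    already_downloaded: Set[str],
    fetch: Callable[[str], None],
    delay_seconds: float = 30.0,
) -> List[str]:
    """Fetch each episode URL not seen before, pausing between requests.

    `fetch` is a hypothetical callback that does the real HTTP GET and
    saves the file; it is passed in so the bookkeeping stands on its own.
    """
    fetched = []
    for url in episode_urls:
        if url in already_downloaded:
            continue  # already have this episode, don't hit the server again
        if fetched:
            time.sleep(delay_seconds)  # be polite: wait between GETs
        fetch(url)
        already_downloaded.add(url)  # remember it for next time
        fetched.append(url)
    return fetched


seen = {"http://example.com/ep1.mp3"}
new = download_new_episodes(
    ["http://example.com/ep1.mp3", "http://example.com/ep2.mp3"],
    seen,
    fetch=print,
    delay_seconds=0,
)
```

Passing `fetch=print` and `delay_seconds=0` lets you try it without touching the network. The 30-second default is an arbitrary placeholder, not a figure from this thread.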
No, he's referring to the `xml.etree.ElementTree` standard module:
http://docs.python.org/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

Although, since you're dealing with feeds, you might be able to use
Universal Feed Parser, which is specifically for RSS/Atom:
http://www.feedparser.org/

Cheers,
Chris
--
http://blog.rebertia.com
--
http://mail.python.org/mailman/listinfo/python-list
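For reference, here is a minimal Python 3 sketch of the ElementTree approach. It assumes a plain RSS 2.0 feed in which each episode is an `<item>` carrying an `<enclosure url="..." .../>` element. The function name `enclosure_urls` and the sample feed are made up for illustration. Atom feeds are laid out differently (namespaced elements, with enclosures as `<link rel="enclosure">`), and that kind of variation is what Universal Feed Parser smooths over.

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def enclosure_urls(rss_xml: Union[str, bytes]) -> List[str]:
    """Return the url attribute of every <enclosure> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)  # root is the <rss> element
    urls = []
    # Path query: <rss> -> <channel> -> each <item> -> its <enclosure>
    for enclosure in root.findall("./channel/item/enclosure"):
        url = enclosure.get("url")
        if url:  # skip malformed enclosures that lack a url attribute
            urls.append(url)
    return urls


SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="12345" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.com/ep2.mp3" length="23456" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""

print(enclosure_urls(SAMPLE_FEED))
# prints ['http://example.com/ep1.mp3', 'http://example.com/ep2.mp3']
```

In practice the feed text would come from something like `urllib.request.urlopen(feed_url).read()`, which returns bytes. `ET.fromstring` accepts either bytes or str.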