In <16828a11-6c7c-4ab6-b406-6b8819883...@googlegroups.com> darrel.rend...@gmail.com writes:
> def pageReader(url):
>     try:
>         readPage = urllib2.urlopen(url)
>     except urllib2.URLError, e:
>         # print 'We failed to reach a server.'
>         # print 'Reason: ', e.reason
>         return 404
>     except urllib2.HTTPError, e:
>         # print('The server couldn\'t fulfill the request.')
>         # print('Error code: ', e.code)
>         return 404
>     else:
>         outputPage = readPage.read()
>         return outputPage

> To recreate my error, simply call the above function with an argument
> similar to:
> http://www.cert.org/nav/cert_announcements.rss
> You'll see I'm trying to return the first child.

The above code produces no output at all. The pageReader() function is
defined but never called.

If we add a few lines at the bottom:

    if __name__ == '__main__':
        print pageReader('http://www.cert.org/nav/cert_announcements.rss')

Then we get some output:

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
    <channel>
    <title>CERT Announcements</title>
    <link>http://www.cert.org/nav/whatsnew.html</link>
    <language>en-us</language>
    <description>Announcements: What's New on the CERT web site</description>
    <item>
    <title>New Blog Entry: Common Sense Guide to Mitigating Insider Threats - Best Practice 16 (of 19)</title>
    <link>http://www.cert.org/blogs/insider_threat/2013/02/common_sense_guide_to_mitigating_insider_threats_-_best_practice_16_of_19.html</link>
    <description>This sixteenth of 19 blog posts about the fourth edition of the Common Sense Guide to Mitigating Insider Threats describes Practice 16: Develop a formalized insider threat program.</description>
    <pubDate>Wed, 06 Feb 2013 06:38:07 -0500</pubDate>
    </item>
    ...

> As I've said, BeautifulSoup fails to find both pubDate and Link, which are
> crucial to my app.

> Any advice would be greatly appreciated.

You haven't included the BeautifulSoup code which attempts to parse the
XML, so it's impossible to say exactly what the error is.

However, I have a guess: you said you're trying to return the first child.
Based on the above output, the first child is the <channel> element, not an
<item> element. Perhaps that's the issue?

-- 
John Gordon                   A is for Amy, who fell down the stairs
gor...@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"
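
To illustrate the guess about the first child, here is a minimal sketch that uses the standard library's xml.etree.ElementTree in place of BeautifulSoup. The original BeautifulSoup code was never posted, so this is not a fix for it, only a demonstration of the document structure. It is written for Python 3, unlike the Python 2 code quoted above. SAMPLE_FEED is a trimmed copy of the output shown earlier, and first_child_tag() and item_fields() are hypothetical helper names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed output shown earlier (a single <item>).
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>CERT Announcements</title>
<link>http://www.cert.org/nav/whatsnew.html</link>
<language>en-us</language>
<description>Announcements: What's New on the CERT web site</description>
<item>
<title>New Blog Entry: Common Sense Guide to Mitigating Insider Threats - Best Practice 16 (of 19)</title>
<link>http://www.cert.org/blogs/insider_threat/2013/02/common_sense_guide_to_mitigating_insider_threats_-_best_practice_16_of_19.html</link>
<pubDate>Wed, 06 Feb 2013 06:38:07 -0500</pubDate>
</item>
</channel>
</rss>
"""


def first_child_tag(xml_text):
    """Return the tag name of the first child of the document root."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return root[0].tag


def item_fields(xml_text):
    """Return a (link, pubDate) tuple for every <item> in the feed."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    results = []
    # iter() searches the whole tree, so the <item> elements are found
    # even though they sit inside <channel> rather than directly under <rss>.
    for item in root.iter("item"):
        results.append((item.findtext("link"), item.findtext("pubDate")))
    return results


if __name__ == "__main__":
    print(first_child_tag(SAMPLE_FEED))
    for link, pub_date in item_fields(SAMPLE_FEED):
        print(pub_date, link)
```

On this sample, first_child_tag() returns "channel", which matches the guess: code that takes the first child of the root lands on <channel>, so link and pubDate have to be looked up on each <item> instead.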