I am not quite sure what you want to do. Get the URLs from the a tags of an HTML file? I think the easiest method would be a regular expression.
>>> import urllib, sre
>>> html = urllib.urlopen("http://www.google.com").read()
>>> sre.findall('href="([^>]+)"', html)
['/imghp?hl=de&tab=wi&ie=UTF-8', 'http://groups.google.de/grphp?hl=de&tab=wg&ie=UTF-8', '/dirhp?hl=de&tab=wd&ie=UTF-8', 'http://news.google.de/nwshp?hl=de&tab=wn&ie=UTF-8', 'http://froogle.google.de/frghp?hl=de&tab=wf&ie=UTF-8', '/intl/de/options/']
>>> sre.findall('href=[^>]+>([^<]+)</a>', html)
['Bilder', 'Groups', 'Verzeichnis', 'News', 'Froogle', 'Mehr »', 'Erweiterte Suche', 'Einstellungen', 'Sprachtools', 'Werbung', 'Unternehmensangebote', 'Alles \xfcber Google', 'Google.com in English']

Google has some strange HTML: some href values are not in quotation marks, so the first pattern, which requires the quotes, skips links like this one:

<a href=http://www.google.com/ncr>Google.com in English</a>
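
If the unquoted href values turn out to be a problem, one possible alternative to the regular expressions is the standard library's HTML parser, which accepts both quoted and unquoted attribute values. What follows is only a rough sketch in Python 3 (the session above is Python 2). The names LinkCollector and extract_links and the sample string are made up for illustration and are not part of any library; fetching the page is left out so the example runs without network access.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quoting is already resolved.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample with one quoted and one unquoted href.
sample = ('<a href="/intl/de/options/">Mehr</a> '
          '<a href=http://www.google.com/ncr>Google.com in English</a>')
print(extract_links(sample))
```

For a real page you would pass the downloaded HTML, decoded to a string, to extract_links instead of the sample.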