On Jan 14, 9:59 am, Shriphani <[EMAIL PROTECTED]> wrote:
> Hello,
> I have a html file over here by the name guide_ind.html and it
> contains links to other html files like guides.html#outline . How do I
> point BeautifulSoup (I want to use this module) to
> guides.html#outline ?
> Thanks
> Shriphani P.

Try Mark Pilgrim's excellent example at:
http://www.diveintopython.org/http_web_services/index.html

From the above link, you can retrieve openanything.py, which I use in
my example:

# list_url.py
# created by Hai Vu on 1/16/2008

from openanything import fetch
from sgmllib import SGMLParser

class RetrieveURLs(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attributes):
        url = [v for k, v in attributes if k.lower() == 'href']
        self.urls.extend(url)
        print '\t%s' % (url)

# ----------------------------------------------------------------------
# main
def main():
    site = 'http://www.google.com'
    result = fetch(site)
    if result['status'] == 200:
        # Extracts a list of URLs off the top page
        parser = RetrieveURLs()
        parser.feed(result['data'])
        parser.close()

        # Display the URLs we just retrieved
        print '\nURL retrieved from %s' % (site)
        print '\t' + '\n\t'.join(parser.urls)
    else:
        print 'Error (%d) retrieving %s' % (result['status'], site)

if __name__ == '__main__':
    main()
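
For a link like guides.html#outline, the part after the # is a fragment: it names a spot inside the page, so the file to open or fetch is still guides.html. Here is a minimal sketch of the same link-collecting idea that also splits off the fragment. It is not part of the script above and makes some assumptions: it uses Python 3 (where sgmllib no longer exists, so html.parser stands in for it), and LinkCollector, split_links, the sample markup and the example.com base URL are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us.
        if tag == 'a':
            self.urls.extend(v for k, v in attrs if k == 'href' and v)


def split_links(html, base):
    """Return (absolute_url, fragment) pairs for each link found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # urljoin resolves relative links against the page they came from;
    # urldefrag then separates the document URL from the #fragment.
    return [tuple(urldefrag(urljoin(base, url))) for url in parser.urls]


# Hypothetical markup and base URL, for illustration only.
sample = ('<a href="guides.html#outline">Outline</a> '
          '<a HREF="intro.html">Intro</a>')
base = 'http://example.com/docs/guide_ind.html'

for url, fragment in split_links(sample, base):
    print('%s\t(fragment: %r)' % (url, fragment))
```

The first element of each pair is what you would hand to fetch or to your HTML parser of choice; the fragment, if any, is the name or id to look for once that page has been parsed.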