Hi George,

Firefox lets you group bookmarks, along with other information, into directories and sub-directories, and it uses header tags for this purpose. I'd like to get this grouping information out as well.
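A minimal sketch of how that grouping could be pulled out, assuming the export follows the usual Netscape bookmark layout in which each folder is an <H3> heading followed by a <DL> list of its entries. It uses the standard library's html.parser rather than Beautiful Soup so that it runs without extra installs. BookmarkParser, parse_bookmarks and SAMPLE are made-up names for illustration, and the sample markup is a hand-written stand-in for a real bookmarks.html.

```python
from html.parser import HTMLParser

# Hand-written stand-in for a Firefox export (assumed layout, not a real file).
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>Python</H3>
    <DL><p>
        <DT><A HREF="http://www.python.org/">Python home</A>
        <DT><H3>Lists</H3>
        <DL><p>
            <DT><A HREF="http://mail.python.org/mailman/listinfo/python-list">python-list</A>
        </DL><p>
    </DL><p>
    <DT><A HREF="http://www.mozilla.org/">Mozilla</A>
</DL><p>
"""


class BookmarkParser(HTMLParser):
    """Collect (folder_path, title, url) triples, tracking folder nesting."""

    def __init__(self):
        super().__init__()
        self.folders = []    # one entry per open <dl>; None for an unnamed level
        self.pending = None  # folder name from the last <h3>, waiting for its <dl>
        self.capture = None  # tag ("h3" or "a") whose text is being collected
        self.text = []
        self.href = None
        self.bookmarks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("h3", "a"):
            self.capture = tag
            self.text = []
            if tag == "a":
                self.href = dict(attrs).get("href")
        elif tag == "dl":
            # A <dl> opens a new level, named by the preceding <h3> if any.
            self.folders.append(self.pending)
            self.pending = None

    def handle_data(self, data):
        if self.capture is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == self.capture:
            text = "".join(self.text).strip()
            if tag == "h3":
                self.pending = text
            else:
                path = tuple(f for f in self.folders if f is not None)
                self.bookmarks.append((path, text, self.href))
            self.capture = None
        elif tag == "dl" and self.folders:
            self.folders.pop()


def parse_bookmarks(html):
    """Return a list of (folder_path, title, url) for every link in html."""
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.bookmarks


for path, title, url in parse_bookmarks(SAMPLE):
    print("/".join(path) or "(top level)", "|", title, "|", url)
```

Each bookmark comes back with the tuple of folder names it sits under, so the hierarchy can be rebuilt in whatever form the other application wants. For a real export, reading the file and passing its contents to parse_bookmarks should work the same way, though descriptions and other extras in the file are ignored by this sketch.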
Regards,
Martin.

George Sakkis wrote:
> Francach wrote:
> > George Sakkis wrote:
> > > Francach wrote:
> > > > Hi,
> > > >
> > > > I'm trying to use the Beautiful Soup package to parse through the
> > > > "bookmarks.html" file which Firefox exports all your bookmarks into.
> > > > I've been struggling with the documentation trying to figure out how to
> > > > extract all the urls. Has anybody got a couple of longer examples using
> > > > Beautiful Soup I could play around with?
> > > >
> > > > Thanks,
> > > > Martin.
> > >
> > > from BeautifulSoup import BeautifulSoup
> > > urls = [tag['href'] for tag in
> > >         BeautifulSoup(open('bookmarks.html')).findAll('a')]
> > Hi,
> >
> > thanks for the helpful reply.
> > I wanted to do two things - learn to use Beautiful Soup and bring out
> > all the information
> > in the bookmarks file to import into another application. So I need to
> > be able to travel down the tree in the bookmarks file. bookmarks seems
> > to use header tags which can then contain a tags where the href
> > attributes are. What I don't understand is how to create objects which
> > can then be used to return the information in the next level of the
> > tree.
> >
> > Thanks again,
> > Martin.
>
> I'm not sure I understand what you want to do. Originally you asked to
> extract all urls and BeautifulSoup can do this for you in one line. Why
> do you care about intermediate objects or if the anchor tags are nested
> under header tags or not ? Read and embrace BeautifulSoup's philosophy:
> "You didn't write that awful page. You're just trying to get some data
> out of it. Right now, you don't really care what HTML is supposed to
> look like."
>
> George
--
http://mail.python.org/mailman/listinfo/python-list