On 7 Sep 2006 14:30:25 -0700, Adam Jones <[EMAIL PROTECTED]> wrote:
>
> Francach wrote:
> > Hi,
> >
> > I'm trying to use the Beautiful Soup package to parse through the
> > "bookmarks.html" file which Firefox exports all your bookmarks into.
> > I've been struggling with the documentation trying to figure out how to
> > extract all the urls. Has anybody got a couple of longer examples using
> > Beautiful Soup I could play around with?
> >
> > Thanks,
> > Martin.
>
> If the only thing you want out of the document is the URL's why not
> search for: href="..." ? You could get a regular expression that
> matches that pretty easily. I think this should just about get you
> there, but my regular expressions have gotten very rusty.
>
> /href=\".+\"/
>
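One caveat on the quoted pattern: `.+` is greedy, so on a line that carries more than one quoted attribute it runs through to the last quote on the line, and Firefox writes the attribute name in upper case (HREF). A minimal sketch of a non-greedy, case-insensitive variant follows. The names HREF_RE and extract_urls and the sample line are made up for illustration, and it assumes double-quoted values with no embedded quotes.

```python
import re

# Non-greedy, case-insensitive variant of the quoted pattern.
# Assumption: attribute values are double-quoted and contain no embedded quotes.
HREF_RE = re.compile(r'href="(.+?)"', re.IGNORECASE)


def extract_urls(html):
    """Return every href value found in the given HTML text."""
    return HREF_RE.findall(html)


# Hypothetical bookmark line, modelled on a Firefox export.
sample = '<DT><A HREF="http://www.python.org/" ADD_DATE="1157664625">Python</A>'
print(extract_urls(sample))
```

The capture group returns just the URL rather than the whole href="..." text.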
I doubt the bookmarks file is huge, so something simple like

    f = open('bookmarks.html').readlines()
    data = [x for x in f if x.strip().startswith('<DT><A ')]

would get you started.

On my exported Firefox bookmarks, this gives me every line that contains
a URL. The URLs themselves still need to be pulled out of those lines; I
might be tempted to just use a couple of split() calls to keep it real
simple.

HTH
-- 
Tim Williams
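For what it's worth, the filter-and-split idea above might look something like the sketch below. It is only a sketch: the function name urls_from_bookmarks and the sample lines are invented for illustration, and it assumes each bookmark sits on its own line with an upper-case, double-quoted HREF attribute, as in the export described above.

```python
def urls_from_bookmarks(lines):
    """Yield the HREF value from each line that looks like a bookmark entry."""
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith('<DT><A '):
            continue  # folder headings, separators, and so on
        # First split: drop everything up to and including HREF="
        parts = stripped.split('HREF="', 1)
        if len(parts) == 2:
            # Second split: keep everything before the closing quote
            yield parts[1].split('"', 1)[0]


# Hypothetical sample standing in for the contents of bookmarks.html.
sample_lines = [
    '<DL><p>\n',
    '    <DT><H3>Folder</H3>\n',
    '    <DT><A HREF="http://www.python.org/" ADD_DATE="1157664625">Python</A>\n',
]
print(list(urls_from_bookmarks(sample_lines)))

# With a real export, pass the open file instead:
#     with open('bookmarks.html') as f:
#         urls = list(urls_from_bookmarks(f))
```

If the file ever stops being one bookmark per line, a real HTML parser is the safer route.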