Re: how to scrape url out of href

Mike Meyer Sun, 01 Jan 2006 17:00:47 -0800

[EMAIL PROTECTED] writes:
> i need to scrape a url out of an href.  it seems that people recommend
> that i use beautiful soup but had some problems.


What problem are you having with BeautifulSoup? It's working fine for
me here.

> does anyone have sample code for scraping the actual url out of an href
> like this one
>
> <a href="http://www.cnn.com" target="_blank">

The following fragment works fine for me:

        linktext = soup.fetchText('Next')
        if not linktext:
            return pages
        else:
            url = linktext[0].findParent('a')['href']


So you probably want something like:

        for anchor in soup.fetch('a', {'target': '_blank'}):
            print anchor['href']
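
If BeautifulSoup isn't available, roughly the same thing can be sketched
with the standard library's html.parser on Python 3. Note that this swaps
in a different technique and is not the BeautifulSoup API used above. The
names BlankTargetLinks and blank_target_urls are made up for illustration,
and the sample markup is just the anchor from the question with some link
text added:

```python
from html.parser import HTMLParser


class BlankTargetLinks(HTMLParser):
    """Collect href values from <a> tags whose target is "_blank"."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; names are lowercased.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("target") == "_blank" and attrs.get("href"):
            self.urls.append(attrs["href"])


def blank_target_urls(html):
    """Hypothetical helper: return the hrefs of all target="_blank" anchors."""
    parser = BlankTargetLinks()
    parser.feed(html)
    parser.close()
    return parser.urls


sample = '<a href="http://www.cnn.com" target="_blank">CNN</a>'
for url in blank_target_urls(sample):
    print(url)
```

Drop the target check in handle_starttag if you want every link on the
page rather than only the ones that open in a new window.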


       <mike

-- 
Mike Meyer <[EMAIL PROTECTED]>                  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.