"Benji99" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
>
> Basically, I'm getting the HTML source from a URL and need to
> a.) find specific URLs
> b.) find specific data
> c.) with specific URLs, load new html pages and repeat.
>
<snip>
>
> Basically, I want to search through the whole string
> (htmlSource) for a specific keyword. When it's found, I want to
> know which line it's on so that I can retrieve that line, and
> then I should be able to parse/extract what I need using regular
> expressions (which I'm getting quite comfortable with). So how
> can this be accomplished?
>
If you download pyparsing (at http://pyparsing.sourceforge.net), you'll find
among the examples a script called urlextractor.py that does something very
close to this: it lists all the hrefs and their associated links on the page
at www.yahoo.com.
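
If you would rather stay in the standard library for a first pass, here is a
rough sketch of both halves of the question: finding which lines contain a
keyword, and pulling the hrefs out of the page. This is not the pyparsing
example itself; it swaps in html.parser from the standard library and assumes
Python 3. The names lines_containing, HrefCollector and extract_links are made
up for illustration.

```python
from html.parser import HTMLParser


def lines_containing(text, keyword):
    """Return (line_number, line) pairs for every line that contains keyword.

    Line numbers start at 1. The matched line can then be handed to a
    regular expression to extract the specific data.
    """
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if keyword in line]


class HrefCollector(HTMLParser):
    """Collect (href, link text) pairs from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_source):
    """Return a list of (href, link text) pairs found in html_source."""
    parser = HrefCollector()
    parser.feed(html_source)
    parser.close()
    return parser.links
```

With htmlSource already read from the URL, lines_containing(htmlSource,
"keyword") gives the line numbers to look at, and extract_links(htmlSource)
gives the URLs to load for the next round.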

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list
