Re: need start point for getting html info from web

Mike Meyer Sun, 30 Oct 2005 18:45:40 -0800

[EMAIL PROTECTED] writes:
> i have a small app that i am going to need to get information from a
> few tables on different websites. i have looked at urllib and httplib.
> the sites i need to get data from mostly have this data in tables. So
> that, i think would make it easier. Anyone suggest a good starting
> point for me to find out how to do this, or know of a link to a good
> how-to?


Don't have a link to a howto. But you're halfway there. urllib (and
urllib2) will get HTML text from the websites. Pulling data from it
sort of depends on the nature of the HTML. If it's well-structured
XHTML, you can use your favorite XML library. If it's well-structured
HTML, you can try htmllib, but it's pretty primitive. If it's not
well-structured, you can use BeautifulSoup. I've used it to pull data
from tables. The problem with any of this is that your code really
depends on the structure - or lack thereof - of the HTML you're
scraping. If they change it, your code breaks.
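
For the well-structured HTML case, here is a rough sketch using only the
standard library. It assumes Python 3, where htmllib no longer exists, so
html.parser stands in for it and urllib.request stands in for urllib2.
TableExtractor, extract_tables and fetch_html are names made up for this
example, and nested tables are flattened rather than handled properly.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableExtractor(HTMLParser):
    """Collect cell text from <table> markup as tables -> rows -> cells."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        # Finish the open cell, if any, and add its text to the current row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the open row, if any, and add it to the current table.
        self._close_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()      # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()     # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def extract_tables(html):
    """Return every table in an HTML string as nested lists of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def fetch_html(url):
    """Hypothetical helper: download a page and decode it to text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Something like extract_tables(fetch_html(url)) then gives a list of tables,
each a list of rows of cell strings. Which table and which columns you want
is still tied to the page layout, and for messy markup BeautifulSoup is
likely the better tool.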

          <mike
-- 
Mike Meyer <[EMAIL PROTECTED]>                  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
-- 
http://mail.python.org/mailman/listinfo/python-list
