"Fuzzyman" <[EMAIL PROTECTED]> writes: > Ajar wrote: >> I want to write a program which will automatically login to my ISPs >> website, retrieve data and do some processing. Can this be done? Can >> you point me to any example python programs which do similar things? >> >> Regards, >> Ajar > > Very easily. Have a look at my article on the ``urllib2`` module. > > http://www.voidspace.org.uk/python/articles.shtml#http > > You may need to use ClientCookie/cookielib to handle cookies and may > have to cope with BASIC authentication. There are also articles about > both of these as well. > > If you want to handle filling in forms programattically then the module > ClientForm is useful (allegedly).
The last piece of the puzzle is BeautifulSoup. That's what you use to
extract data from the web page. For instance, a lot of web pages
listing data have something like this on them:

<table>
...
<tr><th>Item:</th><td>Value</td></tr>
...
</table>

You can extract the value from such a table with BeautifulSoup by
doing something like:

    soup.fetchText('Item:')[0].findParent(['td', 'th']).nextSibling.string

This works whether the item is in a td or a th tag.

Of course, I recommend doing things a little bit more verbosely. In my
case, I'm writing code that's expected to work on a large number of
web pages with different formats, so I put in a lot of error checking,
along with informative errors.

    links = table.fetchText(name)
    if not links:
        raise BadTableMatch("%s not found in table" % name)
    td = links[0].findParent(['td', 'th'])
    if not td:
        raise BadTableMatch("td/th not a parent of %s" % name)
    next = td.nextSibling
    if not next:
        raise BadTableMatch("td for %s has no sibling" % name)
    out = get_contents(next)
    if not out:
        raise BadTableMatch("no value string found for %s" % name)
    return out

BeautifulSoup would raise exceptions if the conditions I check for are
true and I didn't check them - but the error messages wouldn't be as
informative.

Oh yeah - get_contents isn't from BeautifulSoup. I ran into cases
where the <td> tag held other tags, and wanted the flat text
extracted. Couldn't find a BeautifulSoup method to do that, so I
wrote:

    def get_contents(ele):
        """Utility function to return all the text in a tag."""

        if ele.string:
            return ele.string       # We only have one string. Done
        return ''.join(get_contents(x) for x in ele)

        <mike
--
Mike Meyer <[EMAIL PROTECTED]>          http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
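
For anyone who wants to try the label/value lookup without installing
anything, here is a rough sketch of the same idea. It does not use
BeautifulSoup: it swaps in the standard library's html.parser module.
The names CellCollector and table_value are made up for this example,
and the BadTableMatch definition is an assumption, since the post only
shows it being raised. Nested tables are not handled.

```python
from html.parser import HTMLParser


class BadTableMatch(Exception):
    """Assumed definition: raised when a label or its value is missing."""


class CellCollector(HTMLParser):
    """Hypothetical helper: gather the flat text of each cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell being parsed

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()           # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()          # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        # Tags nested inside a cell only produce more data events, so the
        # text comes out flat, which is the job get_contents does above.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()


def table_value(html, name):
    """Return the text of the cell that follows the cell labelled `name`."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        if name in row:
            index = row.index(name)
            if index + 1 >= len(row):
                raise BadTableMatch("cell for %s has no sibling" % name)
            value = row[index + 1]
            if not value:
                raise BadTableMatch("no value string found for %s" % name)
            return value
    raise BadTableMatch("%s not found in table" % name)


if __name__ == "__main__":
    page = "<table><tr><th>Item:</th><td><b>Val</b>ue</td></tr></table>"
    print(table_value(page, "Item:"))   # prints: Value
```

As in the post, each failure gets its own message, so a page with an
unexpected layout reports which step of the lookup went wrong.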