I haven't used Twill for screen scraping or parsing, but I do use it extensively for functional testing (and unit testing, actually) of web2py apps.
But for scraping, I could see how you could use Twill's Python API to go to a page, log in, and call show() to get back the HTML for that page. You'd then pass that HTML to an XML parser, assuming it's well-formed, or run it through some sort of tidy step first.

So, assuming you have twill installed, from a web2py controller you would do something like:

    from twill.commands import go, formvalue, submit, show

    # Go to a URL
    go('http://en.wikipedia.org/wiki/Web2py')

    # Use the formvalue() and submit() functions here to log in

    # Capture the contents of the HTML page in a variable
    xhtml = show()

    # Send the variable to a DOM parser, or use regexps, or whatever you like

On Nov 11, 5:06 pm, David <digitalcry...@gmail.com> wrote:
> Hey guys,
>
> I've been studying up on working with scraping/parsing and remote
> logins for sites that don't have APIs and I came across Twill.
>
> Have any of you used it to automate things like login and screen/html
> parsing?
>
> It would be nice to be able to login to a remote site via a model/controller
> and pull a small clip of html and stick it on a view
> somewhere.
>
> I've got it working nicely on the shell and it seems quite promising
> but it doesn't readily appear to me how I would use something like
> this from inside web2py.
>
> Are there any examples that I can have a look at while I am still
> learning about web2py?
>
> Thanks in advance!
>
> - David
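The twill snippet leaves the final parsing step open ("a DOM parser, or use regexps, or whatever you like"). Here is a minimal sketch of that step, assuming you just want the text of one element on the page. It swaps in Python's standard-library html.parser rather than a DOM parser. It is not part of twill or web2py, and `TextById` and `text_by_id` are hypothetical helper names. Like the XML-parser route, it assumes reasonably well-formed markup (every non-void tag closed), so tidy the page first if it isn't.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change the nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Hypothetical helper: collect the text inside the first element whose id matches."""

    def __init__(self, target_id):
        # convert_charrefs=True delivers entities such as &amp; already decoded.
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.found = False     # True once the target element has been seen
        self.finished = False  # True once the target element has been closed
        self.depth = 0         # nesting depth while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.finished or tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        # Assumes well-formed markup: every non-void start tag has a matching end tag.
        if self.finished or not self.depth or tag in VOID_ELEMENTS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.finished = True

    def handle_data(self, data):
        if self.depth and not self.finished:
            self.chunks.append(data)


def text_by_id(html, element_id):
    """Return the whitespace-normalised text of the element with this id, or None."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join("".join(parser.chunks).split())
```

In the controller you would then call something like text_by_id(xhtml, 'some-id'), where 'some-id' is a placeholder for whatever id the remote page actually uses, and return the result in the dict passed to the view. Returning plain text rather than raw markup also keeps the clip safe to display, since web2py escapes strings written into a view.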