I haven't used Twill for screen scraping or parsing, but I do use it
extensively for functional testing (and unit testing, actually) of
web2py apps.

But for scraping, I could see how you could use Twill's Python API to
go to a page, log in, and call show() to get back the HTML for that
page. You'd then pass it to an XML parser, assuming it's well-formed
or after running it through some sort of tidy step.

So, assuming you have twill installed, from a web2py controller, you
would do something like:

# Import only the twill commands used here (formvalue and submit are
# for the login step)
from twill.commands import go, formvalue, submit, show

# Go to a url
go('http://en.wikipedia.org/wiki/Web2py')

# Use formvalue() and submit() functions to log in

xhtml = show()   # Capture the contents of the html page in a variable

# Send the variable to a DOM parser, or use regexps, or whatever you like
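
For that last step, here is a minimal sketch of pulling a small clip out
of the captured page using only the standard library's html.parser
(Python 3). I'm assuming the string you got back from show() is what
gets passed in; the names FirstParagraphParser, first_paragraph and
sample are made up for illustration, and the sample string just stands
in for a real page so the snippet runs without twill or a network
connection.

```python
from html.parser import HTMLParser


class FirstParagraphParser(HTMLParser):
    """Collect the text of the first <p> element that contains any text."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while we are inside a candidate <p>
        self._chunks = []      # text pieces collected for the current <p>
        self.text = None       # set once a non-empty paragraph has closed

    def handle_starttag(self, tag, attrs):
        # Start collecting only if we have not found a paragraph yet
        if tag == "p" and self.text is None:
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self._inside = False
            text = "".join(self._chunks).strip()
            if text:
                self.text = text


def first_paragraph(html):
    """Return the text of the first non-empty <p> in html, or None."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.text


# Stand-in for the value returned by show(); hypothetical content
sample = (
    "<html><body><h1>Web2py</h1><p></p>"
    "<p>web2py is a <b>Python</b> web framework.</p>"
    "<p>Second.</p></body></html>"
)
clip = first_paragraph(sample)
```

In a controller you could then return something like dict(clip=clip)
and display it in the view. html.parser is fairly tolerant of sloppy
markup, so this may be enough when the page isn't well-formed XML.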


On Nov 11, 5:06 pm, David <digitalcry...@gmail.com> wrote:
> Hey guys,
>
> I've been studying up on working with scraping/parsing and remote
> logins for sites that don't have APIs and I came across Twill.
>
> Have any of you used it to automate things like login and screen/html
> parsing?
>
> It would be nice to be able to login to a remote site via a model/
> controller and pull a small clip of html and stick it on a view
> somewhere.
>
> I've got it working nicely on the shell and it seems quite promising
> but it doesn't readily appear to me how I would use something like
> this from inside web2py.
>
> Are there any examples that I can have a look at while I am still
> learning about web2py?
>
> Thanks in advance!
>
> - David