Paul Rubin wrote:
> Andrew McLean <[EMAIL PROTECTED]> writes:
>> I would like to write a web (http) proxy which I can instrument to
>> automatically extract information from certain web sites as I browse
>> them. Specifically, I would want to process URLs that match a
>> particular regexp. For those URLs I would have code that parsed the
>> content and logged some of it.
>>
>> Think of it as web scraping under manual control.
>
> I've used Proxy 3 for this, a very cool program with powerful
> capabilities for on the fly html rewriting.
>
> http://theory.stanford.edu/~amitp/proxy.html

This looks very useful. Unfortunately I can't seem to get it to run under Windows (specifically Vista) using Python 1.5.2, 2.2.3 or 2.5.2. I'll try Linux if I get a chance.
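
For reference, the idea in the original question (forward requests, and log something from pages whose URL matches a regexp) can be roughly sketched with only the standard library. This is not how Proxy 3 works; it is a minimal illustration written for Python 3, handling plain-HTTP GET requests only (no HTTPS/CONNECT, no header forwarding). The pattern WATCH, the title extraction, and the names log_if_watched, LoggingProxy, captured and run are all made up for the example.

```python
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pattern for the URLs of interest (an assumption for illustration).
WATCH = re.compile(r"^http://example\.com/items/\d+")
# Hypothetical extraction rule: just grab the page title.
TITLE = re.compile(rb"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

captured = []  # stand-in for a real log file or database

# Fetch upstream directly, ignoring any system proxy settings,
# so the proxy cannot end up forwarding requests to itself.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def log_if_watched(url, body):
    """Record (url, title) when the URL matches WATCH; return True if logged."""
    if not WATCH.search(url):
        return False
    match = TITLE.search(body)
    title = match.group(1).strip().decode("utf-8", "replace") if match else ""
    captured.append((url, title))
    return True


class LoggingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # When a browser talks to an HTTP proxy, self.path is the absolute URL.
        try:
            with opener.open(self.path) as upstream:
                body = upstream.read()
                status = upstream.status
                ctype = upstream.headers.get(
                    "Content-Type", "application/octet-stream")
        except Exception as exc:
            # Upstream errors (including 4xx/5xx) are reported as 502 here.
            self.send_error(502, str(exc))
            return
        log_if_watched(self.path, body)
        # Relay the body unchanged to the browser.
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8080):
    """Serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), LoggingProxy).serve_forever()
```

To try it, call run() and point the browser's HTTP proxy setting at 127.0.0.1:8080. Keeping the matching and parsing in log_if_watched, separate from the handler, means that part can be tested without any network traffic.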