I would like to write a web (HTTP) proxy that I can instrument to automatically extract information from certain web sites as I browse them. Specifically, I want to process URLs that match a particular regexp. For those URLs, I would have code that parses the content and logs some of it.
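
To make the idea concrete, here is a rough sketch of the kind of hook I have in mind. It is not taken from any existing proxy: the URL pattern, the function name process_response, and the choice of pulling out the page title are all placeholders, and I am assuming the proxy can hand me the URL and the decoded response body.

```python
import logging
import re
from html.parser import HTMLParser

# Hypothetical pattern: only URLs matching this regexp get processed.
INTERESTING_URL = re.compile(r"^http://www\.example\.com/items/\d+$")

log = logging.getLogger("proxy.scrape")


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def process_response(url, body):
    """Hypothetical hook: the proxy would call this for each response.

    Returns the extracted title, or None if the URL is not of interest.
    """
    if not INTERESTING_URL.match(url):
        return None
    parser = TitleParser()
    parser.feed(body)
    title = parser.title.strip()
    log.info("%s -> %s", url, title)
    return title
```

The proxy itself would pass every other response through untouched; only the matching URLs would go through the parse-and-log step.
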
Think of it as web scraping under manual control.

I found this list of Python web proxies:
http://www.xhaus.com/alan/python/proxies.html

Tiny HTTP Proxy in Python looks promising, as it's nominally simple (not many lines of code):
http://www.okisoft.co.jp/esc/python/proxy/

It does what it's supposed to, but I'm a bit at a loss as to where to intercept the traffic. I suspect it should be quite straightforward, but I'm finding the code a bit opaque.

Any suggestions?

Andrew