On 27/09/2010 at 19:02, harryos wrote:
> thanks for the pointer
> I am trying to get something similar to changedetection but with
> hourly updates.
> I need to get updates from a number of sites..So I was wondering how
> to implement an updating utility

You could also look at the HTTP response headers for a request for e.g. "index.htm" using urllib, specifically "Expires" and "Last-Modified". This lets you ignore banners, Flash content and so on, since they are fetched in separate requests. If you want to go really lightweight and fast, do a HEAD request instead of a plain GET. It's easy to see which headers a specific site sends with e.g. the Firebug plugin for Firefox.

Relying on header values requires that you can trust what the site puts in them. Web servers and caching proxies can do all sorts of things with the headers.

Otherwise, saving a hash of the raw HTML (without GIFs etc.) as suggested is a good approach, depending on what your definition of "updated" is.

Kind regards,
Erik
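
P.S. A rough sketch of both ideas, in case it helps. It assumes Python 3's urllib.request and hashlib; the function names, the timeout value and the choice of SHA-256 are just placeholders I picked for illustration, not anything a particular site requires. Treat it as a starting point rather than a finished utility.

```python
import hashlib
import urllib.request


def fetch_change_headers(url, timeout=10):
    """Send a HEAD request and return the two headers discussed above.

    A header the server does not send comes back as None.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {
            "Last-Modified": response.headers.get("Last-Modified"),
            "Expires": response.headers.get("Expires"),
        }


def fetch_html(url, timeout=10):
    """Fetch only the raw HTML of the page with a plain GET (no images etc.)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def hash_html(raw_html):
    """Return a hex digest of the raw HTML bytes, to store between checks."""
    return hashlib.sha256(raw_html).hexdigest()


def headers_changed(previous, current):
    """Compare the Last-Modified values of two header snapshots.

    Returns True or False when both snapshots carry the header, and None
    when either lacks it, in which case comparing hashes is the fallback.
    """
    old = previous.get("Last-Modified")
    new = current.get("Last-Modified")
    if old is None or new is None:
        return None
    return old != new
```

An hourly job could call fetch_change_headers() for each site, compare the result with the stored snapshot via headers_changed(), and only when that returns None fall back to fetch_html() plus hash_html() and compare the digest with the one saved last time.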