harryos, 09.10.2010 14:24:
I am trying to determine if a web page is updated by x number of characters. The Mozilla Firefox plugin 'Update Scanner' has similar functionality. A user can specify the x. I think this would be done by reading from the same URL at two different times and finding the change in body text.
"Number of characters" sounds like a rather useless measure here. I'd rather apply an XPath, CSS selector or PyQuery expression to the parsed page and check if the interesting subtree of it has changed at all or not, potentially disregarding any structural changes by stripping all tags and normalising the resulting text to ignore whitespace and case differences.
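A rough sketch of the "strip all tags and normalise" step, in case it helps. To keep it self-contained it uses the stdlib html.parser rather than lxml or PyQuery, so there is no XPath/CSS selection here; you would select the interesting subtree first and pass its serialised HTML in. The names TextExtractor, normalised_text, fingerprint and has_changed are made up for this example.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join with a space so text from adjacent elements does not run together.
        return " ".join(self._chunks)


def normalised_text(html):
    """Strip all tags, collapse whitespace runs and lower-case the result."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text().split()).lower()


def fingerprint(html):
    """Hash of the normalised text; store this between two fetches."""
    return hashlib.sha256(normalised_text(html).encode("utf-8")).hexdigest()


def has_changed(old_html, new_html):
    """True if the visible text differs, ignoring tags, whitespace and case."""
    return fingerprint(old_html) != fingerprint(new_html)


if __name__ == "__main__":
    old = "<div><p>Hello   World</p></div>"
    new = "<section><b>hello</b>\n world</section>"
    print(has_changed(old, new))
    print(has_changed(old, "<div><p>Hello there</p></div>"))
```

Storing only the fingerprint means you don't have to keep the old page around. If you really want a "changed by at least x characters" threshold, you would have to keep the normalised text itself and compare the two versions, e.g. with difflib.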
Stefan