In message <[EMAIL PROTECTED]>, Abandoned
wrote:

> I want an idea for how I can find duplicate pages quickly.

Compute a hash based on a canonicalized version of the content? That is,
disregard whitespace, line wraps, upper/lower case, possibly even
punctuation, etc., so that you get the same hash in spite of these
differences.
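A minimal sketch of that idea, assuming the pages are already available as
a dict mapping some page identifier (e.g. a URL) to its text. The function
names canonicalize, content_hash and find_duplicates are hypothetical, and
SHA-1 via hashlib is just one reasonable choice of hash:

```python
import hashlib
import string
from collections import defaultdict

# Translation table that deletes ASCII punctuation (the optional step).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def canonicalize(text, strip_punctuation=True):
    """Normalize a page so cosmetic differences don't change its hash."""
    text = text.lower()  # ignore upper/lower case
    if strip_punctuation:
        text = text.translate(_PUNCT_TABLE)
    # Collapse every run of whitespace (spaces, tabs, line wraps) to one space.
    return " ".join(text.split())


def content_hash(text, strip_punctuation=True):
    """Hash the canonical form; pages equal after normalization collide."""
    canonical = canonicalize(text, strip_punctuation)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def find_duplicates(pages, strip_punctuation=True):
    """Group page ids whose canonicalized content hashes are identical.

    `pages` is assumed to map a page identifier to its text content.
    Returns a list of groups, each holding two or more page ids.
    """
    groups = defaultdict(list)
    for page_id, text in pages.items():
        groups[content_hash(text, strip_punctuation)].append(page_id)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    pages = {
        "a": "Hello,  World!\n",
        "b": "hello world",
        "c": "Something else",
    }
    print(find_duplicates(pages))  # [['a', 'b']]
```

Because each page is reduced to a single hash, finding duplicates is one
pass with a dictionary lookup per page instead of comparing every pair of
pages. Note that this only catches pages that are identical after
normalization; how aggressive canonicalize() should be is up to you.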
-- 
http://mail.python.org/mailman/listinfo/python-list
