I did some testing, and calculating the hash value of a 1 GB file does take some time using this method. Would it be wise to calculate the hash value based on, say, only the first MB of the file? Is there a much larger chance of collision this way (I suppose not)? If it's helpful, the files would primarily be media (video) files.
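
To make the question concrete, here is a rough sketch of what I have in mind. It assumes MD5 from hashlib; the function name partial_hash and the limit and chunk_size parameters are just made up for illustration, not taken from any existing code.

```python
import hashlib


def partial_hash(path, limit=1024 * 1024, chunk_size=64 * 1024):
    """Return the hex MD5 digest of at most the first `limit` bytes of `path`.

    Hypothetical helper: the name, the 1 MB default and the choice of MD5
    are assumptions made for this example.
    """
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as f:
        while remaining > 0:
            # Read in small chunks, never going past the limit.
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than the limit
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

Calling partial_hash("movie.avi") would read at most 1 MB, so it should be fast regardless of file size. The obvious catch is that two files which are identical in their first MB and differ only later on would get the same digest.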
Thanks,
Mathieu