Hi - I'm trying to calculate unique hash values for binary files, independent of their location and filename, and I was wondering whether I'm going in the right direction.
Basically, the hash values are calculated thusly:

    import hashlib

    f = open('binaryfile.bin')
    h = hashlib.sha1()
    h.update(f.read())
    hash = h.hexdigest()
    f.close()

A quick try-out shows that, after renaming a file, its hash indeed remains the same as it was before. I have my doubts, however, as to the usefulness of this. As f.read() does not seem to read until the end of the file (for a 3.3 MB file only a string of 639 bytes is returned; perhaps a 00-byte counts as EOF?), is there a high danger of collision? Are there better ways of calculating hash values of binary files?

Thanks in advance,
Mathieu
--
http://mail.python.org/mailman/listinfo/python-list
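
On the short read: one probable explanation, assuming the script runs on Windows, is that open() without a mode opens the file in text mode, where a Ctrl-Z byte (0x1A), rather than a 00-byte, is treated as end-of-file. Opening the file with 'rb' avoids that. The sketch below hashes the whole file in binary mode, reading it in chunks so that large files need not fit in memory. The function name sha1_of_file and the 64 KiB chunk size are illustrative choices, not anything from the original post.

```python
import hashlib

def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of the raw bytes stored at path."""
    h = hashlib.sha1()
    # 'rb' reads the bytes exactly as stored: no newline translation
    # and no special handling of any byte value as end-of-file.
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty result means the real end of the file
                break
            h.update(chunk)
    return h.hexdigest()
```

Calling sha1_of_file('binaryfile.bin') should then give a digest that depends only on the file's contents, so it stays the same after a rename or move. With the full contents hashed, two different files producing the same SHA-1 by accident is extremely unlikely.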