On 06.07.2011 07:54, Phlip wrote:
Pythonistas:
Consider this hashing code:
import hashlib
file = open(path)
m = hashlib.md5()
m.update(file.read())
digest = m.hexdigest()
file.close()
If the file were huge, the file.read() would allocate a big string and
thrash memory. (Yes, in 2011 that's still a problem, because these
files could be movies and whatnot.)
So if I do the stream trick - read one byte, update one byte, in a
loop, then I'm essentially dragging that movie thru 8 bits of a 64 bit
CPU. So that's the same problem; it would still be slow.
Yes. That is why you should read with a reasonable block size. Not too
small and not too big.
def filechunks(f, size=8192):
    while True:
        s = f.read(size)
        if not s: break
        yield s
    # f.close() # maybe...
import hashlib
file = open(path, 'rb')  # binary mode, so the raw bytes get hashed
m = hashlib.md5()
fc = filechunks(file)
for chunk in fc:
    m.update(chunk)
digest = m.hexdigest()
file.close()
So you are reading in 8 KiB chunks. Feel free to modify this - maybe use
os.stat(path).st_blksize instead (which is AFAIK the recommended
minimum), or a value of about 1 MiB...
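
The block-size suggestion could be sketched roughly like this. The helper name md5_of_file and the 8192 fallback are assumptions made for illustration, not something from the original code; st_blksize is not available on every platform, hence the getattr().

```python
import hashlib
import os


def md5_of_file(path, blocksize=None):
    """Return the MD5 hex digest of the file at path, read block by block.

    md5_of_file is a hypothetical helper name, used only for illustration.
    """
    m = hashlib.md5()
    with open(path, 'rb') as f:  # binary mode: hash the raw bytes
        if blocksize is None:
            # st_blksize may be missing (e.g. on Windows) or 0; fall back
            # to an assumed default of 8192 bytes in that case.
            blocksize = getattr(os.fstat(f.fileno()), 'st_blksize', 0) or 8192
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(blocksize), b''):
            m.update(chunk)
    return m.hexdigest()
```

Only one block is held in memory at a time, so the size of the file no longer matters; passing blocksize=1024*1024 gives the 1 MiB variant.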
So now I try this:
sum = os.popen('sha256sum %r' % path).read()
This is not as nice as the above, especially not with a path containing
strange characters. What about, at least,
def shellquote(*strs):
    return " ".join([
        "'"+st.replace("'","'\\''")+"'"
        for st in strs
    ])
# %s, not %r: the path is already quoted for the shell
sum = os.popen('sha256sum %s' % shellquote(path)).read()
or, even better,
import subprocess
sp = subprocess.Popen(['sha256sum', path],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
sp.stdin.close() # generate EOF
sum = sp.stdout.read()
sp.wait()
?
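
A slightly more compact variant of the subprocess approach, as a sketch only: it assumes a GNU-style sha256sum on PATH, and sha256sum_digest is a hypothetical helper name. Because the arguments are passed as a list, no shell is involved and no quoting is needed.

```python
import subprocess


def sha256sum_digest(path):
    """Run the external sha256sum tool and return only the hex digest.

    sha256sum_digest is a hypothetical helper; it assumes a GNU-style
    sha256sum binary is available on PATH.
    """
    # '--' ends option parsing, in case the path starts with a dash.
    out = subprocess.check_output(['sha256sum', '--', path])
    # The output looks like "<hexdigest>  <filename>". GNU sha256sum puts a
    # backslash in front of the line when the file name had to be escaped.
    return out.decode('ascii', 'replace').split()[0].lstrip('\\')
```

check_output() raises CalledProcessError if sha256sum exits with a non-zero status, e.g. when the file does not exist.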
Does hashlib have a file-ready mode, to hide the streaming inside some
clever DMA operations?
AFAIK not.
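
For readers on a newer Python: as far as I know, Python 3.11 added hashlib.file_digest(), which does the chunked reading internally. A minimal sketch, with a fallback to the manual loop on older versions; sha256_of_file and the 1 MiB default are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, blocksize=1024 * 1024):
    """Hash a file with SHA-256, preferring hashlib.file_digest if present.

    sha256_of_file is a hypothetical helper name.
    """
    with open(path, 'rb') as f:
        if hasattr(hashlib, 'file_digest'):
            # Python 3.11+: hashlib reads the file in chunks by itself
            return hashlib.file_digest(f, 'sha256').hexdigest()
        # Older Pythons: manual chunked loop
        m = hashlib.sha256()
        for chunk in iter(lambda: f.read(blocksize), b''):
            m.update(chunk)
        return m.hexdigest()
```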
Thomas