Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> on my machine, Python's md5+mmap is a little bit faster than
> subprocess+md5sum:
>
>     import os, md5, mmap
>
>     file = open(fn, "r+")
>     size = os.path.getsize(fn)
>     hash = md5.md5(mmap.mmap(file.fileno(), size)).hexdigest()
>
> (I suspect that md5sum also uses mmap, so the difference is
> probably just the subprocess overhead)
But you won't be able to md5sum a file bigger than about 4 GB on a 32-bit
processor (like x86), will you?  (I don't know how the kernel / user space
VM split works on Windows, but on Linux 3 GB is the maximum size you can
mmap.)

    $ dd if=/dev/zero of=z count=1 bs=1048576 seek=8192
    $ ls -l z
    -rw-r--r--  1 ncw ncw 8590983168 Feb  9 09:26 z

    >>> fn = "z"
    >>> import os, md5, mmap
    >>> file = open(fn, "rb")
    >>> size = os.path.getsize(fn)
    >>> size
    8590983168L
    >>> hash = md5.md5(mmap.mmap(file.fileno(), size)).hexdigest()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    OverflowError: memory mapped size is too large (limited by C int)

--
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick
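[Editorial note: for files too large to mmap on a 32-bit machine, feeding
MD5 in fixed-size chunks sidesteps the C-int limit entirely, since only one
chunk is ever in memory. A minimal sketch, using the modern hashlib module
in place of the old md5 module; the helper name md5_file is hypothetical:]

```python
import hashlib

def md5_file(fn, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file of any size by reading it
    in fixed-size chunks, avoiding mmap and its 32-bit size limit."""
    h = hashlib.md5()           # hashlib.md5 replaces the old md5 module
    with open(fn, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:       # empty read means end of file
                break
            h.update(block)     # digest state is updated incrementally
    return h.hexdigest()
```

This trades the single large mapping for a loop of small reads; the digest
result is identical, and memory use stays at one chunk regardless of file
size.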