andrea wrote:
On 31 Mar, 12:14, "venutaurus...@gmail.com" <venutaurus...@gmail.com>
wrote:
That time is reasonable. The randomness should be such that the MD5
checksums of no two files are the same. The main reason for having such
a huge amount of data is stress testing of our product.
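For what it's worth, that uniqueness requirement is easy to check once
the files exist; the sketch below assumes names like 0.txt through
999.txt (an assumption, matching the snippet later in the thread) and
hashes each file in chunks so memory use stays bounded.

import hashlib

def md5_of(path, chunk_size=1024 * 1024):
    # Hash the file in chunks so a 1 GiB file never has to fit in memory.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

digests = [md5_of('%i.txt' % i) for i in range(1000)]   # assumed naming scheme
assert len(set(digests)) == len(digests), "duplicate MD5 checksums found"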
If randomness is not necessary (as I understood it), you can just
create one single file and then modify one bit of it iteratively,
1000 times.
That's enough to make the checksum change.
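One reading of that suggestion, as a minimal sketch (the 1 GiB size and
the file names here are assumptions, not from the thread): write a
single base file once, then give each copy a different single byte so
every copy hashes to a different MD5.

import shutil

SIZE = 1024 * 1024 * 1024            # assumed target size: 1 GiB per file
CHUNK = 1024 * 1024

# Write the base file once, a chunk at a time to keep memory use low.
with open('base.bin', 'wb') as f:
    for _ in range(SIZE // CHUNK):
        f.write(b'\0' * CHUNK)

# Each copy differs from every other copy by one byte, which is enough
# to change its checksum.
for i in range(1000):
    name = '%i.bin' % i
    shutil.copyfile('base.bin', name)
    with open(name, 'r+b') as f:
        f.seek(i)                    # change a different byte position in each copy
        f.write(b'\x01')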
Is there a way to create a file that big without actually writing
anything in Python (just give me the garbage that is already on the
disk)?
Not exactly AFAIK, but this line of thinking does remind me of
sparse files[1] if your filesystem supports them:
for i in range(1000):                        # one ~1 GiB file per iteration
    data = ('%i\n' % i).encode()             # unique tail so each file hashes differently
    with open('%i.txt' % i, 'wb') as f:
        f.seek(1024*1024*1024 - len(data))   # seek ~1 GiB out; the skipped range becomes a hole
        f.write(data)                        # writing past EOF here yields a sparse file
On FS's that support sparse files, it's blindingly fast and
creates a virtual file of that size without the overhead of
writing all the bits to the file. However, this same
optimization may also throw off any benchmarking you do, as it
doesn't have to read a gig off the physical media. This may be a
good metric for hash calculation across such files, but not a
good metric for I/O.
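One way to see that effect on a Unix-like system (where os.stat
exposes st_blocks) is to compare a file's apparent size with the
blocks actually allocated on disk; the filename below is just the
first one from the snippet above.

import os

st = os.stat('0.txt')              # one of the files created above
apparent = st.st_size              # logical size, ~1 GiB
allocated = st.st_blocks * 512     # st_blocks is counted in 512-byte units
print('apparent size:     %d bytes' % apparent)
print('allocated on disk: %d bytes' % allocated)

On a filesystem with sparse-file support, the allocated figure will be
a tiny fraction of the apparent size; without it, the two will be
comparable (ls -l versus du shows the same thing from the shell).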
-tkc
[1] http://en.wikipedia.org/wiki/Sparse_file