venutaurus...@gmail.com wrote:
On Mar 31, 1:15 pm, Steven D'Aprano
<ste...@remove.this.cybersource.com.au> wrote:
On Mon, 30 Mar 2009 22:44:41 -0700, venutaurus...@gmail.com wrote:
Hello all,
I've a requirement where I need to create around 1000
files under a given folder with each file size of around 1GB. The
constraints here are each file should have random data and no two files
should be unique even if I run the same script multiple times.
I don't understand what you mean. "No two files should be unique" means
literally that only *one* file is unique, the others are copies of each
other.
Do you mean that no two files should be the same?
Moreover, the filenames should also be unique every time I run the
script. One possibility is to use the Unix timestamp for the file
names, with some extension.
That's easy. Start a counter at 0, and every time you create a new file,
name the file by that counter, then increase the counter by one.
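For example, a minimal sketch (combining the counter with the Unix-time
prefix the original poster suggested, so names also differ across runs;
the name format itself is arbitrary):

    import itertools
    import time

    run_id = int(time.time())   # per-run prefix taken from the Unix time

    def unique_names():
        # the counter keeps names unique within one run; the timestamp
        # prefix keeps separate runs from colliding
        for counter in itertools.count():
            yield "%d_%06d.bin" % (run_id, counter)

(Assuming two runs never start within the same second, no two runs can
ever produce the same name.)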
Can this be done within a few minutes? Is it possible only using
threads, or can it be done in some other way? This has to be done on
Windows.
Is it possible? Sure. In a couple of minutes? I doubt it. 1000 files of
1GB each means you are writing 1TB of data to a HDD. The fastest HDDs can
reach about 125 MB per second under ideal circumstances, so that will
take at least 8 seconds per 1GB file or 8000 seconds in total. If you try
to write them all in parallel, you'll probably just make the HDD waste
time seeking backwards and forwards from one place to another.
--
Steven
That time is reasonable. The randomness should be such that no two
files have the same MD5 checksum. The main reason for having such a
huge amount of data is to stress-test our product.
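(One consequence of that requirement: the data need not be truly
random. A unique 16-byte header followed by constant padding already
gives every file a distinct MD5. A minimal sketch of writing one such
file; the names and the 4 MB chunk size are arbitrary, and uuid4
supplies the uniqueness:

    import uuid

    GIGABYTE = 1024 ** 3
    CHUNK = 4 * 1024 * 1024          # 4 MB per write(); size is arbitrary

    def build_file(path, size=GIGABYTE):
        # the random 16-byte UUID at the front makes this file's MD5
        # differ from every other file's; the rest is cheap padding
        with open(path, "wb") as f:
            f.write(uuid.uuid4().bytes)
            written = 16
            while written < size:
                n = min(CHUNK, size - written)
                f.write(b"A" * n)
                written += n
)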
Does it really need to be *files* on the *hard disk*?
What nobody has suggested yet is that you can *simulate* the files by
making a large set of custom file-like objects and feeding those to
your application (if possible!). Each object could return a 1 GB byte
stream consisting of a GUID followed by random bytes (or just millions
of A's, because you write that the only requirement is a different MD5
checksum). That way you need neither a 1-terabyte hard drive nor the
huge wait to create the actual files...
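A minimal, untested sketch of such an object, assuming the application
only ever calls read() (the class name and details are illustrative):

    import uuid

    class FakeBigFile(object):
        # read-only file-like object: a 16-byte UUID followed by b"A"
        # padding, 1 GB in total; unique MD5, zero disk space
        def __init__(self, size=1024 ** 3):
            self.header = uuid.uuid4().bytes
            self.remaining = size - 16

        def read(self, n=-1):
            if n < 0:                       # read() with no limit
                n = len(self.header) + self.remaining
            data = self.header[:n]          # serve header bytes first,
            self.header = self.header[n:]   # then constant padding
            take = min(n - len(data), self.remaining)
            self.remaining -= take
            return data + b"A" * take

(Reading it all at once would of course build a 1 GB string in memory,
so the application should read in chunks.)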
--irmen