I have an application where I need to create 50K files spread
uniformly across 50 folders in Python. The content of each file
can be its own name repeated 10 times. I wrote code using normal
for loops, but it takes hours to finish. Can someone please
share code for this using multithreading? As I am new to Python,
I know little about threading concepts.
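
Something like what I have now (a simplified sketch; the real
paths and names are placeholders):

    import os

    base = "out"                  # placeholder output directory
    num_dirs, num_files = 50, 50000

    # create the 50 folders up front
    for d in range(num_dirs):
        os.makedirs(os.path.join(base, "folder%02d" % d),
                    exist_ok=True)

    # spread the 50K files round-robin across the folders,
    # so each folder ends up with 1000 files
    for i in range(num_files):
        name = "file%05d.txt" % i
        folder = os.path.join(base, "folder%02d" % (i % num_dirs))
        with open(os.path.join(folder, name), "w") as f:
            f.write(name * 10)    # the file's own name, 10 times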
I'm not sure multithreading will help much with what appears
to be a filesystem and I/O bottleneck. Far more important will
be factors like:
- what OS are you running?
- what filesystem are you writing to, and does it journal?
  (NTFS? FAT? ext[234]? JFS? XFS? tmpfs? etc...) Some
  filesystems deal much better with lots of small files, while
  others perform better with fewer-but-larger files.
- how is that filesystem mounted? Are syncs forced for all
writes? How large are your write caches?
- what sort of hardware is it hitting? A 5400 RPM drive will
  have far worse performance than a 10k RPM drive; or you could
  try against a flash drive or a RAID-0 array, which should have
  higher sustained write throughput.
- how full is the drive beforehand? If you're cramped for
  space, the filesystem has to scatter your files across
  whatever fragmented free space remains, while plenty of free
  space lets it write the files out without much searching.
Hope this gives you some ways to test your configuration...
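
If you want to measure whether threads buy you anything on your
particular setup, here's a rough, untested sketch (assuming
Python 3 and its concurrent.futures module; the worker count
and paths are arbitrary choices of mine). Time it against your
plain loop:

    import os
    import time
    from concurrent.futures import ThreadPoolExecutor

    base = "out_threaded"         # arbitrary output directory
    num_dirs, num_files = 50, 50000

    # create the folders up front, before any threads run
    for d in range(num_dirs):
        os.makedirs(os.path.join(base, "folder%02d" % d),
                    exist_ok=True)

    def write_one(i):
        # one unit of work: write file i into its folder
        name = "file%05d.txt" % i
        folder = os.path.join(base, "folder%02d" % (i % num_dirs))
        with open(os.path.join(folder, name), "w") as f:
            f.write(name * 10)

    start = time.time()
    with ThreadPoolExecutor(max_workers=8) as pool:
        # consume the iterator so any exceptions surface
        list(pool.map(write_one, range(num_files)))
    print("elapsed: %.1f seconds" % (time.time() - start))

My guess is the elapsed time will be dominated by the disk, not
by Python, so don't expect miracles from the thread pool.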
-tkc