Claudio Grondi <[EMAIL PROTECTED]> writes:
>>> The Windows XP SP 2 'sort' (sorting of four Gigs of 20 byte records
>>> took 12 CPU and 18 usual hours)....
> Ok, I see - the misunderstanding is, that there were 4.294.967.296
> records each 20 bytes long, what makes the actual file 85.899.345.920
> bytes large (I just used 'Gigs' for telling the number of records, not
> the size of the file).
> Still not acceptable sorting time?
I think that's not so bad, though probably still not optimal. 85 GB divided by 18 hours is about 1.3 MB/sec, which means that if the program is reading the file 8 times, it's getting roughly 10 MB/sec through the Windows file system, which is fairly reasonable throughput.

If you know something about the distribution of the data (e.g. the records are random 20-byte hashes), you might be able to sort it in essentially linear time with radix sorting. But even with a general-purpose algorithm, if you have a few hundred MB of RAM and some scratch disk space to work with, you should be able to sort that much data in much less than 18 hours.

But if you only had to do it once and it's finished now, why do you still care how long it took?
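To make the radix idea concrete, here is a minimal two-pass sketch in Python (my own illustration, not something from the thread): scatter the records into 256 temporary files keyed on the first byte, then sort each bucket in memory and concatenate. The file names, the 256-way split, and the assumption that the leading bytes are roughly uniform (true for random hashes) are all mine.

import os

RECORD_SIZE = 20      # bytes per record, as in the thread
NUM_BUCKETS = 256     # partition on the first byte of each record

def external_radix_sort(in_path="records.bin", out_path="sorted.bin"):
    # Pass 1: scatter records into one temporary file per leading byte.
    # This reads and writes the whole 85 GB exactly once; for real speed
    # you would read in large chunks rather than one record at a time.
    buckets = [open("bucket_%02x.tmp" % i, "wb") for i in range(NUM_BUCKETS)]
    try:
        with open(in_path, "rb") as src:
            while True:
                rec = src.read(RECORD_SIZE)
                if len(rec) < RECORD_SIZE:
                    break
                buckets[rec[0]].write(rec)
    finally:
        for f in buckets:
            f.close()

    # Pass 2: each bucket holds roughly 85 GB / 256 ~ 335 MB of raw data,
    # small enough to sort in memory (Python's per-object overhead pushes
    # the real working set higher, so use more buckets if RAM is tight).
    # Appending the sorted buckets in order yields a fully sorted file.
    with open(out_path, "wb") as dst:
        for i in range(NUM_BUCKETS):
            name = "bucket_%02x.tmp" % i
            with open(name, "rb") as f:
                data = f.read()
            records = [data[j:j + RECORD_SIZE]
                       for j in range(0, len(data), RECORD_SIZE)]
            records.sort()
            dst.writelines(records)
            os.remove(name)

The point is that every record is touched only a small, constant number of times no matter how many records there are, which is where the "essentially linear time" claim comes from.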