Paul Rubin wrote:
> Claudio Grondi <[EMAIL PROTECTED]> writes:
>
>>>Try the standard Unix/Linux sort utility. Use the --buffer-size=SIZE
>>>to tell it how much memory to use.
>>
>>I am on Windows and it seems, that Windows XP SP2 'sort' can work with
>>the file, but not without a temporary file and space for the resulting
>>file, so triple of space of the file to sort must be provided.
>
> Oh, sorry, I didn't see the earlier parts of the thread. Anyway,
> depending on the application, it's probably not worth the hassle of
> coding something yourself, instead of just throwing more disk space at
> the Unix utility. But especially if the fields are fixed size, you
> could just mmap the file and then do quicksort on disk. Simplest
> would be to just let the OS paging system take care of caching stuff;
> if you wanted to get fancy, you could sort in memory once the sorting
> regions got below a certain size.
>
> A huge amount of stuff has been written (e.g. about half of Knuth vol
> 3) about how to sort. Remember too, that traditionally large-scale
> sorting was done on serial media like tape drives, so random access
> isn't that vital.
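For what it's worth, here is roughly how I picture the mmap-plus-quicksort
idea in Python, assuming fixed-size records whose sort key sits at the start
of each record. RECORD_SIZE and the file name are made up for illustration,
and the whole thing is only an untested sketch (it ignores duplicate-key
tuning, error handling and so on):

import mmap
import os

RECORD_SIZE = 64          # assumed fixed record length in bytes
FILENAME = "bigfile.dat"  # placeholder name

def record(buf, i):
    """Return record number i as a byte string."""
    return buf[i * RECORD_SIZE:(i + 1) * RECORD_SIZE]

def swap(buf, i, j):
    """Swap records i and j in place."""
    if i != j:
        ri, rj = record(buf, i), record(buf, j)
        buf[i * RECORD_SIZE:(i + 1) * RECORD_SIZE] = rj
        buf[j * RECORD_SIZE:(j + 1) * RECORD_SIZE] = ri

def quicksort(buf, lo, hi):
    """In-place quicksort of records lo..hi (inclusive)."""
    while lo < hi:
        pivot = record(buf, (lo + hi) // 2)
        i, j = lo, hi
        while i <= j:
            while record(buf, i) < pivot:
                i += 1
            while record(buf, j) > pivot:
                j -= 1
            if i <= j:
                swap(buf, i, j)
                i += 1
                j -= 1
        # recurse into the smaller half, loop on the larger one,
        # so the recursion depth stays logarithmic
        if j - lo < hi - i:
            quicksort(buf, lo, j)
            lo = i
        else:
            quicksort(buf, i, hi)
            hi = j

size = os.path.getsize(FILENAME)
f = open(FILENAME, "r+b")
buf = mmap.mmap(f.fileno(), size)
quicksort(buf, 0, size // RECORD_SIZE - 1)
buf.flush()
buf.close()
f.close()

With that, the OS paging system decides which parts of the file stay in
RAM, as Paul describes.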
Does that mean that, for very large files, the amount of memory available
to the sorting operation (which allows it to work on larger chunks of data
in memory at a time) has less impact on the actual sorting speed than the
data transfer rate to and from the storage device(s)? In other words, would
the most effective way to shorten the time needed to sort very large files
be to use faster hard drives (e.g. 10,000 rpm instead of 7,200 rpm) and
faster interfaces for the data transfer (e.g. E-IDE or S-ATA instead of
USB), right?

Claudio Grondi

-- 
http://mail.python.org/mailman/listinfo/python-list