Robert Brown wrote:
> But how do they compare when the hash is too big to fit in main
> memory? If the hash starts swapping, you lose! I do not know,
> however, whether using a database-based hash would be faster or slower
> than the sort -u approach. It would make for an interesting test.
> Try sorting a file with over 3E8 unique integers, or maybe just a file
> with 256-byte records, and enough unique records to not fit in memory.
I would say that at least part of the issue here is designing around the need for swollen data sets. It is true that hashes buy their speed at the cost of memory, since the structures perform best when sparsely populated. There will certainly be cases where the sheer immensity of a data set calls for a more memory-efficient storage structure, at the cost of slower operations [O(log n) for inserts and deletions and O(n log n) for sorts, versus O(1) for hash operations, with occasional overhead from resizing and re-hashing].

I tend to think that, were I trying to handle voluminous quantities of data, I would reach for a C-coded B-tree for storage (a rough, untested sketch is in the P.S. below). I see these kinds of tasks as exceptional cases calling for special handling, though, rather than a standard that should dictate normal coding practice. Most CPUs in use average about 99% idle time, at least on the computers [some running up to 20 open windows] on which I have checked these stats.

To me, the more important issues in normal practice have more to do with comprehensibility, and thus maintainability, than with technical performance comparisons.

Joseph
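
P.S. For what it is worth, here is a rough, untested sketch of the kind of thing I mean by reaching for a C-coded B-tree from Perl. It ties a hash to an on-disk Berkeley DB B-tree via DB_File, so the data never has to fit in main memory (the file name "unique.db" is just for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl;        # supplies O_RDWR and O_CREAT
    use DB_File;      # supplies $DB_BTREE and the tie interface

    # Tie a hash to an on-disk B-tree so the working set lives on disk,
    # not in RAM.  "unique.db" is a placeholder name for this example.
    my %seen;
    tie %seen, 'DB_File', 'unique.db', O_RDWR|O_CREAT, 0666, $DB_BTREE
        or die "Cannot open unique.db: $!";

    # Record every input line; duplicate records simply collapse into the
    # same key, and each lookup/insert is an O(log n) B-tree operation.
    while (my $record = <>) {
        chomp $record;
        $seen{$record}++;
    }

    # Iterate with each() so we never pull the full key list into memory.
    # A B-tree hands keys back in sorted order, so this behaves like a
    # disk-based "sort -u", with counts available in $count if wanted.
    while (my ($record, $count) = each %seen) {
        print "$record\n";
    }

    untie %seen;

Whether that actually beats sort -u on a few hundred million records is exactly the kind of test Robert suggests; the point is only that switching from an in-memory hash to a disk-based structure is a handful of lines.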