Robert Brown wrote:
> But how do they compare when the hash is too big to fit in main
> memory? If the hash starts swapping, you lose! I do not know,
> however, whether using a database-based hash would be faster or slower
> than the sort -u approach. It would make for an interesting test.
> Try sorting a file with over 3E8 unique integers, or maybe just a file
> with 256-byte records, and enough unique records to not fit in memory.
I would say that at least part of the issue here is designing around the need for swollen data sets. It is true that hashes buy their speed at the cost of memory, since the structures perform best when sparsely populated. There will certainly be cases where the sheer immensity of a data set calls for a more memory-efficient storage structure, at the cost of slower operations [O(log n) for inserts and deletions and O(n log n) for sorts, versus O(1) for hash operations, with occasional overhead from resizing and re-hashing].

I tend to think that, were I trying to handle voluminous quantities of data, I would reach for a C-coded B-tree for storage (a rough, untested sketch is in the P.S. below). I see these kinds of tasks as exceptional cases calling for special handling, though, rather than a standard that should dictate normal coding practice. Most CPUs in use average about 99% idle time, at least on the computers [some running up to 20 open windows] on which I have checked these stats.

To me, the more important issues in normal practice have more to do with comprehensibility, and thus maintainability, than with technical performance comparisons.

Joseph
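
P.S. For what it is worth, here is a rough, untested sketch of the kind of thing I mean by reaching for a C-coded B-tree from Perl. It ties a hash to an on-disk Berkeley DB B-tree via DB_File, so the data never has to fit in main memory (the file name "unique.db" is just for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl;        # supplies O_RDWR and O_CREAT
    use DB_File;      # supplies $DB_BTREE and the tie interface

    # Tie a hash to an on-disk B-tree so the working set lives on disk,
    # not in RAM.  "unique.db" is a placeholder name for this example.
    my %seen;
    tie %seen, 'DB_File', 'unique.db', O_RDWR|O_CREAT, 0666, $DB_BTREE
        or die "Cannot open unique.db: $!";

    # Record every input line; duplicate records simply collapse into the
    # same key, and each lookup/insert is an O(log n) B-tree operation.
    while (my $record = <>) {
        chomp $record;
        $seen{$record}++;
    }

    # Iterate with each() so we never pull the full key list into memory.
    # A B-tree hands keys back in sorted order, so this behaves like a
    # disk-based "sort -u", with counts available in $count if wanted.
    while (my ($record, $count) = each %seen) {
        print "$record\n";
    }

    untie %seen;

Whether that actually beats sort -u on a few hundred million records is exactly the kind of test Robert suggests; the point is only that switching from an in-memory hash to a disk-based structure is a handful of lines.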