Use a priority queue over the sorted sub-lists. Whenever the key extracted from the head of the currently smallest queue exceeds the key at the head of the second-smallest queue, adjust that queue's priority within the priority queue of queues. This merge uses a total of two read/write passes over the data, no matter how many subfiles you have, and it is dramatically faster than any other kind of external merge when there are lots of subfiles. I posted a message to the list on this subject before, with a pointer to sample code that demonstrates the concept. Even with one million sub-files, it still takes only two read/write passes in total.
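
A rough sketch of the idea in C++ (this is not the sample code I pointed to earlier; the fixed-size int64 record format and command-line file layout are just assumptions for illustration, and re-inserting a run after reading its next key stands in for the "adjust the priority" step):

    // k-way merge of sorted run files using a priority queue (min-heap).
    // Each run file is assumed to be a sorted sequence of binary int64 keys.
    #include <cstdint>
    #include <cstdio>
    #include <queue>
    #include <vector>

    struct Run {
        FILE   *fp;
        int64_t head;   // current smallest key remaining in this run
    };

    struct HeadGreater {  // orders the heap so the smallest head is on top
        bool operator()(const Run &a, const Run &b) const {
            return a.head > b.head;
        }
    };

    int main(int argc, char **argv)
    {
        // argv[1..argc-2] are sorted run files, argv[argc-1] is the output file
        if (argc < 3)
            return 1;

        std::priority_queue<Run, std::vector<Run>, HeadGreater> pq;

        // Prime the queue with the head key of every run.
        for (int i = 1; i < argc - 1; i++) {
            FILE *fp = fopen(argv[i], "rb");
            int64_t key;
            if (fp && fread(&key, sizeof key, 1, fp) == 1)
                pq.push(Run{fp, key});
            else if (fp)
                fclose(fp);               // empty run, drop it
        }

        FILE *out = fopen(argv[argc - 1], "wb");
        if (!out)
            return 1;

        // Pop the run with the smallest head, emit that key, refill the run's
        // head from its file, and push it back.  The push-back is where the
        // run's priority gets adjusted relative to the other runs.
        while (!pq.empty()) {
            Run r = pq.top();
            pq.pop();
            fwrite(&r.head, sizeof r.head, 1, out);
            if (fread(&r.head, sizeof r.head, 1, r.fp) == 1)
                pq.push(r);
            else
                fclose(r.fp);             // run exhausted
        }

        fclose(out);
        return 0;
    }

Each input key is read once and written once, so the data makes exactly one read pass and one write pass through the merge regardless of how many runs feed it.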