Le 01/09/2021 à 03:58, Micah Kornfield a écrit :
According to Wikipedia there is a min-heap approach that is O(N log k) not
sure if this matches with Niranda's proposal [1]. On the surface the
analysis make sense to me but I could be missing something.
[1] https://en.m.wikipedia.org/wiki/K-way_merge_algorithm
My bad. Thanks for the pointer.
Regards
Antoine.
On Tuesday, August 31, 2021, Antoine Pitrou <anto...@python.org> wrote:
On Tue, 31 Aug 2021 17:47:02 -0400
Niranda Perera <niranda.per...@gmail.com> wrote:
@antoine thanks for the heads up.
This was the example. I want to merge this table into a final table
+-----------+--+-----------+
| Table 0 | | Table 1 |
+---+---+---+ +---+---+---+
| | A | B | | | A | B |
+---+---+---+ +---+---+---+
| 0 | 3 | x | | 0 | 1 | x |
+---+---+---+ +---+---+---+
| 1 | 3 | x | | 1 | 2 | x |
+---+---+---+ +---+---+---+
| 2 | 4 | x | | 2 | 2 | x |
+---+---+---+ +---+---+---+
| 3 | 5 | x | | 3 | 5 | x |
+---+---+---+--+---+---+---+
I agree. For the 2 tables case, we don't need a min-heap. But I think the
complexity of the min-heap approach would be O(N log K) for K tables.
The min-heap approach you're proposing looks like a regular heap sort,
and so would be O(N log N) (it doesn't take advantage of the existing
order). Besides, heap sort is not stable.
The pairwise merge approach should O(N log K), however.
(of course, if N is small-ish this may not be a problem for you)
I looked at the chunked array index sort impl. If I want to create a
final
table, I'd still have to use `sort_index` followed by `take`, wouldn't
it?
If you use `sort_indices`, you'll have the entire sort. If you want to
take advantage of each chunk being ordered, you should extract or
rewrite the merge logic.
(and, yes, if you use `sort_indices`, you'll need to call `take`
afterwards; but that's much more efficient that sorting the table
directly, which requires shuffling entire rows when you need to swap
values)
Regards
Antoine.