On Tue, 31 Aug 2021 17:47:02 -0400 Niranda Perera <niranda.per...@gmail.com> wrote: > @antoine thanks for the heads up. > This was the example. I want to merge this table into a final table > +-----------+--+-----------+ > | Table 0 | | Table 1 | > +---+---+---+ +---+---+---+ > | | A | B | | | A | B | > +---+---+---+ +---+---+---+ > | 0 | 3 | x | | 0 | 1 | x | > +---+---+---+ +---+---+---+ > | 1 | 3 | x | | 1 | 2 | x | > +---+---+---+ +---+---+---+ > | 2 | 4 | x | | 2 | 2 | x | > +---+---+---+ +---+---+---+ > | 3 | 5 | x | | 3 | 5 | x | > +---+---+---+--+---+---+---+ > > I agree. For the 2 tables case, we don't need a min-heap. But I think the > complexity of the min-heap approach would be O(N log K) for K tables.
The min-heap approach you're proposing looks like a regular heap sort, and so would be O(N log N) (it doesn't take advantage of the existing order). Besides, heap sort is not stable. The pairwise merge approach should O(N log K), however. (of course, if N is small-ish this may not be a problem for you) > I looked at the chunked array index sort impl. If I want to create a final > table, I'd still have to use `sort_index` followed by `take`, wouldn't it? If you use `sort_indices`, you'll have the entire sort. If you want to take advantage of each chunk being ordered, you should extract or rewrite the merge logic. (and, yes, if you use `sort_indices`, you'll need to call `take` afterwards; but that's much more efficient that sorting the table directly, which requires shuffling entire rows when you need to swap values) Regards Antoine.