According to Wikipedia there is a min-heap approach that is O(N log k) not sure if this matches with Niranda's proposal [1]. On the surface the analysis make sense to me but I could be missing something.
[1] https://en.m.wikipedia.org/wiki/K-way_merge_algorithm On Tuesday, August 31, 2021, Antoine Pitrou <anto...@python.org> wrote: > On Tue, 31 Aug 2021 17:47:02 -0400 > Niranda Perera <niranda.per...@gmail.com> wrote: > > @antoine thanks for the heads up. > > This was the example. I want to merge this table into a final table > > +-----------+--+-----------+ > > | Table 0 | | Table 1 | > > +---+---+---+ +---+---+---+ > > | | A | B | | | A | B | > > +---+---+---+ +---+---+---+ > > | 0 | 3 | x | | 0 | 1 | x | > > +---+---+---+ +---+---+---+ > > | 1 | 3 | x | | 1 | 2 | x | > > +---+---+---+ +---+---+---+ > > | 2 | 4 | x | | 2 | 2 | x | > > +---+---+---+ +---+---+---+ > > | 3 | 5 | x | | 3 | 5 | x | > > +---+---+---+--+---+---+---+ > > > > I agree. For the 2 tables case, we don't need a min-heap. But I think the > > complexity of the min-heap approach would be O(N log K) for K tables. > > The min-heap approach you're proposing looks like a regular heap sort, > and so would be O(N log N) (it doesn't take advantage of the existing > order). Besides, heap sort is not stable. > > The pairwise merge approach should O(N log K), however. > > (of course, if N is small-ish this may not be a problem for you) > > > I looked at the chunked array index sort impl. If I want to create a > final > > table, I'd still have to use `sort_index` followed by `take`, wouldn't > it? > > If you use `sort_indices`, you'll have the entire sort. If you want to > take advantage of each chunk being ordered, you should extract or > rewrite the merge logic. > > (and, yes, if you use `sort_indices`, you'll need to call `take` > afterwards; but that's much more efficient that sorting the table > directly, which requires shuffling entire rows when you need to swap > values) > > Regards > > Antoine. > > >