On Tue, 31 Aug 2021 17:47:02 -0400
Niranda Perera <niranda.per...@gmail.com> wrote:
> @antoine thanks for the heads up.
> This was the example. I want to merge this table into a final table
> +-----------+--+-----------+
> |  Table 0  |  |  Table 1  |
> +---+---+---+  +---+---+---+
> |   | A | B |  |   | A | B |
> +---+---+---+  +---+---+---+
> | 0 | 3 | x |  | 0 | 1 | x |
> +---+---+---+  +---+---+---+
> | 1 | 3 | x |  | 1 | 2 | x |
> +---+---+---+  +---+---+---+
> | 2 | 4 | x |  | 2 | 2 | x |
> +---+---+---+  +---+---+---+
> | 3 | 5 | x |  | 3 | 5 | x |
> +---+---+---+--+---+---+---+
> 
> I agree. For the 2 tables case, we don't need a min-heap. But I think the
> complexity of the min-heap approach would be O(N log K) for K tables.

The min-heap approach you're proposing looks like a regular heap sort,
and so would be O(N log N) (it doesn't take advantage of the existing
order). Besides, heap sort is not stable.

The pairwise merge approach should O(N log K), however.

(of course, if N is small-ish this may not be a problem for you)

> I looked at the chunked array index sort impl. If I want to create a final
> table, I'd still have to use `sort_index` followed by `take`, wouldn't it?

If you use `sort_indices`, you'll have the entire sort.  If you want to
take advantage of each chunk being ordered, you should extract or
rewrite the merge logic.

(and, yes, if you use `sort_indices`, you'll need to call `take`
afterwards; but that's much more efficient that sorting the table
directly, which requires shuffling entire rows when you need to swap
values)

Regards

Antoine.


Reply via email to