Hi Niranda,

Le 31/08/2021 à 18:09, Niranda Perera a écrit :
Hi all,

Is there an efficient way to merge a set of sorted tables/ record batches
(like in the merge step of a merge sort)?
Ex: Simplest case would be to merge these 2 tables, based on A

The formatting of your example looks broken, so I don't really know what the two tables are exactly :-)

Any thoughts on these approaches? Is there a better way to do this?

Merging two sorted sequences is a O(N) operation, it doesn't need a min-heap (which would make your algorithm O(N log N)).

You can take a look at the sample implementations in https://en.cppreference.com/w/cpp/algorithm/merge

If you have more than 2 tables, you can merge them by pairs, recursively, like is already done for the C++ sorting kernel on chunked arrays:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_sort.cc#L796

Regards

Antoine.

Reply via email to