Hi Niranda, Le 31/08/2021 à 18:09, Niranda Perera a écrit :
Hi all, Is there an efficient way to merge a set of sorted tables/ record batches (like in the merge step of a merge sort)? Ex: Simplest case would be to merge these 2 tables, based on A
The formatting of your example looks broken, so I don't really know what the two tables are exactly :-)
Any thoughts on these approaches? Is there a better way to do this?
Merging two sorted sequences is a O(N) operation, it doesn't need a min-heap (which would make your algorithm O(N log N)).
You can take a look at the sample implementations in https://en.cppreference.com/w/cpp/algorithm/merge
If you have more than 2 tables, you can merge them by pairs, recursively, like is already done for the C++ sorting kernel on chunked arrays:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_sort.cc#L796 Regards Antoine.