Hello,

I have a set of integer tuples that need to be collected and sorted at a
coordinator. Here is an example with tuples of length 2:

[(1, 10),
 (1, 15),
 (2, 10),
 (2, 15)]

I am considering storing each column in an Arrow array, e.g., [1, 1, 2, 2]
and [10, 15, 10, 15], and grouping the arrays in a record batch.
Then I would serialize, transfer, and deserialize each record batch. The
coordinator would collect all the record batches and concatenate them.
Finally, the coordinator needs to sort the tuples lexicographically, i.e.,
by the first column, then by the second column to break ties, and so on,
e.g., (1, 10), (1, 15), (2, 10), (2, 15).
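A minimal sketch of this pipeline in plain Python, not Arrow. It assumes each batch is a dict mapping column names to lists, standing in for a record batch, and the helpers `concat_batches`, `lexsort_indices` and `take` are hypothetical names, not Arrow functions. The point it illustrates is that a lexicographic sort of the tuples is one permutation computed from all columns together, then applied to every column:

```python
from typing import Dict, List

# Hypothetical stand-in for a record batch: column name -> column values.
Batch = Dict[str, List[int]]

def concat_batches(batches: List[Batch]) -> Batch:
    """Concatenate batches column by column, as the coordinator would."""
    names = list(batches[0])
    return {name: [v for b in batches for v in b[name]] for name in names}

def lexsort_indices(batch: Batch, keys: List[str]) -> List[int]:
    """Row indices ordered by keys[0], then keys[1] to break ties, etc."""
    n = len(batch[keys[0]])
    return sorted(range(n), key=lambda i: tuple(batch[k][i] for k in keys))

def take(batch: Batch, indices: List[int]) -> Batch:
    """Apply the same row permutation to every column."""
    return {name: [col[i] for i in indices] for name, col in batch.items()}

# Two batches as they might arrive from different workers.
b1 = {"a": [2, 1], "b": [15, 10]}
b2 = {"a": [2, 1], "b": [10, 15]}

merged = concat_batches([b1, b2])
idx = lexsort_indices(merged, ["a", "b"])
result = take(merged, idx)
print(list(zip(result["a"], result["b"])))
# prints [(1, 10), (1, 15), (2, 10), (2, 15)]
```

The columns stay separate throughout; only the single index list is computed row-wise.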

Can I do this sort with the Arrow API? I looked at sort_indices, but it
does not accept record batches. And if I compute separate sort indices for
each array, I don't see a straightforward way to combine them into a sort
of the tuples. Am I missing something?
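Per-column sort indices can be combined, but only if each pass reorders the result of the previous pass instead of being computed independently. The sketch below shows that idea (a least-significant-column-first sort) in plain Python. It relies on Python's `sorted` being stable; whether a particular Arrow kernel is stable would need to be checked in its documentation. The helper names `stable_argsort` and `lsd_sort_indices` are hypothetical:

```python
from typing import List

def stable_argsort(values: List[int], order: List[int]) -> List[int]:
    """Stably reorder the row indices in `order` by values[row]."""
    return sorted(order, key=lambda i: values[i])

def lsd_sort_indices(columns: List[List[int]]) -> List[int]:
    """Lexicographic row order built from one stable sort per column.

    Columns are processed from last to first. Because each pass is
    stable, rows that tie on an earlier column keep the order already
    established by the later columns.
    """
    order = list(range(len(columns[0])))
    for col in reversed(columns):
        order = stable_argsort(col, order)
    return order

a = [2, 1, 2, 1]
b = [15, 10, 10, 15]

# Sorting by the first column alone leaves ties in arbitrary (input) order.
by_a_only = stable_argsort(a, list(range(len(a))))
print([(a[i], b[i]) for i in by_a_only])
# prints [(1, 10), (1, 15), (2, 15), (2, 10)]

order = lsd_sort_indices([a, b])
print([(a[i], b[i]) for i in order])
# prints [(1, 10), (1, 15), (2, 10), (2, 15)]
```

With k columns this costs k sorts; a single sort over a composite key does the same job in one pass.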

Thanks!
Rares
