Re: rows reshuffled on join

2024-04-16 Thread Ruoxi Sun
Hi Jacek, I recall an issue with a similar concern [1] that I was trying to answer; I hope it helps. Besides, if you do the join in parallel, e.g. by calling the Acero API directly in C++ with a parallel source node, there is another level of uncertainty in the order of output rows, depending on

[DISCUSS][Acero] Upgrading to 64-bit row offsets in row table

2024-08-01 Thread Ruoxi Sun
Hello everyone, We've identified an issue with Acero's hash join/aggregation, which is currently limited to processing at most 4GB of data because it uses `uint32_t` for row offsets. This limitation not only impacts our ability to handle large datasets but also makes typical solutions like split

Re: [DISCUSS][Acero] Upgrading to 64-bit row offsets in row table

2024-08-01 Thread Ruoxi Sun
ithub.com/apache/arrow/issues/43495 > PR: https://github.com/apache/arrow/pull/43389 > > On Thu, Aug 1, 2024 at 4:06 AM Ruoxi Sun wrote: > >> Hello everyone, >> >> We've identified an issue with Acero's hash join/aggregation, which is >> currently limi