Hi Jacek,
I recall an issue with a similar concern [1] that I tried to answer;
I hope it helps.
Besides, if you do the join in parallel, e.g. by directly calling the Acero
API in C++ with a parallel source node, there is another level of
uncertainty in the order of output rows, depending on
Hello everyone,
We've identified an issue with Acero's hash join/aggregation, which is
currently limited to processing at most 4GB of data due to the use of
`uint32_t` for row offsets. This limitation not only impacts our ability to
handle large datasets but also makes typical solutions like split
ithub.com/apache/arrow/issues/43495
> PR: https://github.com/apache/arrow/pull/43389
>
> On Thu, Aug 1, 2024 at 4:06 AM Ruoxi Sun wrote:
>
>> Hello everyone,
>>
>> We've identified an issue with Acero's hash join/aggregation, which is
>> currently limi