Hello everyone,
We've identified an issue with Acero's hash join/aggregation, which is
currently limited to processing at most 4 GB of data due to the use of
`uint32_t` for row offsets. This limitation not only impacts our ability to
handle large datasets but also makes typical workarounds, such as splitting
the input into smaller batches, ineffective.
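For anyone less familiar with the internals, here is a rough illustration of
the failure mode in plain Python (the row width is made up; this is not
Acero's actual code): the byte offset of the first row past the 4 GiB
boundary no longer fits in an unsigned 32-bit integer and silently wraps
around.

# Rough sketch, hypothetical constants -- not Acero code.
ROW_WIDTH = 64                            # assumed fixed row width in bytes
rows_to_4gib = 2**32 // ROW_WIDTH         # 67_108_864 rows fill exactly 4 GiB

true_offset = rows_to_4gib * ROW_WIDTH    # 4_294_967_296 -- needs 33 bits
uint32_offset = true_offset & 0xFFFFFFFF  # what a uint32_t row offset can hold

print(true_offset)                        # 4294967296
print(uint32_offset)                      # 0 -- wrapped around to offset 0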
Thanks for driving this forward. I didn't see the links in my email client
so I'm adding those below in case it helps others:
Issue: https://github.com/apache/arrow/issues/43495
PR: https://github.com/apache/arrow/pull/43389
On Thu, Aug 1, 2024 at 4:06 AM Ruoxi Sun wrote:
> Hello everyone,
>
> We've identified an issue with Acero's hash join/aggregation, ...
Hi!
Could someone tell me if this is a feature or a bug (pyarrow 17.0.0)?
When I have a dictionary column:
In [15]: a1
Out[15]:
pyarrow.Table
a: dictionary
a: [ -- dictionary:
[1] -- indices:
[0]]
I store it in a parquet file:
In [16]: pq.write_table(a1, 'dict.parquet')
And read it back:
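For what it's worth, here is a self-contained way to reproduce the round trip
described above (`a1` and 'dict.parquet' are taken from the message; `a2` and
the schema prints are mine). Comparing the schema before the write with the
schema after the read shows whether the dictionary type survives the round
trip:

import pyarrow as pa
import pyarrow.parquet as pq

# A single-column table whose column 'a' is dictionary-encoded, as above.
a1 = pa.table({"a": pa.array([1]).dictionary_encode()})
print(a1.schema)    # a: dictionary<values=int64, indices=int32, ordered=0>

pq.write_table(a1, "dict.parquet")
a2 = pq.read_table("dict.parquet")
print(a2.schema)    # compare with the schema printed above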
My bad for forgetting to add the links.
Much appreciated!
Regards,
Rossi SUN
On Fri, Aug 2, 2024 at 01:06, Bryce Mecum wrote:
> Thanks for driving this forward. I didn't see the links in my email client
> so I'm adding those below in case it helps others:
>
> Issue: https://github.com/apache/arrow/iss