I filed a minor PR [1] to improve the documentation so it's clear what
units are involved, as I think the current language is vague.
[1] https://github.com/apache/arrow/pull/40251
On Sun, Feb 25, 2024 at 9:08 PM Kevin Liu wrote:
>
> Hey folks,
>
> I'm working with the PyArrow API for Tables and R
Hi Kevin,
Shoumyo is correct that the chunk size in `to_batches` is row-based (logical),
not byte-based (physical); see the example in the documentation [1]. And for
more clarity on the "...depending on the chunk layout of individual columns"
portion, a Table column is a `ChunkedArray`, which
Hi Kevin,
I'm not an Arrow dev, so take everything I say with a grain of salt. I just
wanted to point out that the `max_chunksize` appears to refer to the max number
of *rows* per batch rather than the number of *bytes* per batch:
https://github.com/apache/arrow/blob/b8fff043c6cb351b1fad87fa0eeaf