On Tue, 31 Aug 2021 21:46:23 -0700
Rares Vernica <rvern...@gmail.com> wrote:
> 
> I'm storing RecordBatch objects in a local cache to improve performance. I
> want to keep track of the memory usage to stay within bounds. The arrays
> stored in the batch are not nested.
> 
> The best way I came up with to compute the size of a RecordBatch is:
> 
>             size_t arrowSize = 0;
>             for (auto i = 0; i < arrowBatch->num_columns(); ++i) {
>                 auto column = arrowBatch->column_data(i);
>                 if (column->buffers[0])
>                     arrowSize += column->buffers[0]->size();
>                 if (column->buffers[1])
>                     arrowSize += column->buffers[1]->size();
>             }
> 
> Does this look reasonable? I guess we are overestimating a bit due to the
> buffer alignment but that should be fine.

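Probably, but you should iterate over all buffers instead of
selecting just buffers 0 and 1. A string column, for example, has
three buffers (validity bitmap, offsets, character data), so its
character data would not be counted.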

So basically:

```
size_t arrowSize = 0;
for (const auto& column : batch->columns()) {
  for (const auto& buffer : column->data()->buffers) {
    if (buffer)
      arrowSize += buffer->size();
  }
}
```
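
To make the string-column case concrete, here is a minimal
self-contained sketch that does not depend on Arrow. The `Buffer` and
`ArrayData` structs and the three functions are hypothetical stand-ins
that only mimic the shape of the Arrow classes (a list of possibly-null
buffers per column); the buffer order follows the Arrow layout for
string arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins mimicking the shape of arrow::Buffer and
// arrow::ArrayData. They are NOT the Arrow classes.
struct Buffer {
  std::vector<std::uint8_t> bytes;
  std::size_t size() const { return bytes.size(); }
};

struct ArrayData {
  // Entries may be null, e.g. when there is no validity bitmap.
  std::vector<std::shared_ptr<Buffer>> buffers;
};

// Builds a string column with the three-buffer layout:
//   buffers[0] = validity bitmap (null here: the column has no nulls)
//   buffers[1] = int32 offsets, one more entry than there are values
//   buffers[2] = concatenated character data
ArrayData MakeStringColumn(const std::vector<std::string>& values) {
  auto offsets = std::make_shared<Buffer>();
  auto chars = std::make_shared<Buffer>();
  const auto max_offset =
      static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());

  std::vector<std::int32_t> offs{0};
  for (const auto& v : values) {
    chars->bytes.insert(chars->bytes.end(), v.begin(), v.end());
    assert(chars->bytes.size() <= max_offset);  // offsets are 32-bit
    offs.push_back(static_cast<std::int32_t>(chars->bytes.size()));
  }
  const auto* raw = reinterpret_cast<const std::uint8_t*>(offs.data());
  offsets->bytes.assign(raw, raw + offs.size() * sizeof(std::int32_t));

  ArrayData column;
  column.buffers = {nullptr, offsets, chars};
  return column;
}

// Counts only buffers 0 and 1, as in the original question.
std::size_t SizeOfFirstTwoBuffers(const ArrayData& column) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < 2 && i < column.buffers.size(); ++i) {
    if (column.buffers[i]) total += column.buffers[i]->size();
  }
  return total;
}

// Counts every non-null buffer of the column.
std::size_t SizeOfAllBuffers(const ArrayData& column) {
  std::size_t total = 0;
  for (const auto& buffer : column.buffers) {
    if (buffer) total += buffer->size();
  }
  return total;
}
```

For a column built from the two strings "foo" and "quux",
`SizeOfFirstTwoBuffers` sees only the 12 bytes of offsets, while
`SizeOfAllBuffers` also counts the 7 bytes of character data.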

Regards

Antoine.

