Following up on what I have found with Uwe's advice and poking around the
code base.
* `columns=` helped (see the sketch below), but it was because it forced me to
realize I did not need all of the columns at once every time. No particular
column was significantly worse in memory usage.
* There seems to be some interaction between
`parquet::internal::RecordReader` and `arrow::PoolBuffer` or
`arrow::DefaultMemoryPool`. `RecordReader` requests an allocation to hold the
entire column in memory without compression/encoding even though Arrow
supports dictionary encoding (a
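A minimal sketch of the `columns=` comparison from the first bullet, assuming a hypothetical file `events.parquet` with made-up column names and the default Arrow memory pool; `pyarrow.total_allocated_bytes()` reports what that pool currently holds:

```python
import pyarrow as pa
import pyarrow.parquet as pq

path = "events.parquet"  # hypothetical file name used for illustration

# Reading every column materializes each one decoded in the Arrow memory pool.
table_all = pq.read_table(path)
print("all columns:", pa.total_allocated_bytes(), "bytes in the default pool")
del table_all  # release those buffers before the next measurement

# Reading only the columns needed for this pass keeps the allocation smaller.
table_some = pq.read_table(path, columns=["user_id", "message"])  # made-up names
print("two columns:", pa.total_allocated_bytes(), "bytes in the default pool")
```

If the second number is much smaller, trimming the `columns=` list is the quickest way to cap the peak for a given pass over the data.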
Uwe,
I'll try pinpointing things further with `columns=` and try to reproduce
what I find with data I can share.
Thanks for the pointer.
-Bryant
On Wed, Apr 25, 2018 at 2:10 PM Uwe L. Korn wrote:
No, there is no need to pass any options on reading. Sometimes they are
beneficial depending on what you want to achieve, but the defaults are ok, too.
I'm not sure if you're able to post an example, but it would be nice if you
could post the resulting Arrow schema from the table. It might be related
Uwe,
I am not. Should I be? I forgot to mention earlier that the Parquet file
came from Spark/PySpark.
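For reference, a small sketch of how the schema information Uwe asks for above could be pulled from the Spark-written file; `data.parquet` is a placeholder path:

```python
import pyarrow.parquet as pq

path = "data.parquet"  # placeholder for the file written by Spark/PySpark

# Arrow schema of the table produced by a default read_table call.
table = pq.read_table(path)
print(table.schema)

# Parquet-level metadata (row groups, encodings, compression) is available
# from the file footer without materializing the data.
pf = pq.ParquetFile(path)
print(pf.metadata)
print(pf.schema)
```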
On Wed, Apr 25, 2018 at 1:32 PM Uwe L. Korn wrote:
Hello Bryant,
are you using any options on `pyarrow.parquet.read_table` or a possible
`to_pandas` afterwards?
Uwe
On Wed, Apr 25, 2018, at 7:27 PM, Bryant Menn wrote:
I tried reading a Parquet file (<200MB, lots of text with snappy) using
`read_table` and saw the memory usage peak over 8GB before settling back down
to ~200MB. This surprised me as I was expecting to be able to handle a
Parquet file of this size with much less RAM (doing some processing with
smaller
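One rough way to put a number on the spike described in this first message is to compare the process's peak RSS before and after the read. This is only a sketch for Linux/macOS: `big_text.parquet` is a placeholder path, and `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS.

```python
import resource

import pyarrow.parquet as pq

def peak_rss():
    # Peak resident set size of this process so far (kB on Linux, bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
table = pq.read_table("big_text.parquet")  # placeholder path
df = table.to_pandas()
after = peak_rss()
print("peak RSS grew by about", after - before, "(kB on Linux, bytes on macOS)")
```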