I tried reading a Parquet file (<200 MB, mostly text columns, snappy-compressed)
using read_table and saw memory usage peak at over 8 GB before settling back
down to ~200 MB. This surprised me, as I was expecting to handle a Parquet
file of this size with much less RAM (I'm doing some processing on smaller
VMs).
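
For reference, the read itself is just a plain read_table call. A minimal
sketch of what I'm running (the path is a placeholder, and the peak-RSS
check via resource.getrusage is something I added here for illustration):

import resource

import pyarrow as pa
import pyarrow.parquet as pq

# Placeholder path; the real file is <200 MB of snappy-compressed text.
table = pq.read_table("data.parquet")

# Peak resident set size of the process so far (ru_maxrss is KB on Linux).
peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
print(f"rows: {table.num_rows}, peak RSS: ~{peak_mb:.0f} MB")

# Arrow's own tracked allocations after the read settles.
print(f"arrow allocated: {pa.total_allocated_bytes() / 2**20:.0f} MB")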

I am not sure if this is expected, but I thought I'd check with everyone
here and learn something new. Poking around, it seems to be related to
ParquetReader.read_all?

Thanks in advance,
Bryant
