Hello,

I am trying to understand how the in-memory size of a cached table depends
on the source file format. Specifically, why is the in-memory size so much
higher for Avro and Parquet? Is there any optimization I can apply to reduce it?

I used cacheTable on a table loaded from each of these files:

Avro file (600 KB) - in-memory size was 12 MB
Parquet file (600 KB) - in-memory size was 12 MB
CSV file (3 MB, same data as the files above) - in-memory size was 600 KB
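
To put those numbers in proportion, here is a quick sketch of the ratio of in-memory size to file size for each format. It only does arithmetic on the sizes listed above; it assumes 1 MB = 1024 KB, and the names sizes_kb and expansion_ratio are made up for illustration.

```python
# Sizes reported above, in KB (assuming 1 MB = 1024 KB).
sizes_kb = {
    "avro": {"file": 600, "in_memory": 12 * 1024},
    "parquet": {"file": 600, "in_memory": 12 * 1024},
    "csv": {"file": 3 * 1024, "in_memory": 600},
}


def expansion_ratio(file_kb, in_memory_kb):
    """Return in-memory size divided by on-disk size."""
    return in_memory_kb / file_kb


# Ratio per format: above 1 means the cached table is larger than the file.
ratios = {
    fmt: expansion_ratio(s["file"], s["in_memory"])
    for fmt, s in sizes_kb.items()
}

for fmt, ratio in ratios.items():
    print(f"{fmt}: {ratio:.2f}x")
```

So the cached Avro and Parquet tables are roughly 20 times larger than their files, while the cached CSV table is about one fifth the size of its file.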

Because of this, we would need a cluster with much more memory if we were to
cache the Avro files.

Thanks for your help.

Prit
