Question about In-Memory size (cache / cacheTable)

Prithish Wed, 26 Oct 2016 22:20:07 -0700

Hello,

I am trying to understand why the in-memory size differs so much in the
situations below. Specifically, why is the in-memory size so much higher for
Avro and Parquet? Are there any optimizations that would reduce it?


I used cacheTable on each of these:

Avro file (600 KB) - in-memory size was 12 MB
Parquet file (600 KB) - in-memory size was 12 MB
CSV file (3 MB, same data as the files above) - in-memory size was 600 KB
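
To put the numbers above in perspective, here is a rough sketch of the ratio of in-memory size to file size for each format. It only does arithmetic on the sizes I reported; it assumes 1 MB = 1024 KB, and the names (`reported`, `expansion_ratio`) are made up for illustration.

```python
# Sizes as reported above, converted to KB (assumption: 1 MB = 1024 KB).
KB_PER_MB = 1024

# format -> (on-disk size in KB, in-memory size in KB after cacheTable)
reported = {
    "avro": (600, 12 * KB_PER_MB),
    "parquet": (600, 12 * KB_PER_MB),
    "csv": (3 * KB_PER_MB, 600),
}


def expansion_ratio(disk_kb, memory_kb):
    """Return the in-memory size divided by the on-disk size."""
    return memory_kb / disk_kb


ratios = {fmt: expansion_ratio(disk, mem) for fmt, (disk, mem) in reported.items()}

for fmt, ratio in ratios.items():
    print(f"{fmt}: {ratio:.2f}x")
```

So the cached Avro and Parquet tables come out at roughly 20 times their file size, while the cached CSV table is about a fifth of its file size.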

Because of this, we would need a cluster with much more memory if we were to
cache the Avro files.

Thanks for your help.

Prit
