Hello, I am trying to understand how the in-memory size changes in the situations below. Specifically, why is the in-memory size so much higher for Avro and Parquet? Are there any optimizations that would reduce it?
I used cacheTable on each of these:

Avro file (600 KB) - in-memory size was 12 MB
Parquet file (600 KB) - in-memory size was 12 MB
CSV file (3 MB, was the same file as above) - in-memory size was 600 KB

Because of this, we would need a cluster with much more memory if we were to cache the Avro files.

Thanks for your help.
Prit
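
P.S. To put the numbers above in perspective, here is a quick sketch of the size ratios. It assumes 1 MB = 1024 KB and treats the reported sizes as exact; the variable names are only for illustration.

```python
KB_PER_MB = 1024  # assumption: binary units

# (on-disk size in KB, cached in-memory size in KB), taken from the figures above
sizes_kb = {
    "avro": (600, 12 * KB_PER_MB),
    "parquet": (600, 12 * KB_PER_MB),
    "csv": (3 * KB_PER_MB, 600),
}

# Ratio of cached size to file size for each format
ratios = {fmt: mem / disk for fmt, (disk, mem) in sizes_kb.items()}

for fmt, ratio in ratios.items():
    print(f"{fmt}: cached size is {ratio:.2f}x the file size")
```

So the Avro and Parquet tables grow by roughly 20x when cached, while the CSV table shrinks to about a fifth of its file size.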