Hey folks,

Over the years we've noticed that people usually create tables with the default compression parameters, and we've spent a lot of time helping teams figure out the right settings for their cluster based on their workload. I finally managed to write some thoughts down, along with a high-level breakdown of how the internals work, which should help people pick better settings for their cluster.

This post focuses on a mixed 50:50 read:write workload, but the same conclusions apply to a read-heavy workload. Hopefully this helps some folks get better performance or save some money on hardware!

http://thelastpickle.com/blog/2018/08/08/compression_performance.html

--
Jon Haddad
Principal Consultant, The Last Pickle