Thank you, Jon!

________________________________
From: Jonathan Haddad <j...@jonhaddad.com>
Sent: Thursday, August 9, 2018 7:29:24 PM
To: user
Subject: Re: Compression Tuning Tutorial

There's a discussion about direct I/O here you might find interesting:
https://issues.apache.org/jira/browse/CASSANDRA-14466

I suspect the main reason is that O_DIRECT wasn't added till Java 10, and while it could be used with some workarounds, there's a lot of entropy around changing something like this. It's not a trivial task to do it right, and mixing has some really nasty issues. At least it means there's lots of room for improvement though :)

On Thu, Aug 9, 2018 at 5:36 AM Kyrylo Lebediev <kyrylo_lebed...@epam.com.invalid> wrote:

Thank you Jon, great article as usual!

One topic that was discussed in the article is the filesystem cache, which is traditionally leveraged for data caching in Cassandra (with row caching disabled by default). IIRC mmap() is used.

Some RDBMSs and NoSQL DBs also use direct I/O + async I/O and maintain their own, not kernel-managed, DB cache, thus improving overall performance. As Cassandra is designed to be a DB with low response time, this approach with DIO/AIO/DB cache seems to be a really useful feature.

Just out of curiosity, are there reasons why this advanced I/O stack wasn't implemented, other than a lack of resources to do it?

Regards,
Kyrill

________________________________
From: Eric Plowe <eric.pl...@gmail.com>
Sent: Wednesday, August 8, 2018 9:39:44 PM
To: user@cassandra.apache.org
Subject: Re: Compression Tuning Tutorial

Great post, Jonathan! Thank you very much.

~Eric

On Wed, Aug 8, 2018 at 2:34 PM Jonathan Haddad <j...@jonhaddad.com> wrote:

Hey folks,

We've noticed a lot over the years that people usually create tables leaving the default compression parameters in place, and we have spent a lot of time helping teams figure out the right settings for their cluster based on their workload.
I finally managed to write some thoughts down, along with a high-level breakdown of how the internals function, that should help people pick better settings for their cluster.

This post focuses on a mixed 50:50 read:write workload, but the same conclusions can be drawn from a read-heavy workload. Hopefully this helps some folks get better performance / save some money on hardware!

http://thelastpickle.com/blog/2018/08/08/compression_performance.html

--
Jon Haddad
Principal Consultant, The Last Pickle

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
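
For readers unfamiliar with the mmap()-based read path mentioned in the thread, here is a minimal sketch of what reading through a memory map looks like. It is an illustration only, not Cassandra's code: the helper names `read_slice_mmap` and `demo` and the temporary file are made up for this example. The point is that the application maintains no cache of its own, and pages touched through the mapping are served by the kernel's page cache. The direct I/O alternative discussed above would bypass that cache and is not shown here.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` via a read-only memory map.

    Hypothetical helper: the application keeps no cache; the kernel decides
    which pages of the file stay resident in the page cache.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies the requested bytes out of the mapped region.
            return mm[offset:offset + length]


def demo():
    """Write a small temporary file and read ten bytes back through mmap."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789" * 1000)
        path = tmp.name
    try:
        return read_slice_mmap(path, 10, 10)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

A second read of the same region would normally be served from memory without touching the disk, which is the behaviour the thread describes as leveraging the filesystem cache.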