Thank you, Jon!

________________________________
From: Jonathan Haddad <j...@jonhaddad.com>
Sent: Thursday, August 9, 2018 7:29:24 PM
To: user
Subject: Re: Compression Tuning Tutorial

There's a discussion about direct I/O here you might find interesting: 
https://issues.apache.org/jira/browse/CASSANDRA-14466

I suspect the main reason is that O_DIRECT support wasn't added to Java until Java 10, and
while it could be used with some workarounds before that, there's a lot of entropy around
changing something like this.  It's not a trivial task to do it right, and mixing direct
and buffered I/O has some really nasty issues.
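For anyone curious, here's a minimal sketch (not Cassandra code, just an illustration of the
Java 10+ API) of what a direct read looks like.  The file path is made up, and note that
ExtendedOpenOption.DIRECT lives in the JDK-internal com.sun.nio.file package:

    import com.sun.nio.file.ExtendedOpenOption;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class DirectReadSketch {
        public static void main(String[] args) throws Exception {
            Path path = Paths.get("/var/lib/cassandra/data/ks/tbl/example-Data.db"); // hypothetical file
            int blockSize = (int) Files.getFileStore(path).getBlockSize();           // Java 10+

            try (FileChannel ch = FileChannel.open(path,
                    StandardOpenOption.READ, ExtendedOpenOption.DIRECT)) {           // Java 10+
                // O_DIRECT requires the buffer address, file offset and length
                // to be aligned to the filesystem block size.
                ByteBuffer buf = ByteBuffer.allocateDirect(blockSize * 2)
                                           .alignedSlice(blockSize);
                buf.limit(blockSize); // read exactly one aligned block
                int read = ch.read(buf, 0);
                System.out.println("read " + read + " bytes, bypassing the page cache");
            }
        }
    }

The alignment requirements alone hint at why bolting this onto an existing read path isn't trivial.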

At least it means there's lots of room for improvement though :)


On Thu, Aug 9, 2018 at 5:36 AM Kyrylo Lebediev 
<kyrylo_lebed...@epam.com.invalid> wrote:

Thank you Jon, great article as usual!


One topic discussed in the article is the filesystem cache, which Cassandra traditionally
leverages for data caching (row caching is disabled by default).

IIRC mmap() is used.
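Roughly, the read path amounts to something like this toy illustration (not Cassandra's
actual code, and the path is made up): the mapping is backed by the kernel page cache, so
hot pages are served from memory.

    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MmapReadSketch {
        public static void main(String[] args) throws Exception {
            try (FileChannel ch = FileChannel.open(
                    Paths.get("/var/lib/cassandra/data/ks/tbl/example-Data.db"), // hypothetical file
                    StandardOpenOption.READ)) {
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                byte first = map.get(0); // page fault on first access, served from cache afterwards
                System.out.println("first byte: " + first);
            }
        }
    }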

Some RDBMS and NoSQL databases instead use direct I/O + async I/O and maintain their own
DB cache, not managed by the kernel, thus improving overall performance.

As Cassandra is designed to be a low-response-time DB, this DIO/AIO/DB-cache approach seems
like it would be a really useful feature.
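Conceptually, the "own DB cache" part is just a block cache the database manages itself
instead of relying on the kernel.  A toy LRU sketch of the idea (class name and block size
are made up for illustration):

    import java.nio.ByteBuffer;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class BlockCacheSketch {
        private static final int BLOCK_SIZE = 4096; // made-up block size

        private final LinkedHashMap<Long, ByteBuffer> cache;

        public BlockCacheSketch(int maxBlocks) {
            // access-ordered LinkedHashMap gives a simple LRU eviction policy
            this.cache = new LinkedHashMap<Long, ByteBuffer>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, ByteBuffer> eldest) {
                    return size() > maxBlocks;
                }
            };
        }

        public ByteBuffer get(long blockOffset) {
            // On a miss, a real engine would issue an aligned direct read here
            // and checksum/decompress the block before caching it.
            return cache.computeIfAbsent(blockOffset,
                    off -> ByteBuffer.allocateDirect(BLOCK_SIZE));
        }
    }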

Just out of curiosity, are there reasons why this advanced I/O stack wasn't implemented,
other than a lack of resources to do it?


Regards,

Kyrill

________________________________
From: Eric Plowe <eric.pl...@gmail.com>
Sent: Wednesday, August 8, 2018 9:39:44 PM
To: user@cassandra.apache.org
Subject: Re: Compression Tuning Tutorial

Great post, Jonathan! Thank you very much.

~Eric

On Wed, Aug 8, 2018 at 2:34 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
Hey folks,

We've noticed over the years that people usually create tables leaving the default
compression parameters, and we've spent a lot of time helping teams figure out the right
settings for their cluster based on their workload.  I finally managed to write down some
thoughts, along with a high-level breakdown of how the internals function, which should
help people pick better settings for their cluster.

This post focuses on a mixed 50:50 read:write workload, but the same conclusions apply to a
read-heavy workload.  Hopefully this helps some folks get better performance / save some
money on hardware!

http://thelastpickle.com/blog/2018/08/08/compression_performance.html


--
Jon Haddad
Principal Consultant, The Last Pickle


--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
