A bit of Splunk-fu probably works for this – you’ll have different line entries 
for memtable flushes vs compaction output. Comparing the two will give you a 
general idea of compaction amplification. 
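
If you want something a bit more repeatable than ad-hoc searches, a small script can do the same comparison offline against system.log. A minimal sketch; the exact wording of the flush and compaction log lines varies between Cassandra versions, so treat the regexes below as assumptions to adjust, not the canonical format:

    #!/usr/bin/env python3
    # Rough compaction write amplification estimate from Cassandra's system.log.
    # Assumed (version-dependent) log line shapes:
    #   flush:      "Completed flushing .../xx-Data.db (12.345MiB) ..."
    #   compaction: "Compacted (...) 4 sstables to [...].  123,456 bytes to 98,765 ..."
    import re
    import sys

    FLUSH_RE = re.compile(r"Completed flushing .*\(([\d.,]+)\s*(bytes|KiB|MiB|GiB)\)")
    COMPACT_RE = re.compile(r"Compacted .*?([\d,]+)\s+bytes to\s+([\d,]+)")
    UNIT = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

    flushed = 0.0    # bytes written once by memtable flushes
    compacted = 0.0  # bytes re-written as compaction output

    for line in open(sys.argv[1], errors="replace"):
        m = FLUSH_RE.search(line)
        if m:
            flushed += float(m.group(1).replace(",", "")) * UNIT[m.group(2)]
            continue
        m = COMPACT_RE.search(line)
        if m:
            compacted += float(m.group(2).replace(",", ""))  # size of the new sstables

    if flushed:
        print(f"flushed:   {flushed / 1e9:.2f} GB")
        print(f"compacted: {compacted / 1e9:.2f} GB")
        print(f"approx. compaction amplification: {(flushed + compacted) / flushed:.1f}x")

Run it against each node's system.log; the ratio is of course only as good as your log retention.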




From:  Dikang Gu
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, March 10, 2016 at 9:44 AM
To:  cassandra, "mkenn...@datastax.com"
Subject:  Re: How to measure the write amplification of C*?

Hi Matt, 

Thanks for the detailed explanation! Yes, this is exactly what I'm looking for, 
"write amplification = data written to flash/data written by the host".

We are heavily using LCS in production, so I'd like to figure out the 
amplification caused by that and see what we can do to optimize it. I have the 
metrics for "data written to flash", and I'm wondering whether there is an easy 
way to get the "data written by the host" on each C* node?

Thanks

On Thu, Mar 10, 2016 at 8:48 AM, Matt Kennedy <mkenn...@datastax.com> wrote:
TL;DR - Cassandra actually causes a ton of write amplification but it doesn't 
freaking matter any more. Read on for details...

That slide deck does have a lot of very good information on it, but 
unfortunately I think it has led to a fundamental misunderstanding about 
Cassandra and write amplification. In particular, slide 51 vastly 
oversimplifies the situation.

The wikipedia definition of write amplification looks at this from the 
perspective of the SSD controller:
https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value

In short, write amplification = data written to flash/data written by the host

So, if I write 1MB in my application, but the SSD has to write my 1MB plus 
rearrange another 1MB of data in order to make room for it, then the drive has 
written a total of 2MB and my write amplification is 2x.

In other words, it is measuring how much extra the SSD controller has to write 
in order to do its own housekeeping.

However, the wikipedia definition is a bit more constrained than how the term 
is used in the storage industry. The whole point of looking at write 
amplification is to understand the impact that a particular workload is going 
to have on the underlying NAND by virtue of the data written. So a definition 
of write amplification that is a little more relevant to the context of 
Cassandra is to consider this:

write amplification = data written to flash/data written to the database
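
To make the difference between the two definitions concrete, here's a toy calculation with invented numbers (purely illustrative, not a measurement):

    # Invented numbers, only to illustrate the two ratios above.
    app_writes   = 1_000   # GB written by the application into Cassandra
    disk_writes  = 5_000   # GB Cassandra actually wrote to disk (flushes + compaction rewrites)
    flash_writes = 5_500   # GB the SSD controller physically wrote to NAND

    controller_wa  = flash_writes / disk_writes  # ~1.1x, the wikipedia/controller view
    application_wa = flash_writes / app_writes   # ~5.5x, the view that matters here
    print(f"controller-level:  {controller_wa:.1f}x")
    print(f"application-level: {application_wa:.1f}x")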

So, while the fact that we only sequentially write large immutable SSTables 
does in fact mean that controller-level write amplification is near zero, 
compaction comes along and completely destroys that tidy little story. Think 
about it: every time a compaction re-writes data that has already been written, 
we are creating a lot of application-level write amplification. Different 
compaction strategies and the workload itself affect what the real 
application-level write amp is, but generally speaking, LCS is the worst, 
followed by STCS, and DTCS causes the least write amp. To measure this, you 
can usually use smartctl (or another mechanism, depending on the SSD 
manufacturer) to get the physical bytes written to your SSDs and divide that by 
the data that you've actually logically written to Cassandra. I've measured 
(more than two years ago) LCS write amp as high as 50x on some workloads, which 
is significantly higher than the typical controller-level write amp on a b-tree 
style update-in-place data store. Also note that the new storage engine removes 
a lot of inefficiency from how Cassandra stores data, which in turn reduces the 
impact of the write amp caused by compaction.
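
For example, a rough sketch of that calculation, assuming smartctl is available and the drive reports attribute 241 (Total_LBAs_Written) in 512-byte units (both assumptions vary by vendor), with the logical byte count being something you track on the Cassandra side yourself:

    #!/usr/bin/env python3
    # Approximate application-level write amplification: SMART bytes / logical bytes.
    # Assumes smartctl is installed and the drive exposes attribute 241
    # (Total_LBAs_Written) with a raw value in 512-byte units; check your vendor.
    import subprocess
    import sys

    def flash_bytes_written(device):
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True, check=True).stdout
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] == "241":   # Total_LBAs_Written on many SSDs
                return int(fields[-1]) * 512    # RAW_VALUE is usually the last column
        raise RuntimeError("attribute 241 not found; check smartctl -A output")

    if __name__ == "__main__":
        device = sys.argv[1]                # e.g. /dev/sda
        logical_bytes = float(sys.argv[2])  # bytes you know were written into Cassandra
        print(f"approx. write amplification: {flash_bytes_written(device) / logical_bytes:.1f}x")

In practice you would snapshot the SMART counter before and after a test run and use the delta rather than the lifetime total, otherwise everything else the node has ever written gets counted too.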

However, if you're a person who understands SSDs, at this point you're 
wondering why we aren't burning out SSDs right and left. The reality is that 
general SSD endurance has gotten so good that all this write amp isn't really 
a problem any more. If you're curious to read more about that, I recommend you 
start here:

http://hothardware.com/news/google-data-center-ssd-research-report-offers-surprising-results-slc-not-more-reliable-than-mlc-flash

and the paper that article mentions:
http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf


Hope this helps.


Matt Kennedy



On Thu, Mar 10, 2016 at 7:05 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
This is a good source on Cassandra + write amplification: 
http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives

2016-03-10 9:57 GMT-03:00 Benjamin Lerer <benjamin.le...@datastax.com>:
Cassandra should not cause any write amplification. Write amplification
happens only when you update data on SSDs. Cassandra does not update any
data in place. Data can be rewritten during compaction but it is never
updated.

Benjamin

On Thu, Mar 10, 2016 at 12:42 PM, Alain RODRIGUEZ <arodr...@gmail.com>
wrote:

> Hi Dikang,
>
> I am not sure about what you call "amplification", but as sizes highly
> depend on the structure, I think I would probably give it a try using CCM (
> https://github.com/pcmanus/ccm) or some test cluster with 'production-like'
> settings and schema. You can write a row, flush it, and see how big the data
> is cluster-wide / per node.
>
> Hope this will be of some help.
>
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-03-10 7:18 GMT+01:00 Dikang Gu <dikan...@gmail.com>:
>
> > Hello there,
> >
> > I'm wondering whether there is a good way to measure the write amplification
> > of Cassandra?
> >
> > I'm thinking it could be calculated by (number of bytes written to the
> > disk)/(size of mutations written to the node).
> >
> > Do we already have a metric for "size of mutations written to the node"?
> > I did not find it in the JMX metrics.
> >
> > Thanks
> >
> > --
> > Dikang
> >
> >
>





-- 
Dikang

