TL;DR - Cassandra actually causes a ton of write amplification but it
doesn't freaking matter any more. Read on for details...

That slide deck does have a lot of very good information on it, but
unfortunately I think it has led to a fundamental misunderstanding about
Cassandra and write amplification. In particular, slide 51 vastly
oversimplifies the situation.

The wikipedia definition of write amplification looks at this from the
perspective of the SSD controller:
https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value

In short: write amplification = (data written to flash) / (data written by
the host)

So, if I write 1MB in my application, but the SSD has to write my 1MB, plus
rearrange another 1MB of data in order to make room for it, then I've
written a total of 2MB and my write amplification is 2x.

In other words, it is measuring how much extra the SSD controller has to
write in order to do its own housekeeping.
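Just to make the arithmetic concrete, here's a trivial sketch of that formula using the 1MB example above (the numbers are just the example, not real measurements):

```python
def write_amplification(bytes_written_to_flash, bytes_written_by_host):
    """Write amplification as the SSD controller sees it."""
    return bytes_written_to_flash / bytes_written_by_host

# The example above: the host writes 1MB, but the controller also has to
# rearrange another 1MB of existing data to make room, so 2MB hits the flash.
host_mb = 1
flash_mb = 2  # 1MB of my data + 1MB of controller housekeeping
print(write_amplification(flash_mb, host_mb))  # → 2.0
```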

However, the wikipedia definition is a bit more constrained than how the
term is used in the storage industry. The whole point of looking at write
amplification is to understand the impact that a particular workload is
going to have on the underlying NAND by virtue of the data written. So a
definition of write amplification that is a little more relevant to the
context of Cassandra is to consider this:

write amplification = (data written to flash) / (data written to the
database)

So, while it is true that Cassandra only sequentially writes large
immutable SSTables, which keeps controller-level write amplification near
zero, compaction comes along and completely destroys that tidy little
story. Every time a compaction re-writes data that has already been
written, it creates application-level write amplification. The compaction
strategy and the workload itself determine the real application-level
write amp, but generally speaking, LCS is the worst, STCS is in the
middle, and DTCS causes the least. To measure it, you can usually use
smartctl (or another mechanism, depending on the SSD manufacturer) to get
the physical bytes written to your SSDs, and divide that by the data
you've actually logically written to Cassandra. I've measured (more than
two years ago) LCS write amp as high as 50x on some workloads, which is
significantly higher than the typical controller-level write amp on a
b-tree style update-in-place data store. Also note that the new storage
engine reduces a lot of inefficiency in Cassandra's storage layer, which
in turn reduces the impact of write amp due to compaction.
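To make that measurement concrete, here's a minimal sketch of the division described above. It assumes you've already pulled the physical-bytes-written figure from smartctl (on many drives that's a Total_LBAs_Written SMART attribute times the logical sector size, but the attribute name and units vary by manufacturer) and that you track the logical bytes your application writes to Cassandra yourself. The figures below are made up for illustration:

```python
def app_level_write_amp(physical_bytes_written, logical_bytes_written):
    """Application-level write amplification: physical bytes the SSD
    reports written, divided by bytes logically written to the database."""
    return physical_bytes_written / logical_bytes_written

# Hypothetical figures: smartctl reports ~9.77e9 LBAs written in
# 512-byte sectors (~5 TB physical), while the application logically
# wrote 100 GB to Cassandra over the same period.
SECTOR_BYTES = 512
physical = 9_765_625_000 * SECTOR_BYTES  # ~5 TB actually hit the flash
logical = 100 * 10**9                    # 100 GB written to Cassandra

print(f"write amp: {app_level_write_amp(physical, logical):.0f}x")  # → 50x
```

With numbers like these you'd land at the 50x LCS figure mentioned above; the interesting part in practice is sampling both counters at two points in time and dividing the deltas, not the lifetime totals.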

However, if you're a person who understands SSDs, at this point you're
wondering why we aren't burning out SSDs right and left. The reality is
that SSD endurance in general has gotten so good that all this write amp
isn't really a problem any more. If you're curious to read more about that,
I recommend you start here:

http://hothardware.com/news/google-data-center-ssd-research-report-offers-surprising-results-slc-not-more-reliable-than-mlc-flash

and the paper that article mentions:
http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf


Hope this helps.


Matt Kennedy



On Thu, Mar 10, 2016 at 7:05 AM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> This is a good source on Cassandra + write amplification:
> http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
>
> 2016-03-10 9:57 GMT-03:00 Benjamin Lerer <benjamin.le...@datastax.com>:
>
>> Cassandra should not cause any write amplification. Write amplification
>> happens only when you update data on SSDs. Cassandra does not update any
>> data in place. Data can be rewritten during compaction but it is never
>> updated.
>>
>> Benjamin
>>
>> On Thu, Mar 10, 2016 at 12:42 PM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>> > Hi Dikang,
>> >
>> > I am not sure about what you call "amplification", but as sizes highly
>> > depends on the structure I think I would probably give it a try using
>> CCM (
>> > https://github.com/pcmanus/ccm) or some test cluster with 'production
>> > like'
>> > setting and schema. You can write a row, flush it and see how big is the
>> > data cluster-wide / per node.
>> >
>> > Hope this will be of some help.
>> >
>> > C*heers,
>> > -----------------------
>> > Alain Rodriguez - al...@thelastpickle.com
>> > France
>> >
>> > The Last Pickle - Apache Cassandra Consulting
>> > http://www.thelastpickle.com
>> >
>> > 2016-03-10 7:18 GMT+01:00 Dikang Gu <dikan...@gmail.com>:
>> >
>> > > Hello there,
>> > >
>> > > I'm wondering is there a good way to measure the write amplification
>> of
>> > > Cassandra?
>> > >
>> > > I'm thinking it could be calculated by (size of mutations written to
>> the
>> > > node)/(number of bytes written to the disk).
>> > >
>> > > Do we already have the metrics of "size of mutations written to the
>> > node"?
>> > > I did not find it in jmx metrics.
>> > >
>> > > Thanks
>> > >
>> > > --
>> > > Dikang
>> > >
>> > >
>> >
>>
>
>
