It isn't really the data written by the host that you're concerned with; it's the data written by your application. I'd start by instrumenting your application tier to tally up the size of the values that it writes to C*.
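That tallying can live in a thin wrapper around the application's write path. A minimal sketch follows; the class and counter names are hypothetical, and real code would count the serialized size of every bound value, not just strings and bytes:

```python
class WriteTally:
    """Counts the bytes of values an application sends to Cassandra.

    This is an illustrative sketch, not a real driver hook: call
    record() with the same values you bind into your INSERT/UPDATE
    statements, then read bytes_written periodically.
    """

    def __init__(self):
        self.bytes_written = 0

    def record(self, *values):
        # Approximate the on-wire size by encoding text values and
        # taking the raw length of byte values.
        for v in values:
            if isinstance(v, str):
                v = v.encode("utf-8")
            self.bytes_written += len(v)


tally = WriteTally()
# An 18-byte key plus a 1 KiB value:
tally.record("some-partition-key", b"\x00" * 1024)
print(tally.bytes_written)  # 1042
```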
However, it may not be extremely useful to have this value, since you can't do much with the information it provides. It is probably a better idea to track the bytes written to flash for each drive, so that you know the physical endurance of that type of drive given your workload. Unfortunately, the rated TBW endurance for the drive may not be very useful either, given the difference between the synthetic workload used to create those ratings and the workload that Cassandra is producing in your particular case. You can find out more about those ratings here:
https://www.jedec.org/standards-documents/docs/jesd219a

Matt Kennedy
Sr. Product Manager, DSE Core
matt.kenn...@datastax.com | Public Calendar <http://goo.gl/4Ui04Z>

*DataStax Enterprise - the database for cloud applications.*

On Thu, Mar 10, 2016 at 11:44 AM, Dikang Gu <dikan...@gmail.com> wrote:

> Hi Matt,
>
> Thanks for the detailed explanation! Yes, this is exactly what I'm looking
> for: "write amplification = data written to flash / data written by the
> host".
>
> We are heavily using LCS in production, so I'd like to figure out the
> amplification caused by that and see what we can do to optimize it. I have
> the metrics for "data written to flash", and I'm wondering if there is an
> easy way to get the "data written by the host" on each C* node?
>
> Thanks
>
> On Thu, Mar 10, 2016 at 8:48 AM, Matt Kennedy <mkenn...@datastax.com>
> wrote:
>
>> TL;DR - Cassandra actually causes a ton of write amplification, but it
>> doesn't freaking matter any more. Read on for details...
>>
>> That slide deck does have a lot of very good information in it, but
>> unfortunately I think it has led to a fundamental misunderstanding about
>> Cassandra and write amplification. In particular, slide 51 vastly
>> oversimplifies the situation.
>>
>> The Wikipedia definition of write amplification looks at this from the
>> perspective of the SSD controller:
>> https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value
>>
>> In short: write amplification = data written to flash / data written by
>> the host.
>>
>> So, if I write 1MB in my application, but the SSD has to write my 1MB
>> plus rearrange another 1MB of data in order to make room for it, then a
>> total of 2MB has been written to flash and my write amplification is 2x.
>>
>> In other words, it is measuring how much extra the SSD controller has to
>> write in order to do its own housekeeping.
>>
>> However, the Wikipedia definition is a bit more constrained than how the
>> term is used in the storage industry. The whole point of looking at write
>> amplification is to understand the impact that a particular workload is
>> going to have on the underlying NAND by virtue of the data written. So a
>> definition of write amplification that is a little more relevant in the
>> context of Cassandra is this:
>>
>> write amplification = data written to flash / data written to the database
>>
>> So, while the fact that we only sequentially write large immutable
>> SSTables does in fact mean that controller-level write amplification is
>> near zero, compaction comes along and completely destroys that tidy
>> little story. Think about it: every time a compaction re-writes data that
>> has already been written, we are creating a lot of application-level
>> write amplification. Different compaction strategies and the workload
>> itself determine what the real application-level write amp is, but
>> generally speaking LCS is the worst, followed by STCS, and DTCS will
>> cause the least write amp. To measure this, you can usually use smartctl
>> (the mechanism may differ depending on the SSD manufacturer) to get the
>> physical bytes written to your SSDs, and divide that by the data that
>> you've actually logically written to Cassandra.
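The measurement described above, physical bytes from a SMART counter divided by bytes logically written, is just a ratio. A minimal sketch with made-up numbers follows; parsing smartctl output is vendor-specific and omitted here:

```python
def write_amplification(flash_bytes, host_bytes):
    """write amplification = data written to flash / data written by the host.

    flash_bytes -- physical bytes written, e.g. from a smartctl counter
                   (units and attribute names vary by SSD vendor)
    host_bytes  -- bytes logically written by the application / database
    """
    return flash_bytes / host_bytes


MB = 1024 * 1024

# The 1MB example from the thread: the SSD writes my 1MB plus rearranges
# another 1MB, so 2MB hit the flash for 1MB written by the host.
print(write_amplification(2 * MB, 1 * MB))  # 2.0
```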
>> I've measured (more than two years ago) LCS write amp as high as 50x on
>> some workloads, which is significantly higher than the typical
>> controller-level write amp on a b-tree style update-in-place data store.
>> Also note that the new storage engine reduces a lot of inefficiency in
>> the Cassandra storage format, thereby reducing the impact of write amp
>> due to compactions.
>>
>> However, if you're a person that understands SSDs, at this point you're
>> wondering why we aren't burning out SSDs right and left. The reality is
>> that general SSD endurance has gotten so good that all this write amp
>> isn't really a problem any more. If you're curious to read more about
>> that, I recommend you start here:
>>
>> http://hothardware.com/news/google-data-center-ssd-research-report-offers-surprising-results-slc-not-more-reliable-than-mlc-flash
>>
>> and the paper that article mentions:
>>
>> http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf
>>
>> Hope this helps.
>>
>> Matt Kennedy
>>
>> On Thu, Mar 10, 2016 at 7:05 AM, Paulo Motta <pauloricard...@gmail.com>
>> wrote:
>>
>>> This is a good source on Cassandra + write amplification:
>>> http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
>>>
>>> 2016-03-10 9:57 GMT-03:00 Benjamin Lerer <benjamin.le...@datastax.com>:
>>>
>>>> Cassandra should not cause any write amplification. Write amplification
>>>> happens only when you update data on SSDs. Cassandra does not update
>>>> any data in place. Data can be rewritten during compaction, but it is
>>>> never updated.
>>>>
>>>> Benjamin
>>>>
>>>> On Thu, Mar 10, 2016 at 12:42 PM, Alain RODRIGUEZ <arodr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Dikang,
>>>>>
>>>>> I am not sure about what you call "amplification", but as sizes highly
>>>>> depend on the structure, I think I would give it a try using CCM
>>>>> (https://github.com/pcmanus/ccm) or some test cluster with
>>>>> 'production-like' settings and schema. You can write a row, flush it
>>>>> and see how big the data is cluster-wide / per node.
>>>>>
>>>>> Hope this will be of some help.
>>>>>
>>>>> C*heers,
>>>>> -----------------------
>>>>> Alain Rodriguez - al...@thelastpickle.com
>>>>> France
>>>>>
>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> 2016-03-10 7:18 GMT+01:00 Dikang Gu <dikan...@gmail.com>:
>>>>>
>>>>>> Hello there,
>>>>>>
>>>>>> I'm wondering if there is a good way to measure the write
>>>>>> amplification of Cassandra?
>>>>>>
>>>>>> I'm thinking it could be calculated as (number of bytes written to
>>>>>> the disk) / (size of mutations written to the node).
>>>>>>
>>>>>> Do we already have a metric for the "size of mutations written to
>>>>>> the node"? I did not find it in the JMX metrics.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>> Dikang
>
> --
> Dikang
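Tying the thread together: once you have a measured write-amp figure, you can combine it with a drive's rated TBW (the JEDEC endurance rating mentioned near the top of the thread) to estimate endurance headroom. All numbers below are illustrative assumptions, not measurements from any real drive or cluster:

```python
def drive_lifetime_days(rated_tbw_tb, logical_tb_per_day, write_amp):
    """Days until the drive's rated TBW is exhausted.

    rated_tbw_tb       -- manufacturer's TBW rating, in terabytes written
    logical_tb_per_day -- data the application writes per day, in TB
    write_amp          -- measured flash-bytes / host-bytes ratio
    """
    physical_tb_per_day = logical_tb_per_day * write_amp
    return rated_tbw_tb / physical_tb_per_day


# Example: a 3000 TBW drive, 0.1 TB/day of logical writes, and a 10x
# write amp gives 3000 / (0.1 * 10) = 3000 days, a little over 8 years,
# which illustrates why the write amp rarely burns drives out in practice.
print(round(drive_lifetime_days(3000, 0.1, 10)))  # 3000
```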