Compaction logs show the number of bytes written and the level they were written to. Base write load = the bytes flushed from the memtable to L0 for the table. Write amplification = the sum of the bytes written to disk by all compactions of that table.
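A minimal sketch of that accounting, assuming you have already pulled the per-table totals out of the compaction logs (the byte counts below are hypothetical placeholders, not real measurements):

# Hedged sketch: estimate per-table write amplification from compaction-log totals.
# base_flush_bytes = data flushed from memtables to L0 (the base write load).
# compaction_bytes = bytes written back to disk by each compaction of that table.
# All numbers are hypothetical placeholders.

base_flush_bytes = 10 * 1024**3
compaction_bytes = [8 * 1024**3, 15 * 1024**3, 22 * 1024**3]

total_written_to_disk = base_flush_bytes + sum(compaction_bytes)
write_amp = total_written_to_disk / base_flush_bytes
print(f"write amplification ~ {write_amp:.1f}x")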
On Thu, Mar 10, 2016 at 9:44 AM, Dikang Gu <dikan...@gmail.com> wrote:

> Hi Matt,
>
> Thanks for the detailed explanation! Yes, this is exactly what I'm looking for: "write amplification = data written to flash / data written by the host".
>
> We are heavily using LCS in production, so I'd like to figure out the amplification caused by it and see what we can do to optimize it. I have the metrics for "data written to flash", and I'm wondering whether there is an easy way to get the "data written by the host" on each C* node?
>
> Thanks
>
> On Thu, Mar 10, 2016 at 8:48 AM, Matt Kennedy <mkenn...@datastax.com> wrote:
>
>> TL;DR - Cassandra actually causes a ton of write amplification, but it doesn't freaking matter any more. Read on for details...
>>
>> That slide deck does have a lot of very good information on it, but unfortunately I think it has led to a fundamental misunderstanding about Cassandra and write amplification. In particular, slide 51 vastly oversimplifies the situation.
>>
>> The Wikipedia definition of write amplification looks at this from the perspective of the SSD controller:
>> https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value
>>
>> In short, write amplification = data written to flash / data written by the host.
>>
>> So, if I write 1MB in my application, but the SSD has to write my 1MB plus rearrange another 1MB of data in order to make room for it, then a total of 2MB has been written and my write amplification is 2x.
>>
>> In other words, it is measuring how much extra the SSD controller has to write in order to do its own housekeeping.
>>
>> However, the Wikipedia definition is a bit more constrained than how the term is used in the storage industry. The whole point of looking at write amplification is to understand the impact that a particular workload is going to have on the underlying NAND by virtue of the data written. So a definition of write amplification that is a little more relevant to the context of Cassandra is this:
>>
>> write amplification = data written to flash / data written to the database
>>
>> So, while the fact that we only sequentially write large immutable SSTables does in fact mean that controller-level write amplification is near zero, compaction comes along and completely destroys that tidy little story. Think about it: every time a compaction re-writes data that has already been written, we are creating a lot of application-level write amplification. Different compaction strategies and the workload itself determine what the real application-level write amp is, but generally speaking, LCS is the worst, followed by STCS, and DTCS will cause the least write amp. To measure this, you can usually use smartctl (it may be another mechanism, depending on the SSD manufacturer) to get the physical bytes written to your SSDs and divide that by the data that you've actually logically written to Cassandra. I've measured (more than two years ago) LCS write amp as high as 50x on some workloads, which is significantly higher than the typical controller-level write amp on a b-tree style update-in-place data store. Also note that the new storage engine reduces a lot of inefficiency in general, thereby reducing the impact of write amp due to compactions.
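To put that smartctl approach in concrete form: a minimal sketch, assuming a drive that exposes a Total_LBAs_Written SMART attribute with 512-byte LBAs (other drives report host writes differently, e.g. in 32MiB units or as NVMe "Data Units Written"), and assuming you track the logical bytes written to Cassandra yourself; the device path and logical byte count below are placeholders:

import subprocess

def physical_bytes_written(device="/dev/sda", lba_size=512):
    # Parse `smartctl -A` output; the raw value is the last column of the attribute row.
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if "Total_LBAs_Written" in line:
            return int(line.split()[-1]) * lba_size
    raise RuntimeError("attribute not found; check this drive's SMART output")

# Placeholder: logical data written to Cassandra on this node (e.g. 2 TiB).
logical_bytes_written = 2 * 1024**4
write_amp = physical_bytes_written() / logical_bytes_written
print(f"approximate write amplification: {write_amp:.1f}x")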
>>
>> However, if you're a person that understands SSDs, at this point you're wondering why we aren't burning out SSDs right and left. The reality is that general SSD endurance has gotten so good that all this write amp isn't really a problem any more. If you're curious to read more about that, I recommend you start here:
>>
>> http://hothardware.com/news/google-data-center-ssd-research-report-offers-surprising-results-slc-not-more-reliable-than-mlc-flash
>>
>> and the paper that article mentions:
>>
>> http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf
>>
>> Hope this helps.
>>
>> Matt Kennedy
>>
>> On Thu, Mar 10, 2016 at 7:05 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>
>>> This is a good source on Cassandra + write amplification:
>>> http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
>>>
>>> 2016-03-10 9:57 GMT-03:00 Benjamin Lerer <benjamin.le...@datastax.com>:
>>>
>>>> Cassandra should not cause any write amplification. Write amplification happens only when you update data on SSDs. Cassandra does not update any data in place. Data can be rewritten during compaction, but it is never updated.
>>>>
>>>> Benjamin
>>>>
>>>> On Thu, Mar 10, 2016 at 12:42 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>>>
>>>> > Hi Dikang,
>>>> >
>>>> > I am not sure about what you call "amplification", but as sizes highly depend on the structure, I think I would probably give it a try using CCM (https://github.com/pcmanus/ccm) or some test cluster with 'production like' settings and schema. You can write a row, flush it and see how big the data is cluster-wide / per node.
>>>> >
>>>> > Hope this will be of some help.
>>>> >
>>>> > C*heers,
>>>> > -----------------------
>>>> > Alain Rodriguez - al...@thelastpickle.com
>>>> > France
>>>> >
>>>> > The Last Pickle - Apache Cassandra Consulting
>>>> > http://www.thelastpickle.com
>>>> >
>>>> > 2016-03-10 7:18 GMT+01:00 Dikang Gu <dikan...@gmail.com>:
>>>> >
>>>> > > Hello there,
>>>> > >
>>>> > > I'm wondering whether there is a good way to measure the write amplification of Cassandra?
>>>> > >
>>>> > > I'm thinking it could be calculated by (size of mutations written to the node) / (number of bytes written to the disk).
>>>> > >
>>>> > > Do we already have a metric for the "size of mutations written to the node"? I did not find it in the JMX metrics.
>>>> > >
>>>> > > Thanks
>>>> > >
>>>> > > --
>>>> > > Dikang
>>>> > >
>>>> >
>>>
>>
>
> --
> Dikang
>
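Following up on Alain's flush-and-measure suggestion above, a minimal sketch of checking how much space a known amount of written data occupies on a node after a flush; the data directory, keyspace and table names are placeholders and assume the default data_file_directories layout:

import glob
import os
import subprocess

KEYSPACE, TABLE = "my_ks", "my_table"   # placeholders
DATA_DIR = "/var/lib/cassandra/data"    # default location; adjust to your config

def table_size_on_disk():
    # Table directories are named <table>-<id>, so glob for the prefix.
    total = 0
    for table_dir in glob.glob(os.path.join(DATA_DIR, KEYSPACE, TABLE + "-*")):
        for root, _, files in os.walk(table_dir):
            total += sum(os.path.getsize(os.path.join(root, f)) for f in files)
    return total

# ... write a known amount of data to the table, then flush memtables to disk ...
subprocess.run(["nodetool", "flush", KEYSPACE, TABLE], check=True)
print(f"{KEYSPACE}.{TABLE} occupies {table_size_on_disk()} bytes on this node")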