The doc does say this: "A log-structured engine that avoids overwrites and uses sequential IO to update data is essential for writing to solid-state disks (SSD) and hard disks (HDD). On HDD, writing randomly involves a higher number of seek operations than sequential writing. The seek penalty incurred can be substantial. Using sequential IO (thereby avoiding write amplification <http://en.wikipedia.org/wiki/Write_amplification> and disk failure), Cassandra accommodates inexpensive, consumer SSDs extremely well."

I presume that write amplification argues for placing the commit log on a separate SSD device. That should probably be mentioned.

-- Jack Krupansky
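As a concrete illustration of that placement, a minimal cassandra.yaml sketch; commitlog_directory and data_file_directories are the relevant settings, but the mount points here are hypothetical:

    # cassandra.yaml -- illustrative excerpt; adjust paths to your own mounts.
    # Keeping the commit log's sequential append stream on its own device
    # keeps it from contending with SSTable flush and compaction IO.
    commitlog_directory: /mnt/ssd0/cassandra/commitlog   # hypothetical mount
    data_file_directories:
        - /mnt/ssd1/cassandra/data                       # hypothetical mount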
On Thu, Mar 10, 2016 at 12:52 PM, Matt Kennedy <matt.kenn...@datastax.com> wrote:

It isn't really the data written by the host that you're concerned with, it's the data written by your application. I'd start by instrumenting your application tier to tally up the size of the values that it writes to C*.

However, that value may not be extremely useful on its own; you can't do much with the information it provides. It is probably a better idea to track the bytes written to flash for each drive, so that you know the physical endurance of that type of drive given your workload. Unfortunately, the TBW endurance rating for the drive may not be very useful either, given the difference between the synthetic workload used to create those ratings and the workload that Cassandra produces in your particular case. You can find out more about those ratings here:
https://www.jedec.org/standards-documents/docs/jesd219a

Matt Kennedy
Sr. Product Manager, DSE Core
matt.kenn...@datastax.com | Public Calendar <http://goo.gl/4Ui04Z>
DataStax Enterprise - the database for cloud applications.

On Thu, Mar 10, 2016 at 11:44 AM, Dikang Gu <dikan...@gmail.com> wrote:

Hi Matt,

Thanks for the detailed explanation! Yes, this is exactly what I'm looking for: "write amplification = data written to flash / data written by the host".

We are heavily using LCS in production, so I'd like to figure out the amplification caused by that and see what we can do to optimize it. I have the metrics for "data written to flash", and I'm wondering whether there is an easy way to get the "data written by the host" on each C* node.

Thanks

On Thu, Mar 10, 2016 at 8:48 AM, Matt Kennedy <mkenn...@datastax.com> wrote:

TL;DR - Cassandra actually causes a ton of write amplification, but it doesn't freaking matter any more. Read on for details...

That slide deck does have a lot of very good information on it, but unfortunately I think it has led to a fundamental misunderstanding about Cassandra and write amplification. In particular, slide 51 vastly oversimplifies the situation.

The Wikipedia definition of write amplification looks at this from the perspective of the SSD controller:
https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value

In short: write amplification = data written to flash / data written by the host

So, if I write 1MB in my application, but the SSD has to write my 1MB plus rearrange another 1MB of data in order to make room for it, then the drive has written a total of 2MB and my write amplification is 2x.

In other words, it is measuring how much extra the SSD controller has to write in order to do its own housekeeping.
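That arithmetic, as a toy Python sketch (the byte counts are made up for illustration):

    # Controller-level write amplification: flash writes / host writes.
    def write_amplification(flash_bytes: float, host_bytes: float) -> float:
        return flash_bytes / host_bytes

    # 1 MB written by the application, plus 1 MB the controller had to
    # relocate to make room for it:
    print(write_amplification(flash_bytes=2 * 1024**2,
                              host_bytes=1 * 1024**2))  # -> 2.0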
However, the Wikipedia definition is a bit more constrained than how the term is used in the storage industry. The whole point of looking at write amplification is to understand the impact that a particular workload is going to have on the underlying NAND by virtue of the data written. So a definition of write amplification that is a little more relevant to the context of Cassandra is this:

write amplification = data written to flash / data written to the database

So, while the fact that we only sequentially write large immutable SSTables does in fact mean that controller-level write amplification is near zero, compaction comes along and completely destroys that tidy little story. Think about it: every time a compaction re-writes data that has already been written, we are creating a lot of application-level write amplification. The compaction strategy and the workload itself determine the real application-level write amp, but generally speaking, LCS is the worst, followed by STCS, and DTCS will cause the least write amp.

To measure this, you can usually use smartctl (it may be another mechanism depending on the SSD manufacturer) to get the physical bytes written to your SSDs, and divide that by the data that you've actually logically written to Cassandra. I've measured (more than two years ago) LCS write amp as high as 50x on some workloads, which is significantly higher than the typical controller-level write amp on a b-tree style update-in-place data store. Also note that the new storage engine in general reduces a lot of inefficiency in the Cassandra storage engine, therefore reducing the impact of write amp due to compactions.

However, if you're a person who understands SSDs, at this point you're wondering why we aren't burning out SSDs right and left. The reality is that general SSD endurance has gotten so good that all this write amp isn't really a problem any more. If you're curious to read more about that, I recommend you start here:

http://hothardware.com/news/google-data-center-ssd-research-report-offers-surprising-results-slc-not-more-reliable-than-mlc-flash

and the paper that article mentions:

http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf

Hope this helps.

Matt Kennedy
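A rough Python sketch of that smartctl measurement, assuming a drive that reports SMART attribute 241 (Total_LBAs_Written) in 512-byte units; both assumptions vary by manufacturer, so treat this purely as an illustration:

    import re
    import subprocess

    def flash_bytes_written(device: str) -> int:
        """Return total bytes written to flash, per SMART attribute 241
        (Total_LBAs_Written), assuming 512-byte units. Typically needs root."""
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True, check=True).stdout
        m = re.search(r"Total_LBAs_Written.*?(\d+)\s*$", out, re.MULTILINE)
        if m is None:
            raise RuntimeError(f"{device} does not expose Total_LBAs_Written")
        return int(m.group(1)) * 512

    # Sample before and after a test window, then divide the flash delta by
    # the bytes you logically wrote to Cassandra over the same window:
    #   write_amp = (flash_after - flash_before) / logical_bytes_written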
On Thu, Mar 10, 2016 at 7:05 AM, Paulo Motta <pauloricard...@gmail.com> wrote:

This is a good source on Cassandra + write amplification:
http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives

2016-03-10 9:57 GMT-03:00 Benjamin Lerer <benjamin.le...@datastax.com>:

Cassandra should not cause any write amplification. Write amplification happens only when you update data on SSDs. Cassandra does not update any data in place. Data can be rewritten during compaction, but it is never updated.

Benjamin

On Thu, Mar 10, 2016 at 12:42 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

Hi Dikang,

I am not sure what you call "amplification", but as sizes highly depend on the data structure, I would give it a try using CCM (https://github.com/pcmanus/ccm) or some test cluster with 'production like' settings and schema. You can write a row, flush it and see how big the data is cluster-wide / per node.

Hope this will be of some help.

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-10 7:18 GMT+01:00 Dikang Gu <dikan...@gmail.com>:

Hello there,

I'm wondering whether there is a good way to measure the write amplification of Cassandra.

I'm thinking it could be calculated as (number of bytes written to the disk) / (size of mutations written to the node).

Do we already have a metric for the "size of mutations written to the node"? I did not find one in the JMX metrics.

Thanks

--
Dikang
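For the denominator of that ratio, Matt's suggestion of instrumenting the application tier might look something like this toy sketch using the DataStax Python driver; the keyspace, table and tally logic are all hypothetical:

    from cassandra.cluster import Cluster  # pip install cassandra-driver

    cluster = Cluster(["127.0.0.1"])    # hypothetical contact point
    session = cluster.connect("my_ks")  # hypothetical keyspace

    logical_bytes_written = 0  # running tally of bytes the app sends to C*

    def tracked_insert(key: str, value: bytes) -> None:
        """Write one row and tally its logical payload size."""
        global logical_bytes_written
        session.execute("INSERT INTO my_table (k, v) VALUES (%s, %s)",
                        (key, value))
        logical_bytes_written += len(key.encode("utf-8")) + len(value)

Divide the smartctl flash delta by this tally over the same window to approximate the application-level write amplification Matt describes.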