On 10/19/2015 10:55 AM, Graham Sanderson wrote:
If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra
2.1, you may have had
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 25
in your cassandra.yaml.
It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just
happened immediately), but fixed in 2.1, *which meant that every
mutation blocked its writer thread for 25ms, so at 80
mutations/sec/writer thread you’d start DROPPING mutations if your write
timeout is 2000ms.*
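A rough sketch of the arithmetic behind those figures, in Python. It assumes each writer thread handles one mutation at a time and blocks for the full batch window on every sync. Reading the 80 as 2000ms / 25ms (the number of back-to-back windows that fit inside the write timeout) is an interpretation, not something the message states, and the function names are purely illustrative.

```python
# Back-of-the-envelope check of the batch-window figures.
# Assumption: one mutation per writer thread at a time, each blocked
# for the whole commitlog_sync_batch_window_in_ms.

def per_thread_rate(window_ms: float) -> float:
    """Mutations per second one writer thread can complete when blocked per window."""
    return 1000.0 / window_ms

def windows_until_timeout(window_ms: float, write_timeout_ms: float) -> float:
    """Number of back-to-back batch windows that fit inside the write timeout."""
    return write_timeout_ms / window_ms

if __name__ == "__main__":
    for window_ms in (25, 2):
        print(
            f"window={window_ms}ms "
            f"rate/thread={per_thread_rate(window_ms):.0f}/s "
            f"windows before 2000ms timeout={windows_until_timeout(window_ms, 2000):.0f}"
        )
```

Under these assumptions a 25ms window caps a single blocked writer thread at 40 mutations/sec, and 80 queued waits add up to the 2000ms timeout; a 2ms window raises those to 500 and 1000.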
This turns out to be a massive problem if you write fast, and the
default commitlog_sync_batch_window_in_ms was changed to 2ms in 2.1.6
as a way of addressing this (with some suggesting 1ms).
Neither of these changes got much fanfare except an eventual reference
in CHANGES.txt.
With 2.1.9, if you aren’t doing periodic sync, then I think the new
behavior is just to sync whenever the commit logs have a
consistent/complete set of mutations ready.
Note this is hard to diagnose because CPU is idle and pretty much all
latency metrics (except the overall coordinator write) do not count this
time (and you probably weren’t noticing the 25ms write ACK time). It
turned out for us that one of our nodes was getting more writes (> 20k
mutations per second) which was about the magic number… anything shy of
that and everything looked fine, but just by going slightly over, this
node was dropping lots of mutations.
If you would be kind enough to submit a patch to JIRA for NEWS.txt
(aligned with the right versions you're warning about) that includes the
info upgrading users might need, that would be great!
--
Kind regards,
Michael