Basically: if you were on 2.1.0 through 2.1.5, there was no way you could have 
known to change your config.
If you were on 2.1.6 through 2.1.8, you may not have noticed the NEWS.txt 
change and updated your config.
If you are on 2.1.9+, you are probably OK.
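If you're not sure what your config currently says, a quick check like the 
following can flag the risky combination. This is a naive line-based sketch, 
not a real YAML parser, and `risky_commitlog_config` is a name I made up for 
illustration:

```python
# Minimal sketch: flag the risky pre-2.1.6 combination (batch commitlog sync
# with a large batch window) in cassandra.yaml text. Parsing is deliberately
# naive -- treat this as illustration, not tooling.
def risky_commitlog_config(yaml_text, max_safe_window_ms=2):
    sync_mode = None
    window_ms = None
    for line in yaml_text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments
        if line.startswith('commitlog_sync:'):
            sync_mode = line.split(':', 1)[1].strip()
        elif line.startswith('commitlog_sync_batch_window_in_ms:'):
            window_ms = float(line.split(':', 1)[1].strip())
    return sync_mode == 'batch' and window_ms is not None \
        and window_ms > max_safe_window_ms

cfg = """
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 25
"""
print(risky_commitlog_config(cfg))  # True: the combination discussed in this thread
```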

If you are using periodic fsync, then you don't have this issue.
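For reference, the settings in question look roughly like this in 
cassandra.yaml. The 2ms value is the suggested default from 2.1.6 onward (with 
some suggesting 1ms); the periodic values shown are the usual defaults, but 
double-check against your own version's defaults:

```yaml
# Batch mode on 2.1.x: keep the window small (2ms suggested default)
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2

# Or periodic mode, which is unaffected by this issue:
# commitlog_sync: periodic
# commitlog_sync_period_in_ms: 10000
```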

> On Oct 19, 2015, at 11:37 AM, Graham Sanderson <gra...@vast.com> wrote:
> 
> - commitlog_sync_batch_window_in_ms behavior has changed from the
>   maximum time to wait between fsync to the minimum time.  We are 
>   working on making this more user-friendly (see CASSANDRA-9533) but in the
>   meantime, this means 2.1 needs a much smaller batch window to keep
>   writer threads from starving.  The suggested default is now 2ms.
> was added retroactively to NEWS.txt in 2.1.6 which is why it is not obvious
> 
>> On Oct 19, 2015, at 11:03 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
>> 
>> On 10/19/2015 10:55 AM, Graham Sanderson wrote:
>>> If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra
>>> 2.1, you may have had
>>> 
>>> commitlog_sync: batch
>>> 
>>> commitlog_sync_batch_window_in_ms: 25
>>> 
>>> 
>>> in your cassandra.yaml
>>> 
>>> It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just
>>> happened immediately) but fixed in 2.1, *which meant that every
>>> mutation blocked its writer thread for 25ms; at 80
>>> mutations/sec per writer thread you'd start DROPPING mutations if your
>>> write timeout is 2000ms.*
>>> 
>>> This turns out to be a massive problem if you write fast, and the
>>> default commitlog_sync_batch_window_in_ms was changed to 2ms in 2.1.6
>>> as a way of addressing this (with some suggesting 1ms).
>>> 
>>> Neither of these changes got much fanfare except an eventual reference
>>> in CHANGES.txt
>>> 
>>> With 2.1.9 if you aren’t doing periodic sync, then I think the new
>>> behavior is just to sync whenever the commit logs have a
>>> consistent/complete set of mutations ready.
>>> 
>>> Note this is hard to diagnose because CPU is idle and pretty much all
>>> latency metrics (except the overall coordinator write) do not count this
>>> time (and you probably weren’t noticing the 25ms write ACK time). It
>>> turned out for us that one of our nodes was getting more writes (> 20k
>>> mutations per second) which was about the magic number… anything shy of
>>> that and everything looked fine, but just by going slightly over, this
>>> node was dropping lots of mutations.
>> 
>> If you would be kind enough to submit a patch to JIRA for NEWS.txt (aligned 
>> with the right versions you're warning about) that includes the info 
>> upgrading users might need, that would be great!
>> 
>> -- 
>> Kind regards,
>> Michael
> 
