Re: Changing replication factor

Vegard Berget Mon, 17 Jun 2013 05:34:16 -0700

Hi,
Thank you for the information. I have increased the RF, and I think the
increase we have seen in CPU load etc. is due to the counter CFs,
which are almost write-only (read a few times a day). The load
increase is noticeable, but not a problem. Repair went fine. But I
noticed that when I increased the RF for a counter column, took one
node down (for completely unrelated reasons), and then ran repair, I
got multiple lines like this in system.log:

"invalid counter shard detected; (X, Y, Z) and (X, Y, Z2) differ only
in count; will pick highest to self-heal; this indicates a bug or
corruption generated a bad counter shard"

I guess this is because the counters got out of sync while the node
was down, and repair just needs to pick the highest? In my case this
will be (more or less) correct, since the sync problem was caused by a
downed node, which means _all_ increments happened on the other node,
so that node will have the correct number? I am just curious, as some
minor errors in the counters would be no problem for us.
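
To make the question concrete, here is a minimal sketch of the
"pick highest" behaviour as I read it from the log message. It is
only a toy model, not Cassandra's actual code: I am assuming the three
fields in the log line are a shard owner id, a logical clock, and a
count, and the names Shard and reconcile are made up for illustration.

```python
from typing import NamedTuple


class Shard(NamedTuple):
    """Hypothetical model of one counter shard as printed in the log line."""
    owner_id: str  # assumed meaning of X
    clock: int     # assumed meaning of Y
    count: int     # assumed meaning of Z / Z2


def reconcile(a: Shard, b: Shard) -> Shard:
    """Pick one of two shards that belong to the same owner."""
    if a.owner_id != b.owner_id:
        raise ValueError("shards from different owners are not compared here")
    if a.clock != b.clock:
        # Normal case in this model: the shard with the newer clock wins.
        return a if a.clock > b.clock else b
    # Same owner and same clock but different counts: the situation the
    # log message describes. Keep the higher count to "self-heal".
    return a if a.count >= b.count else b


print(reconcile(Shard("node-a", 5, 10), Shard("node-a", 5, 12)))
```

In this model the higher count is kept whenever only the count
differs, which is only right if the lower value is the stale one.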
.vegard,
----- Original Message -----
From: user@cassandra.apache.org
To:, "Vegard Berget" 
Cc:
Sent:Fri, 14 Jun 2013 17:20:26 -0700
Subject:Re: Changing replication factor

 On Mon, Jun 10, 2013 at 6:04 AM, Vegard Berget  wrote:
 > If one increases the replication factor of a keyspace and then does a
 > repair, how will this affect the performance of the affected nodes?
 > Could we risk the nodes being (more or less) unresponsive while repair
 > is going on?

 Repair is a relatively heavyweight activity (the heaviest a cassandra
 node can do!) which requires significant headroom in terms of CPU,
 heap memory and disk space. It is possible that nodes could become
 unavailable transiently during the repair, but unless they are already
 very busy they should not become completely unresponsive. For one
 thing, both compaction and streaming respect throttles which are
 designed to minimize the impact of the streaming/compaction workload
 resulting from repair.

 > The nodes I am speaking of contains ~100gb of data.

 This is a relatively small amount of data per node, which makes the
 impact of Repair less severe.
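
 As a rough back-of-envelope check, the sketch below estimates how long
 it would take to stream a given amount of data at a given throttle. It
 assumes, as a worst case, that the whole ~100gb has to be streamed, and
 the 200 megabits/s figure is a made-up example value, not a
 recommended or default setting. The function name is hypothetical.

```python
def streaming_time_seconds(data_gb: float, throttle_megabits_per_sec: float) -> float:
    """Time to move data_gb gigabytes at a fixed throttle (decimal units)."""
    total_bits = data_gb * 1e9 * 8                 # gigabytes -> bits
    bits_per_second = throttle_megabits_per_sec * 1e6
    return total_bits / bits_per_second


# Assumed example: 100 GB streamed at a hypothetical 200 megabits/s throttle.
seconds = streaming_time_seconds(100, 200)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} h")
```

 The real duration also depends on validation and compaction work, so
 treat this as an order-of-magnitude estimate only.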

 > Also, some of the keyspaces for which I am considering increasing the
 > replication factor contain Counter Column Families (which have rf:1).
 > I think I have read that adding replication to counter cfs will
 > affect performance negatively. Is this correct?

 Per Sylvain (one of the primary authors of the Counters codebase) [1]:

 "
 For counters, it's a little bit different. At RF=3, for each insert,
 one node is doing a write *and* a read, while the two other nodes are
 only doing a write. So given that the read takes a time that is
 non-negligible, you should see simple improvement at RF=3 compared to
 RF=1 because each node gets 1/3 of the reads (involved in the counter
 write) it would get if it was the only replica. Now if the write time
 were negligible compared to the read time, then yes you would see
 roughly a 3x increase. But while writes are still faster than reads in
 Cassandra, reads are now fairly fast too (but all this depends on other
 factors like how much the caches help, etc...), so it will likely be
 less than a 3x increase. Should be noticeable though.
 "

 I interpret the above to mean that RF=3 is actually slightly *faster*
 for Counters than RF=1.
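
 The arithmetic behind the quoted reasoning can be sketched as follows.
 This is only a toy model under stated assumptions: every replica
 performs each write, the read that accompanies a counter write is
 spread evenly over the replicas, and write_time and read_time are
 hypothetical relative costs, not measurements.

```python
def per_replica_cost(rf: int, write_time: float, read_time: float) -> float:
    """Average work per replica for one counter increment in this toy model.

    Each replica does the write; only one of the rf replicas also does
    the read, so on average a replica pays read_time / rf.
    """
    return write_time + read_time / rf


def speedup_vs_rf1(rf: int, write_time: float, read_time: float) -> float:
    """How much cheaper one increment is per replica compared to RF=1."""
    return per_replica_cost(1, write_time, read_time) / per_replica_cost(
        rf, write_time, read_time
    )


# Negligible write cost: approaches the "roughly 3x" upper bound at RF=3.
print(speedup_vs_rf1(3, write_time=0.0, read_time=1.0))
# Reads and writes equally expensive: a smaller but still visible gain.
print(speedup_vs_rf1(3, write_time=1.0, read_time=1.0))
```

 The model only compares work per replica for the same stream of
 increments; it says nothing about total cluster load.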

 =Rob

 [1]
http://mail-archives.apache.org/mod_mbox/cassandra-user/201110.mbox/%3ccakkz8q0thzzsbu2370mx6jpeec3lh17pjmv1kojggauajup...@mail.gmail.com%3E
