Yes, it's read repair. You can lower the read repair chance on the column family to tune this.
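
To make the effect concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that when read repair fires for a read, the coordinator contacts every replica of the row in every datacenter, not only the local ones; that is my reading of the behaviour, not something confirmed in this thread. The function names are hypothetical and only the replication factors (DC1 = 5, DC2 = 1) come from the original message.

```python
# Back-of-the-envelope model, NOT Cassandra code.
# Assumption: a read that triggers read repair contacts all replicas
# in all datacenters, so remote traffic scales with the repair chance.

def local_quorum(rf_local: int) -> int:
    """Replicas in the local DC that must answer for LOCAL_QUORUM."""
    return rf_local // 2 + 1


def expected_remote_requests(read_repair_chance: float, rf_remote: int) -> float:
    """Average number of requests sent to the remote DC per read."""
    if not 0.0 <= read_repair_chance <= 1.0:
        raise ValueError("read_repair_chance must be between 0 and 1")
    return read_repair_chance * rf_remote


# Replication factors taken from the original message.
replication = {"DC1": 5, "DC2": 1}

if __name__ == "__main__":
    print("LOCAL_QUORUM in DC2 needs", local_quorum(replication["DC2"]), "replica(s)")
    for chance in (1.0, 0.1, 0.0):  # illustrative values only
        remote = expected_remote_requests(chance, replication["DC1"])
        print(f"read_repair_chance={chance}: ~{remote} DC1 requests per read")
```

Under that assumption, a read repair chance of 1.0 means each read from DC2 also touches all five DC1 replicas, while 0.1 brings the average down to about half a remote request per read. The trade-off is that replicas converge more slowly when the chance is lowered.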

On Jul 29, 2011, at 6:31 PM, Aaron Griffith <aaron.c.griff...@gmail.com> wrote:

> I currently have a 9 node cassandra cluster setup as follows:
>
> DC1: Six nodes
> DC2: Three nodes
>
> The tokens alternate between the two datacenters.
>
> I have hadoop installed as tasktracker/datanodes on the
> three cassandra nodes in DC2.
>
> There is another non cassandra node that is used as the hadoop namenode / job
> tracker.
>
> When running pig scripts pointed to a node in DC2 using LOCAL_QUORUM as read
> consistency I am seeing network and cpu spikes on the nodes in DC1. I was
> not expecting any impact on those nodes when local quorum is used.
>
> Can read repair be causing the traffic/cpu spikes?
>
> The replication settings for DC1 is 5, and for DC2 is 1.
>
> When looking at the map tasks I am seeing input splits for computers in
> both data centers. I am not sure what this means. My thought is
> that is should only be getting data from the nodes in DC2.
>
> Thanks
>
> Aaron