Re: Cassandra Pig with network topology and data centers.

Jake Luciani Fri, 29 Jul 2011 17:07:48 -0700

Yes, it's read repair. You can lower the read repair chance to tune this.
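
As a rough sketch of why the read repair chance matters here, the snippet below models the replica counts from the original message (5 in DC1, 1 in DC2). It assumes a simplified model in which a read that triggers read repair also contacts every replica in the other data center, while LOCAL_QUORUM only controls how many local responses the coordinator waits for. The function names are hypothetical and are not part of any Cassandra API.

```python
def local_quorum(replication_factor: int) -> int:
    """Replicas that must answer in the local data center: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def expected_remote_replicas(read_repair_chance: float, remote_rf: int) -> float:
    """Average number of remote replicas contacted per read.

    Assumption: when read repair fires (with probability read_repair_chance),
    all remote_rf replicas in the other data center are contacted.
    """
    return read_repair_chance * remote_rf


dc1_rf, dc2_rf = 5, 1  # replication settings quoted in the original message

print("LOCAL_QUORUM in DC2 waits for", local_quorum(dc2_rf), "replica(s)")
for chance in (1.0, 0.1, 0.0):
    remote = expected_remote_replicas(chance, dc1_rf)
    print(f"read repair chance {chance}: ~{remote} DC1 replicas contacted per read")
```

Under this model, lowering the chance reduces the DC1 load proportionally, at the cost of replicas converging more slowly through reads alone.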


On Jul 29, 2011, at 6:31 PM, Aaron Griffith <aaron.c.griff...@gmail.com> wrote:

> I currently have a 9 node cassandra cluster setup as follows:
> 
> DC1: Six nodes
> DC2: Three nodes
> 
> The tokens alternate between the two datacenters.
> 
> I have hadoop installed as tasktracker/datanodes on the 
> three cassandra nodes in DC2.
> 
> There is another non cassandra node that is used as the hadoop namenode / job 
> tracker.
> 
> When running pig scripts pointed to a node in DC2 using LOCAL_QUORUM as read
> consistency I am seeing network and cpu spikes on the nodes in DC1.  I was 
> not expecting any impact on those nodes when local quorum is used.
> 
> Can read repair be causing the traffic/cpu spikes?  
> 
> The replication factor for DC1 is 5, and for DC2 it is 1.
> 
> When looking at the map tasks I am seeing input splits for computers in 
> both data centers.  I am not sure what this means.  My thought is 
> that it should only be getting data from the nodes in DC2.
> 
> Thanks
> 
> Aaron
>
