I think the "Owns" calculation isn't taking racks into consideration.  The
fact that you aren't alternating racks (availability zones, with EC2Snitch)
is what is causing the imbalance.  I suggest either using the same rack for
all nodes (preferred) or alternating your racks/AZs in token order: 1b, 1c,
1d, 1b, 1c, 1d, etc.
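
To make that concrete, here is a rough Python sketch of the rack-aware walk
NetworkTopologyStrategy does within one data center (assuming the keyspace
uses NTS, which the per-DC replication factors imply). Starting at a range's
primary node, it moves clockwise, prefers nodes on racks that don't hold a
replica yet, and back-fills with the skipped nodes once every rack is covered.
This is a simplified model with made-up function names that ignores vnodes
and the other DC; it is not Cassandra's actual code. It counts how many token
ranges each DC1 node replicates with RF=2 under three rack layouts.

```python
from collections import Counter


def place_replicas(ring, primary, rf):
    """Simplified rack-aware replica walk for a single data center.

    ring    -- list of (node, rack) tuples in token order
    primary -- index of the node that owns the range (first replica)
    rf      -- replication factor for this data center
    """
    total_racks = len({rack for _, rack in ring})
    replicas, seen_racks, skipped = [], set(), []
    for step in range(len(ring)):
        if len(replicas) >= rf:
            break
        node, rack = ring[(primary + step) % len(ring)]
        if len(seen_racks) == total_racks:
            # Every rack already holds a replica: take nodes in ring order.
            replicas.append(node)
        elif rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(rack)
            if len(seen_racks) == total_racks:
                # All racks covered: back-fill with nodes skipped earlier.
                while skipped and len(replicas) < rf:
                    replicas.append(skipped.pop(0))
        else:
            # Same rack as an existing replica: skip it for now.
            skipped.append(node)
    return replicas


def ranges_per_node(ring, rf):
    """Count how many token ranges each node holds a replica of."""
    counts = Counter()
    for primary in range(len(ring)):
        counts.update(place_replicas(ring, primary, rf))
    return dict(counts)


nodes = ["ip11", "ip12", "ip13", "ip14", "ip15", "ip16"]
layouts = {
    "current": ["1b", "1b", "1c", "1c", "1d", "1d"],
    "alternating": ["1b", "1c", "1d", "1b", "1c", "1d"],
    "single rack": ["1b"] * 6,
}
for name, racks in layouts.items():
    print(name, ranges_per_node(list(zip(nodes, racks)), rf=2))
```

With the current layout, ip11, ip13 and ip15 each replicate three of the six
ranges while ip12, ip14 and ip16 replicate one; the alternating and
single-rack layouts give every node two. The same function on DC2 (ip21 on
1a, ip22 and ip23 on 1b) gives 3, 2 and 1 ranges, which lines up with the
Load column in the listing below.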


On Thu, Oct 31, 2013 at 1:12 PM, Ashish Tyagi <tyagi.i...@gmail.com> wrote:

> We have a 9-node cluster: 6 nodes are in one data center and 3 nodes in
> the other. All machines are Amazon M1.XLarge instances.
>
> Datacenter: DC1
> ==========
> Address  Rack  Status  State   Load      Owns    Token
>
> ip11     1b    Up      Normal  76.46 GB  16.67%  0
> ip12     1b    Up      Normal  44.66 GB  16.67%  28356863910078205288614550619314017621
> ip13     1c    Up      Normal  85.94 GB  16.67%  56713727820156410577229101238628035241
> ip14     1c    Up      Normal  17.55 GB  16.67%  85070591730234615865843651857942052863
> ip15     1d    Up      Normal  80.74 GB  16.67%  113427455640312821154458202477256070484
> ip16     1d    Up      Normal  20.88 GB  16.67%  141784319550391026443072753096570088105
>
> Datacenter: DC2
> ==========
> Address  Rack  Status  State   Load      Owns    Token
>
> ip21     1a    Up      Normal  78.32 GB  0.00%   1001
> ip22     1b    Up      Normal  71.23 GB  0.00%   56713727820156410577229101238628036241
> ip23     1b    Up      Normal  53.49 GB  0.00%   113427455640312821154458202477256071484
>
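For reference, the DC1 tokens above appear to be the usual evenly spaced
RandomPartitioner tokens, i * 2**127 / N, and the DC2 tokens the same kind of
spacing shifted by about 1000. The sketch below assumes RandomPartitioner and
uses hypothetical helper names; its generated tokens differ from the listed
ones by at most 1. It also computes a naive "Owns" figure (each token's
distance from its predecessor on the combined ring, ignoring DCs and racks),
which appears to be why every DC1 node shows 16.67% and every DC2 node 0.00%
no matter how replicas are actually placed.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumed partitioner)


def balanced_tokens(node_count, offset=0):
    """Evenly spaced initial tokens, optionally shifted by `offset`."""
    return [(i * RING_SIZE // node_count + offset) % RING_SIZE
            for i in range(node_count)]


def naive_ownership(tokens):
    """Fraction of the whole ring between each token and its predecessor.

    This ignores data centers, racks and replication entirely.
    """
    ordered = sorted(tokens)
    return {t: ((t - ordered[i - 1]) % RING_SIZE) / RING_SIZE
            for i, t in enumerate(ordered)}


dc1 = balanced_tokens(6)
dc2 = balanced_tokens(3, offset=1000)
owns = naive_ownership(dc1 + dc2)
for label, tokens in (("DC1", dc1), ("DC2", dc2)):
    print(label, ["%.2f%%" % (100 * owns[t]) for t in tokens])
```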
> The problem is that the node with IP address ip11 often has 5-10 times more
> load than any other node. Most of the operations are on counters. The primary
> column family (which receives most writes) has a replication factor of 2 in
> DataCenter DC1 and also in DataCenter DC2. The traffic is write-heavy (reads
> are less than 10% of total requests). We are using size-tiered compaction.
> Both writes and reads use a consistency level of LOCAL_QUORUM.
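Worth noting with these settings: a quorum is floor(RF / 2) + 1 replicas, so
with RF 2 per data center LOCAL_QUORUM needs both local replicas to answer
every request, and a single slow replica (ip11, for example) delays every
operation on the ranges it replicates. A trivial check in Python (the helper
name is hypothetical):

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM / LOCAL_QUORUM."""
    return replication_factor // 2 + 1


for rf in (2, 3):
    print("RF", rf, "-> LOCAL_QUORUM needs", quorum(rf), "of", rf, "local replicas")
```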
>
> More information:
>
> 1. cassandra.yaml - http://pastebin.com/u344fA6z
> 2. Jmap heap when node under high loads - http://pastebin.com/ib3D0Pa
> 3. Nodetool tpstats - http://pastebin.com/s0AS7bGd
> 4. Cassandra-env.sh - http://pastebin.com/ubp4cGUx
> 5. GC log lines - http://pastebin.com/Y0TKphsm
>
> Am I doing anything wrong? Any pointers would be appreciated.
>
> Thanks in advance,
> Ashish
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>
