Hi Evan,

The clients connect to all nodes. We tried shutting down the Thrift server
on the affected node, but the load did not come down.
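
For reference, this is roughly how we disabled Thrift on that node and then
re-enabled it afterwards (run locally on the node; exact nodetool options may
differ with your version):

    # stop serving Thrift clients on this node only
    nodetool -h localhost disablethrift

    # turn Thrift back on once done observing
    nodetool -h localhost enablethrift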


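For the strace comparison you suggest below, I am assuming something along
these lines, run on the busy node and again on a lightly loaded one (the
pgrep pattern is just one way to find the Cassandra process; adjust to your
setup):

    # find the Cassandra JVM pid (assumes the usual CassandraDaemon main class)
    pid=$(pgrep -f CassandraDaemon)

    # attach, follow threads, and collect a per-syscall count/time summary;
    # interrupt with Ctrl-C after a minute or so, then repeat on a quiet node
    strace -c -f -p "$pid"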

On Fri, Nov 1, 2013 at 12:59 AM, Evan Weaver <e...@fauna.org> wrote:

> Are all your clients only connecting to your first node? I would
> probably strace it and compare the trace to one from a lightly loaded
> node.
>
> On Thu, Oct 31, 2013 at 7:12 PM, Ashish Tyagi <tyagi.i...@gmail.com>
> wrote:
> > We have a 9-node cluster: 6 nodes are in one data center and 3 nodes in
> > the other. All machines are Amazon m1.xlarge instances.
> >
> > Datacenter: DC1
> > ==========
> > Address  Rack  Status  State   Load      Owns    Token
> > ip11     1b    Up      Normal  76.46 GB  16.67%  0
> > ip12     1b    Up      Normal  44.66 GB  16.67%  28356863910078205288614550619314017621
> > ip13     1c    Up      Normal  85.94 GB  16.67%  56713727820156410577229101238628035241
> > ip14     1c    Up      Normal  17.55 GB  16.67%  85070591730234615865843651857942052863
> > ip15     1d    Up      Normal  80.74 GB  16.67%  113427455640312821154458202477256070484
> > ip16     1d    Up      Normal  20.88 GB  16.67%  141784319550391026443072753096570088105
> >
> > Datacenter: DC2
> > ==========
> > Address  Rack  Status  State   Load      Owns   Token
> > ip21     1a    Up      Normal  78.32 GB  0.00%  1001
> > ip22     1b    Up      Normal  71.23 GB  0.00%  56713727820156410577229101238628036241
> > ip23     1b    Up      Normal  53.49 GB  0.00%  113427455640312821154458202477256071484
> >
> > The problem is that the node with IP address ip11 often has 5-10 times
> > the load of any other node. Most of the operations are on counters. The
> > primary column family (which receives most of the writes) has a
> > replication factor of 2 in data center DC1 and also in data center DC2.
> > The traffic is write heavy (reads are less than 10% of total requests).
> > We are using size-tiered compaction. Both writes and reads use a
> > consistency level of LOCAL_QUORUM.
> >
> > More information:
> >
> > 1. cassandra.yaml - http://pastebin.com/u344fA6z
> > 2. jmap heap summary when the node is under high load - http://pastebin.com/ib3D0Pa
> > 3. nodetool tpstats - http://pastebin.com/s0AS7bGd
> > 4. cassandra-env.sh - http://pastebin.com/ubp4cGUx
> > 5. GC log lines - http://pastebin.com/Y0TKphsm
> >
> > Am I doing anything wrong? Any pointers would be appreciated.
> >
> > Thanks in advance,
> > Ashish
>
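
For reference, the keyspace replication described above amounts to roughly the
following definition (shown in CQL3 form; the keyspace name is a placeholder,
ours differs):

    CREATE KEYSPACE app_ks WITH replication =
        {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};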
