Thanks for the responses, guys.

I also suspected GC, and I think that could be it: during the spikes the
logs are filled with messages like "GC for ConcurrentMarkSweep: 5908 ms for
1 collections, 1986282520 used; max is 8375238656", often right before
messages about dropped queries. The other, unaffected nodes only have
messages of the "GC for ParNew: 230 ms for 1 collections, 4418571760 used;
max is 8375238656" type.
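To compare nodes more objectively, the GC lines can be pulled out of
system.log and summarized per collector (number of events, total time,
worst single collection). The sketch below is a minimal Python example that
assumes the message format quoted above. The regex, the function name
`summarize_gc`, and the log path in the usage comment are illustrative
assumptions, not anything provided by Cassandra.

```python
import re
import sys
from collections import defaultdict

# Assumed format, taken from the messages quoted above, e.g.
# "GC for ConcurrentMarkSweep: 5908 ms for 1 collections, 1986282520 used; max is 8375238656"
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def summarize_gc(lines):
    """Return {collector: (events, total_ms, worst_ms)} for lines matching GC_LINE."""
    stats = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        m = GC_LINE.search(line)
        if not m:
            continue  # not a GC message
        ms = int(m.group("ms"))
        s = stats[m.group("collector")]
        s[0] += 1              # number of logged GC events
        s[1] += ms             # total reported GC time
        s[2] = max(s[2], ms)   # longest single reported collection
    return {name: tuple(v) for name, v in stats.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python gc_summary.py /var/log/cassandra/system.log
    with open(sys.argv[1]) as f:
        for name, (events, total, worst) in sorted(summarize_gc(f).items()):
            print(f"{name}: {events} events, {total} ms total, worst {worst} ms")
```

Running it against system.log on an affected node and on an unaffected one
should show whether the long ConcurrentMarkSweep entries are really confined
to the problem nodes.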

Is my best bet, then, to experiment with the JVM settings and try to tune
garbage collection?


On Thu, Sep 10, 2015 at 6:52 AM, Samuel CARRIERE <samuel.carri...@urssaf.fr>
wrote:

> Hi Roman,
> If it affects only a subset of nodes and it's always the same ones, it
> could be a "problem" with your data model: maybe some (too) wide rows on
> these nodes.
> If one of your rows is too wide, deserialising the column index of that
> row can take a lot of resources (disk, RAM, and CPU).
> If you are using the leveled compaction strategy and you see abnormally
> big SSTables on those nodes, that could be a clue.
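> One rough way to look for such abnormally big SSTables is to compare each
> data file against the typical size in the same table directory. The
> sketch below is a minimal Python example that assumes SSTable data files
> end in `-Data.db` and sit under the data directory you pass in; the
> function name `oversized_sstables`, the example path, and the
> "10x the median" threshold are arbitrary illustrations, not Cassandra
> settings.
>
> ```python
> import os
> import statistics
> import sys
>
>
> def oversized_sstables(data_dir, factor=10.0):
>     """Yield (path, size_bytes, median_bytes) for *-Data.db files that are at
>     least `factor` times the median Data.db size in the same directory."""
>     for root, _dirs, files in os.walk(data_dir):
>         sizes = {
>             os.path.join(root, f): os.path.getsize(os.path.join(root, f))
>             for f in files
>             if f.endswith("-Data.db")
>         }
>         if len(sizes) < 2:
>             continue  # nothing to compare against in this directory
>         median = statistics.median(sizes.values())
>         # Largest files first, so the worst offenders are reported first.
>         for path, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
>             if median > 0 and size >= factor * median:
>                 yield path, size, median
>
>
> if __name__ == "__main__" and len(sys.argv) > 1:
>     # Usage (hypothetical path): python big_sstables.py /var/lib/cassandra/data
>     for path, size, median in oversized_sstables(sys.argv[1]):
>         print(f"{size / 2**20:.0f} MiB (median {median / 2**20:.0f} MiB): {path}")
> ```
>
> Comparing against the median of the same directory keeps tables with very
> different data volumes from drowning each other out.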
> Regards,
> Samuel
>
Robert Wille <rwi...@fold3.com> wrote on 10/09/2015 15:27:41:
>
> > From: Robert Wille <rwi...@fold3.com>
> > To: "user@cassandra.apache.org" <user@cassandra.apache.org>,
> > Date: 10/09/2015 15:30
> > Subject: Re: High CPU usage on some of nodes
> >
> > It sounds like it's probably GC. Grep for GC in system.log to verify.
> > If it is GC, there are a myriad of issues that could cause it, but
> > at least you’ve narrowed it down.
> >
> > On Sep 9, 2015, at 11:05 PM, Roman Tkachenko <ro...@mailgunhq.com>
> > wrote:
> >
> > > Hey guys,
> > >
> > > We've been having issues over the past couple of days with CPU usage
> > > and load average suddenly skyrocketing on some nodes of the cluster,
> > > hurting performance so much that the majority of requests start
> > > timing out. It can go on for several hours, with CPU spiking through
> > > the roof, then coming back down to normal, and so on. Oddly, it
> > > affects only a subset of nodes, and it's always the same ones. The
> > > boxes Cassandra is running on are pretty beefy (24 cores), and these
> > > CPU spikes go above 1000%.
> > >
> > > What is the best way to debug this kind of issue and find out
> > > what Cassandra is doing during spikes like this? It doesn't seem to be
> > > compaction-related, since during some of these spikes "nodetool
> > > compactionstats" reports that no compactions are running.
> > >
> > > Thanks!
> > >
> >
>
