There are several long ParNew pauses that were recorded during startup.
 The young gen size looks large too, if I am reading that line correctly.
 Did you happen to override the default settings for MAX_HEAP_SIZE and/or
HEAP_NEWSIZE in cassandra-env.sh?  A large young gen size, set via the
env.sh file, could be causing longer than typical pauses, which could make
your node appear to be unresponsive and have high CPU (CPU for the ParNew
GC event).
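
For comparison with whatever is set in your env.sh, here is a rough Python
sketch of how the stock cassandra-env.sh picks its defaults when nothing is
overridden. The formulas are as I recall them from the script's
calculate_heap_sizes function, so treat them as an assumption and check
them against your own copy. The function name and the example machine
(32 GB RAM, 8 cores) are made up for illustration.

```python
def default_heap_sizes_mb(system_memory_mb, cpu_cores):
    """Approximate the defaults computed by the stock cassandra-env.sh.

    Assumed formulas (verify against your copy of the script):
      max heap = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
      new size = min(100 MB * cores, 1/4 of max heap)
    """
    half_ram_capped = min(system_memory_mb // 2, 1024)
    quarter_ram_capped = min(system_memory_mb // 4, 8192)
    max_heap_mb = max(half_ram_capped, quarter_ram_capped)

    new_size_mb = min(100 * cpu_cores, max_heap_mb // 4)
    return max_heap_mb, new_size_mb


# Hypothetical machine: 32 GB of RAM and 8 cores.
example = default_heap_sizes_mb(32 * 1024, 8)
print("max heap: %d MB, new size: %d MB" % example)
```

If the values in your env.sh are much larger than what this gives for your
hardware, the young gen was most likely overridden.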

Check out this one - INFO 11:42:51,939 GC for ParNew: 2148 ms for 2
collections, 1256307568 used; max is 8422162432
That is about 2 seconds of GC pause across 2 collections, so roughly 1
second each.  That's very high for ParNew.  We typically want many short
ParNew events rather than fewer, longer ones.
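
If you want to scan the whole log for lines like this, a small Python
sketch along these lines could pull out the numbers. It assumes the log
lines follow the same format as the one quoted above; the function name
and the returned field names are just illustrative.

```python
import re

# Pattern for GC lines in the format quoted above (assumed stable).
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) "
    r"collections, (?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_line(line):
    """Return pause statistics for one GC log line, or None if no match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    total_ms = int(m.group("ms"))
    count = int(m.group("count"))
    return {
        "collector": m.group("collector"),
        "total_ms": total_ms,
        "collections": count,
        # Average pause per collection; guard against a zero count.
        "avg_ms": total_ms / count if count else 0.0,
        # Fraction of the reported max heap in use after the collections.
        "heap_used_fraction": int(m.group("used")) / int(m.group("max")),
    }


line = ("INFO 11:42:51,939 GC for ParNew: 2148 ms for 2 collections, "
        "1256307568 used; max is 8422162432")
stats = parse_gc_line(line)
print(stats)
```

Running it over every line of system.log and sorting by avg_ms should make
the worst pauses easy to spot.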

One other thing I noticed was that the node had a lot of log segment
replays during startup.  You could avoid these, or minimize them, by
performing a flush or drain before stopping and starting Cassandra.
 This will flush memtables and clear your log segments.
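
As a sketch of that sequence in Python: run nodetool drain first, then stop
and start the service. The stop and start commands below are assumptions
(they depend on how Cassandra is installed on your machines), and the
function name is made up; only nodetool drain comes from the advice above.

```python
import subprocess


def drain_then_restart(
    run=subprocess.check_call,
    stop_cmd=("sudo", "service", "cassandra", "stop"),    # assumed command
    start_cmd=("sudo", "service", "cassandra", "start"),  # assumed command
):
    """Drain the local node, then stop and start Cassandra.

    `run` is called once per command with an argument list; the default
    raises if a command exits non-zero, so a failed drain aborts the
    restart instead of continuing.
    """
    steps = [["nodetool", "drain"], list(stop_cmd), list(start_cmd)]
    for cmd in steps:
        run(cmd)
    return steps
```

Calling drain_then_restart() on one node at a time, and waiting for it to
rejoin before moving on, keeps the rolling restart behaviour you already
have.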



Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
<http://www.linkedin.com/in/jlacefield>




On Wed, Jun 18, 2014 at 8:05 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> A simple restart of a node with no changes gives this result.
>
> logs output : https://gist.github.com/arodrime/db9ab152071d1ad39f26
>
> Here are some screenshots:
>
> - htop from a node immediately after restarting
> - opscenter ring view (show load cpu on all nodes)
> - opscenter dashboard shows the impact of a restart on latency (can affect
> writes or reads, it depends, reaction seems to be quite random)
>
>
> 2014-06-18 13:35 GMT+02:00 Jonathan Lacefield <jlacefi...@datastax.com>:
>
>> Hello
>>
>>   Have you checked the log file to see what's happening during startup?
>> What caused the rolling restart?  Did you perform an upgrade or
>> change a config?
>>
>> > On Jun 18, 2014, at 5:40 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>> >
>> > Hi guys
>> >
>> > Using 1.2.11, when I try a rolling restart of the cluster, any node I
>> restart makes the whole cluster's CPU load increase, reaching a "red"
>> state in opscenter (load from 3-4 to 20+). This happens once the node is
>> back online.
>> >
>> > The restarted node uses 100 % cpu for 5 - 10 min and sometimes drops
>> mutations.
>> >
>> > I have tried to throttle handoff to 256 (instead of 1024), yet it
>> doesn't seem to help that much.
>> >
>> > Disks are not the bottleneck. ParNew GC increases a bit, but nothing
>> problematic I think.
>> >
>> > Basically, what could be happening on node restart ? What is taking
>> that much CPU on every machine ? There is no steal or iowait.
>> >
>> > What can I try to tune ?
>> >
>>
>
>
