This last command sequence was considered a best practice a few years ago; I hope that is still the case. I just added the more recent "nodetool disablebinary" part...
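For reference, here it is roughly as a script. This is just a sketch of the same sequence; the service name and the sleep durations are whatever fits your install:

    #!/bin/sh
    # Stop announcing/accepting traffic before shutting down:
    nodetool disablegossip    # stop gossiping, so other nodes mark this one down
    nodetool disablethrift    # stop the Thrift RPC server
    nodetool disablebinary    # stop the native protocol (CQL) server
    sleep 10
    # Flush memtables so the next startup does not replay commit log segments:
    nodetool drain
    sleep 30
    service cassandra stop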
2014-06-18 14:36 GMT+02:00 Alain RODRIGUEZ <arodr...@gmail.com>:

> Thanks a lot for taking the time to check the log.
>
> We just switched from 400M to 1600M NEW size in cassandra-env.sh. It
> reduced our latency and the ParNew GC time per second significantly
> (described here:
> http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads
> )
>
> Even when we had 400M, the restart behaved this way.
>
> We stop the node using: nodetool disablegossip && nodetool disablethrift
> && nodetool disablebinary && sleep 10 && nodetool drain && sleep 30 &&
> service cassandra stop
>
> 2014-06-18 14:23 GMT+02:00 Jonathan Lacefield <jlacefi...@datastax.com>:
>
>> There are several long ParNew pauses recorded during startup. The young
>> gen size looks large too, if I am reading that line correctly. Did you
>> happen to override the default settings for MAX_HEAP and/or NEW size in
>> cassandra-env.sh? A large young gen size, set via the env.sh file, could
>> be causing longer-than-typical pauses, which could make your node appear
>> unresponsive and show high CPU (the CPU being spent in the ParNew GC
>> events).
>>
>> Check out this one: INFO 11:42:51,939 GC for ParNew: 2148 ms for 2
>> collections, 1256307568 used; max is 8422162432
>> That is a 2-second GC pause, which is very high for ParNew. We typically
>> want lots of tiny ParNew events rather than large, less frequent ones.
>>
>> One other thing that was noticed is that the node had a lot of log
>> segment replays during startup. You could avoid, or minimize, these by
>> performing a flush or drain before stopping and starting Cassandra. This
>> will flush memtables and clear your log segments.
>>
>> Jonathan Lacefield
>> Solutions Architect, DataStax
>>
>> On Wed, Jun 18, 2014 at 8:05 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>>> A simple restart of a node, with no changes, gives this result.
>>>
>>> Logs output: https://gist.github.com/arodrime/db9ab152071d1ad39f26
>>>
>>> Here are some screenshots:
>>>
>>> - htop from a node immediately after restarting
>>> - OpsCenter ring view (shows the CPU load on all nodes)
>>> - OpsCenter dashboard showing the impact of a restart on latency (it
>>>   can affect writes or reads, it depends; the reaction seems to be
>>>   quite random)
>>>
>>> 2014-06-18 13:35 GMT+02:00 Jonathan Lacefield <jlacefi...@datastax.com>:
>>>
>>>> Hello,
>>>>
>>>> Have you checked the log file to see what's happening during startup?
>>>> What caused the rolling restart? Did you perform an upgrade or change
>>>> a config?
>>>>
>>>> > On Jun 18, 2014, at 5:40 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>>>> > wrote:
>>>> >
>>>> > Hi guys,
>>>> >
>>>> > Using 1.2.11, when I do a rolling restart of the cluster, any node I
>>>> > restart makes the whole cluster's CPU load increase, reaching a
>>>> > "red" state in OpsCenter (load going from 3-4 to 20+). This happens
>>>> > once the node is back online.
>>>> >
>>>> > The restarted node uses 100% CPU for 5-10 minutes and sometimes
>>>> > drops mutations.
>>>> >
>>>> > I have tried to throttle handoff to 256 (instead of 1024), yet it
>>>> > doesn't seem to help that much.
>>>> >
>>>> > Disks are not the bottleneck. ParNew GC time increases a bit, but
>>>> > nothing problematic, I think.
>>>> >
>>>> > Basically, what could be happening on node restart? What is taking
>>>> > that much CPU on every machine? There is no steal or iowait.
>>>> >
>>>> > What can I try to tune?
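In case it helps anyone reading the archive later: the two knobs discussed above live in the following places. The values shown are just the ones mentioned in this thread, not recommendations, and the max heap is only a guess from the "max is 8422162432" log line; also, my understanding is that if you set one of the heap values in cassandra-env.sh you should set both.

In cassandra-env.sh:

    MAX_HEAP_SIZE="8G"       # guessed from the GC log line above
    HEAP_NEWSIZE="1600M"     # the young gen ("NEW") size discussed above

In cassandra.yaml (hinted handoff throttle, default 1024), if I have the name right:

    hinted_handoff_throttle_in_kb: 256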