[ 
https://issues.apache.org/jira/browse/CASSANDRA-20571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947964#comment-17947964
 ] 

Jai Bheemsen Rao Dhanwada commented on CASSANDRA-20571:
-------------------------------------------------------

I was able to narrow this down: the CPU spike is caused by 
[https://github.com/apache/cassandra/commit/5be1038c5d38af32d3cbb0545d867f21304f3a46]

Here is what I tried in my cluster to narrow it down (the same node was used 
for every test).
 * 4.0.12: Bootstrap completes in 3 minutes and no CPU issue.
 * 4.1.0: Bootstrap completes in 26 minutes and no CPU issue.
 * 4.1.1: Bootstrap completes in 3 minutes (same as 4.0.x) but with a huge CPU spike.
 * 4.1.6: Bootstrap completes in 3 minutes (same as 4.0.x) but with a huge CPU spike.
 * 4.1.1 + revert of the 
[commit|https://github.com/apache/cassandra/commit/5be1038c5d38af32d3cbb0545d867f21304f3a46]:
  Bootstrap completes in 26 minutes and no CPU issue.
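
The observations above can be laid out as data to make the pattern explicit. This is a minimal sketch: the `Run` record and its field names are illustrative, and the values are copied from the list above. It only restates the measurements; it does not explain why the commit has this effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One bootstrap test on the same node (names are illustrative)."""
    build: str
    bootstrap_minutes: int
    cpu_spike: bool


# Values taken from the test list in this comment.
RUNS = [
    Run("4.0.12", 3, False),
    Run("4.1.0", 26, False),
    Run("4.1.1", 3, True),
    Run("4.1.6", 3, True),
    Run("4.1.1 + revert", 26, False),
]

# Builds on which the CPU spike was observed.
spiking = [r.build for r in RUNS if r.cpu_spike]
print("builds with CPU spike:", spiking)

# Compare stock 4.1.1 with 4.1.1 plus the revert of the suspect commit.
by_build = {r.build: r for r in RUNS}
stock = by_build["4.1.1"]
reverted = by_build["4.1.1 + revert"]
revert_removes_spike = stock.cpu_spike and not reverted.cpu_spike
revert_slows_bootstrap = reverted.bootstrap_minutes > stock.bootstrap_minutes
print("revert removes spike:", revert_removes_spike)
print("revert slows bootstrap:", revert_slows_bootstrap)
```

With the revert applied, 4.1.1 behaves like 4.1.0 on both measurements, which is what points at the commit.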

> CPU Spikes during the Streaming of data
> ---------------------------------------
>
>                 Key: CASSANDRA-20571
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20571
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: Jai Bheemsen Rao Dhanwada
>            Priority: Normal
>         Attachments: async_profiler_cpu.html
>
>
> Hello Team,
> We are seeing an issue where there is a huge spike in CPU on the node that 
> is streaming data (adding a new node, replacing a node, or running a 
> nodetool rebuild). Essentially, any time streaming is involved, the CPU 
> spike is very large. This does not happen in all clusters, but we 
> occasionally see this issue on specific clusters.
>  
> C* version: 4.1.6 (> 4.1.0)
> Schema: All the tables use counter data types.
> CPU Cores: 16
>  
> The same workloads and cluster types do not show this behavior with the 
> 4.0.x version of Cassandra, hence we suspect something changed in 4.1.6. 
> Looking at the top threads, it is mostly Stream-Deserializer and compaction.
> {code:java}
> top - 17:01:29 up 18:42,  2 users,  load average: 51.75, 13.61, 4.79
> Threads: 741 total,  54 running, 687 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 91.5 us,  4.9 sy,  0.0 ni,  1.4 id,  0.7 wa,  1.1 hi,  0.4 si,  0.0 st
> MiB Mem :  31176.5 total,   8762.5 free,  11028.0 used,  11386.0 buff/cache
> MiB Swap:      0.0 total,      0.0 free,      0.0 used.  19334.3 avail Mem
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  305763 xxxxxx    20   0   18.6g   9.8g 446524 R  30.8  32.2   0:04.69 Stream-Deserial
>  305815 xxxxxx    20   0   18.6g   9.8g 446524 R  28.6  32.2   0:04.81 Stream-Deserial
>  300600 xxxxxx    20   0   18.6g   9.8g 446524 R  27.9  32.2   0:04.73 CompactionExecu
>  305678 xxxxxx    20   0   18.6g   9.8g 446524 R  27.9  32.2   0:03.98 Stream-Deserial
>  305602 xxxxxx    20   0   18.6g   9.8g 446524 R  27.6  32.2   0:04.65 Stream-Deserial
>  305563 xxxxxx    20   0   18.6g   9.8g 446524 R  27.3  32.2   0:04.02 CompactionExecu
>  305687 xxxxxx    20   0   18.6g   9.8g 446524 R  26.9  32.2   0:04.28 Stream-Deserial
>  305707 xxxxxx    20   0   18.6g   9.8g 446524 S  26.9  32.2   0:04.29 Stream-Deserial
>  305714 xxxxxx    20   0   18.6g   9.8g 446524 R  26.9  32.2   0:04.91 Stream-Deserial
>  305569 xxxxxx    20   0   18.6g   9.8g 446524 R  26.6  32.2   0:05.69 Stream-Deserial
>  305771 xxxxxx    20   0   18.6g   9.8g 446524 R  26.6  32.2   0:03.99 Stream-Deserial
>  305817 xxxxxx    20   0   18.6g   9.8g 446524 R  26.3  32.2   0:03.79 Stream-Deserial
>  305566 xxxxxx    20   0   18.6g   9.8g 446524 R  26.0  32.2   0:04.64 CompactionExecu
> {code}
> Our initial hypothesis was that streaming_stats plays a role here, based on 
> https://issues.apache.org/jira/browse/CASSANDRA-18110. However, we set 
> streaming_stats: false and still see the CPU spike. Once streaming is 
> complete, the cluster returns to its normal state with no CPU spike, but 
> we would like to understand what is causing the huge CPU spikes. I have 
> attached a profiler capture taken during the CPU spike.
> Please let me know if you need any other details.
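
The per-thread rows in the quoted top output can be tallied by thread name to see how the load splits between streaming and compaction. This is a minimal sketch, assuming the rows come from top in per-thread mode with the default column layout shown in the report; the function name `cpu_by_thread` is illustrative, and the thread names appear truncated exactly as top printed them.

```python
from collections import defaultdict

# Per-thread rows copied from the top output quoted in the report.
SAMPLE = """\
 305763 xxxxxx    20   0   18.6g   9.8g 446524 R  30.8  32.2   0:04.69 Stream-Deserial
 305815 xxxxxx    20   0   18.6g   9.8g 446524 R  28.6  32.2   0:04.81 Stream-Deserial
 300600 xxxxxx    20   0   18.6g   9.8g 446524 R  27.9  32.2   0:04.73 CompactionExecu
 305678 xxxxxx    20   0   18.6g   9.8g 446524 R  27.9  32.2   0:03.98 Stream-Deserial
 305602 xxxxxx    20   0   18.6g   9.8g 446524 R  27.6  32.2   0:04.65 Stream-Deserial
 305563 xxxxxx    20   0   18.6g   9.8g 446524 R  27.3  32.2   0:04.02 CompactionExecu
 305687 xxxxxx    20   0   18.6g   9.8g 446524 R  26.9  32.2   0:04.28 Stream-Deserial
 305707 xxxxxx    20   0   18.6g   9.8g 446524 S  26.9  32.2   0:04.29 Stream-Deserial
 305714 xxxxxx    20   0   18.6g   9.8g 446524 R  26.9  32.2   0:04.91 Stream-Deserial
 305569 xxxxxx    20   0   18.6g   9.8g 446524 R  26.6  32.2   0:05.69 Stream-Deserial
 305771 xxxxxx    20   0   18.6g   9.8g 446524 R  26.6  32.2   0:03.99 Stream-Deserial
 305817 xxxxxx    20   0   18.6g   9.8g 446524 R  26.3  32.2   0:03.79 Stream-Deserial
 305566 xxxxxx    20   0   18.6g   9.8g 446524 R  26.0  32.2   0:04.64 CompactionExecu
"""


def cpu_by_thread(rows: str) -> dict:
    """Return {thread name: (thread count, summed %CPU)} for top rows."""
    counts = defaultdict(int)
    cpu = defaultdict(float)
    for line in rows.splitlines():
        # Columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        fields = line.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip headers and blank lines
        name = fields[11].strip()
        counts[name] += 1
        cpu[name] += float(fields[8])
    return {name: (counts[name], round(cpu[name], 1)) for name in counts}


summary = cpu_by_thread(SAMPLE)
for name, (n, pct) in sorted(summary.items()):
    print(f"{name}: {n} threads, {pct}% CPU in total")
```

Only the rows shown in the report are counted here, so the totals cover the busiest threads rather than the whole process.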



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
