Re: [EXTERNAL] Cassandra 3.11 - below normal disk read after restart

Jeff Jirsa Fri, 06 Sep 2024 12:54:22 -0700

Unfortunately, I don’t think anyone will be able to answer this with the data 
provided.


Are the disk IOPS from Cassandra reads? Or compaction? Or repair? Do they ramp 
with client reads (does that curve match your customer traffic)? 
Are they from client data reads or from internal reads (e.g. schema and auth 
reads triggered by client reconnects)? 
Are they the first reads, or read repair? 

If this were my cluster, I’d be looking at the rest of the graphs to try to 
tell what “else” was happening beyond high read IOPS. If nothing stood out, I 
would have taken a stack trace to try to see what those nodes were doing at the 
time, vs what they’re doing “normally”. 
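
One rough way to do that comparison, as a sketch rather than a recipe: capture a 
thread dump (for example with jstack) on a node during the spike and another 
during normal operation, then count RUNNABLE threads per pool. The Python below 
assumes the standard HotSpot dump layout (a quoted thread name on the header 
line, followed by a java.lang.Thread.State line). The function names are 
hypothetical, and which pool names show up depends on the Cassandra version.

```python
import re
from collections import Counter

# Matches the quoted thread name at the start of a jstack thread header line.
_THREAD_HEADER = re.compile(r'^"([^"]+)"')


def pool_counts(dump_text, state="RUNNABLE"):
    """Count threads per pool that are in the given state.

    Assumes the HotSpot layout: a header line starting with the quoted
    thread name, immediately followed by a 'java.lang.Thread.State: X' line.
    Threads without a state line (e.g. GC threads) are skipped.
    """
    counts = Counter()
    lines = dump_text.splitlines()
    for i, line in enumerate(lines):
        match = _THREAD_HEADER.match(line)
        if not match:
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if f"java.lang.Thread.State: {state}" not in next_line:
            continue
        # Strip the trailing worker index ("-12", ":3") to get the pool name.
        pool = re.sub(r"[-:#]?\d+$", "", match.group(1))
        counts[pool] += 1
    return counts


def diff_pools(normal, busy):
    """Return {pool: busy - normal} for pools whose count changed."""
    pools = set(normal) | set(busy)
    return {p: busy[p] - normal[p] for p in pools if busy[p] != normal[p]}
```

Calling diff_pools(pool_counts(normal_dump), pool_counts(busy_dump)) on the two 
captures gives a per-pool delta. A large positive number for a read-related 
pool would point at client or internal reads, while a jump in a compaction or 
repair-related pool would point elsewhere. A single dump is only a snapshot, so 
taking several a few seconds apart is safer before drawing conclusions.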


> On Sep 6, 2024, at 12:29 PM, Pradeep Badiger <pradeepbadi...@fico.com> wrote:
> 
> Thanks, Jeff. We use QUORUM consistency for reads and writes. We are also at 
> a loss as to why such an issue could occur. Do you think restarting the node 
> again and running a full repair on it would help?
>  
> From: Jeff Jirsa <jji...@gmail.com>
> Sent: Friday, September 6, 2024 2:03 PM
> To: cassandra <user@cassandra.apache.org>
> Cc: Pradeep Badiger <pradeepbadi...@fico.com>
> Subject: [EXTERNAL] Re: Cassandra 3.11 - below normal disk read after restart
>  
> 
> If they had gone up by about 1/7th, you could potentially assume it was 
> related to the snitch not choosing the restarted host. But they went up by a 
> lot (2-3x?). What consistency level do you use for reads and writes, and do 
> you have graphs for local reads / hint delivery? (I’m GUESSING that you’re 
> seeing extra read repair or some other multiplier kick in, but it doesn’t 
> make a lot of sense, to be honest.) 
>  
>  
> 
> 
> On Sep 6, 2024, at 9:47 AM, Pradeep Badiger via user 
> <user@cassandra.apache.org> wrote:
>  
> Hi,
>  
> We are using Cassandra 3.11 with a 7-node cluster and a replication factor of 
> 6, with mostly default configuration. During a recent maintenance window, 
> one of the nodes was restarted. The node came back up normally, with no 
> errors of any sort. But when the application started using the cluster, we 
> found below-normal disk read I/O rates on the restarted node, while the other 
> nodes in the cluster reported above-normal disk read I/O rates. The 
> difference became significant enough to trigger alerts in the monitoring 
> system. To resolve the issue, the application was stopped and the entire 
> cluster was restarted, after which all 7 nodes reported almost the same read 
> rates.
>  
> <image001.png>
> Figure 1 - After node 53 was restarted.
> 
>  
> <image002.png>
> Figure 2 - After the entire cluster restart.
> 
> The node in question was not down for a very long time. Is there any specific 
> reason the read rates would differ like this? Is there a way to resolve this 
> without restarting the entire cluster?
>  
> Thanks,
> Pradeep V.B.
> This email and any files transmitted with it are confidential, proprietary 
> and intended solely for the individual or entity to whom they are addressed. 
> If you have received this email in error please delete it immediately.
