RF 6???

Well, the traffic is routing to the other 6 nodes, which can likely serve
the traffic given the super high RF, and the newly restarted node isn't seeing
the traffic until gossip settles?
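
A rough back-of-the-envelope check of that idea follows. It rests on simplifying assumptions: reads are spread evenly across the serving nodes, and every QUORUM read touches exactly floor(RF/2)+1 replicas. Read repair, speculative retry, hints, and the digest-vs-data distinction are ignored. `quorum` and `per_node_share` are hypothetical helpers for illustration, not Cassandra APIs.

```python
from fractions import Fraction


def quorum(rf: int) -> int:
    """Number of replicas that must respond at QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def per_node_share(replicas_per_read: int, serving_nodes: int) -> Fraction:
    """Average replica reads landing on each serving node per client read,
    assuming reads are spread evenly across the serving nodes."""
    return Fraction(replicas_per_read, serving_nodes)


NODES = 7  # cluster size from the thread
RF = 6     # replication factor from the thread
q = quorum(RF)  # 4

# All 7 nodes take reads vs. the restarted node receiving none.
before = per_node_share(q, NODES)
after = per_node_share(q, NODES - 1)
ratio = after / before

# Ranges the restarted node replicates still have RF - 1 other replicas,
# so QUORUM can be met without it.
quorum_without_restarted_node = (RF - 1) >= q

print(f"QUORUM for RF={RF}: {q} replicas")
print(f"Per-node read multiplier if one node is skipped: {ratio} (~{float(ratio):.2f}x)")
print(f"QUORUM reachable without the restarted node: {quorum_without_restarted_node}")
```

Under these assumptions, skipping the restarted node raises each remaining node's read load by only about 7/6 (~1.17x). That is well short of the roughly 2-3x increase Jeff estimates below, which fits his guess that some other multiplier was involved.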

On Fri, Sep 6, 2024, 2:54 PM Jeff Jirsa <jji...@gmail.com> wrote:

> The unfortunate reality here is I don’t think anyone is going to be able
> to answer with the data provided.
>
> Are the disk IOPS from Cassandra reads? Or compaction? Or repair? Do they
> ramp with client reads (does that curve match your customer traffic)?
> Are they from client data reads or from internal reads (e.g. schema and
> auth from client reconnects)?
> Are they the first reads, or read repair?
>
> If this were my cluster, I’d be looking at the rest of the graphs to try
> to tell what “else” was happening beyond high read IOPS. If nothing stood
> out, I would have taken a stack trace to try to see what those nodes were
> doing at the time, vs what they’re doing “normally”.
>
>
> On Sep 6, 2024, at 12:29 PM, Pradeep Badiger <pradeepbadi...@fico.com>
> wrote:
>
> Thanks, Jeff. We use QUORUM consistency for reads and writes. We are also
> clueless as to why such an issue could occur. Do you think restarting the
> node again and running a full repair on it would help?
>
> *From:* Jeff Jirsa <jji...@gmail.com>
> *Sent:* Friday, September 6, 2024 2:03 PM
> *To:* cassandra <user@cassandra.apache.org>
> *Cc:* Pradeep Badiger <pradeepbadi...@fico.com>
> *Subject:* [EXTERNAL] Re: Cassandra 3.11 - below normal disk read after
> restart
>
> If they had gone up by 1/7th, I could potentially assume it was something
> related to the snitch not choosing the restarted host. They went up by a
> lot (2-3x?). What consistency level do you use for reads and writes, and do
> you have graphs for local reads / hint delivery? (I’m GUESSING that you’re
> seeing extra read repair or some other multiplier kick in, but to be honest
> it doesn’t make a lot of sense.)
>
> On Sep 6, 2024, at 9:47 AM, Pradeep Badiger via user <
> user@cassandra.apache.org> wrote:
>
> Hi,
>
> We are using Cassandra 3.11 on a 7-node cluster with a replication factor
> of 6 and a mostly default configuration. During a recent maintenance window,
> one of the nodes was restarted. The node came back up normally, with no
> errors of any sort. But when the application started using the cluster,
> we saw *below-normal* disk read I/O rates on the restarted node, while the
> other nodes in the cluster reported *above-normal* disk read I/O rates. The
> difference became significant enough to trigger alerts in our monitoring
> system. To resolve the issue, we stopped the application and restarted the
> entire cluster, after which all 7 nodes reported nearly the same read rates.
>
> *Figure 1 - After node 53 was restarted.*
>
> *Figure 2 - After the entire cluster restart.*
>
> The node in question was not down for very long. Is there any specific
> reason the read rates would differ like this? Is there a way to resolve
> this without restarting the entire cluster?
>
> Thanks,
> Pradeep V.B.
> This email and any files transmitted with it are confidential, proprietary
> and intended solely for the individual or entity to whom they are
> addressed. If you have received this email in error please delete it
> immediately.
>
