Thanks for your response.  I’ll reach out to Jon. 

 

Briefly, it’s Cassandra 3.0.30, using Reaper to run full repairs over 
sub-ranges. I’ve got Java collectd set up, with data being scraped into 
Prometheus for all the nodes.  

 

We want to upgrade to 4.0.  I’m assuming we need to have a clean repair 
before performing an upgrade?  When upgrading, we go to 3.11.x first, then 
4.0.x, then 4.1.x; is that correct?  

 

Jeff 

 

 

From: Jon Haddad <j...@rustyrazorblade.com>
Reply-To: <user@cassandra.apache.org>
Date: Friday, September 27, 2024 at 3:16 PM
To: <user@cassandra.apache.org>
Subject: Re: Recommend Cassandra consultant

 

Thank you both for the recommendation!

 

Jon

 

 

On Fri, Sep 27, 2024 at 5:12 AM Aaron Ploetz <aaronplo...@gmail.com> wrote:

Casting a second vote for Jon Haddad. You can reach out to him on LinkedIn: 
https://www.linkedin.com/in/rustyrazorblade/

 

Thanks,

 

Aaron

 

 

On Fri, Sep 27, 2024 at 3:57 AM guo Maxwell <cclive1...@gmail.com> wrote:

I want to @ Jon Haddad 😁

 

On Fri, Sep 27, 2024 at 16:39, Bowen Song via user <user@cassandra.apache.org> wrote:

Hello Jeff,

I'm not a consultant, but I do have some experience troubleshooting this type 
of issue.

The first thing in troubleshooting is gathering information. You don't want to 
troubleshoot issues blindly.

Some (but not all) of the important information is CPU usage, network IO, 
disk IO, JVM heap usage, Cassandra query latency, queries/s, dropped messages, 
pending compactions, GC logs, Cassandra logs and system logs.

Also, how is the repair run? Is it subrange repair? Is it incremental repair? 
On Cassandra 3.0.x and 3.x, it's recommended to do subrange full 
(non-incremental) repairs, because incremental repair before Cassandra 4.0 has 
known issues and can cause excessive anti-compaction. If the cluster has ever 
run an incremental repair, some extra steps are needed to switch to full 
repairs. Skipping these extra steps will leave previously repaired but now 
outdated data on all nodes permanently, which will not only waste disk space, 
but also slow down queries and increase GC pressure.
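
To illustrate what "subrange full repair" means in practice, here is a rough
Python sketch. It assumes you already know one non-wrapping token range
(start < end) owned by a node, splits it into equal pieces, and builds one
`nodetool repair -full -st ... -et ...` command per piece. The function names,
the keyspace name and the token values are made up for illustration; tools
such as Reaper do this splitting and scheduling for you.

```python
# Illustrative sketch only: the names and values here are hypothetical.
# Assumes a single non-wrapping token range (start < end) owned by one node.


def split_range(start, end, pieces):
    """Split the token range (start, end] into `pieces` contiguous subranges."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    if start >= end:
        raise ValueError("this sketch only handles non-wrapping ranges")
    span = end - start
    # Integer arithmetic keeps every boundary exact, even for 64-bit tokens.
    bounds = [start + (span * i) // pieces for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, start, end, pieces):
    """Build one full (non-incremental) subrange repair command per piece."""
    return [
        f"nodetool repair -full -st {s} -et {e} {keyspace}"
        for s, e in split_range(start, end, pieces)
    ]


if __name__ == "__main__":
    # Hypothetical keyspace and token range, for demonstration only.
    for cmd in repair_commands("my_keyspace", 0, 1_000_000, 4):
        print(cmd)
```

Running the commands one at a time, rather than repairing the whole range at
once, keeps each repair session small, which is the point of repairing by
subrange.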

Cheers,
Bowen

On 27/09/2024 01:33, Jeff Masud wrote:

I'm hoping someone can recommend a good Cassandra consultant. 

 

We have a 12-node cluster spanning 2 data centers. When doing repairs, a 
node will spike and become unresponsive or die completely; I’m assuming it’s 
related to very high GC times.

 

We're currently running 3.0.30 and looking to upgrade to a newer version once 
we can get a repair to complete successfully. 

 

Please reach out to me directly.

 

Thanks 

Jeff

 

-- 

Jeff Masud

Deasil Works

818-945-0821 x107

310-918-5333 Mobile

jeff@deasil.works

 
