Thanks for your response. I’ll reach out to Jon.
Briefly, it’s Cassandra 3.0.30, using Reaper to run full repairs on sub-ranges. I’ve got Java collectd set up, with data being scraped into Prometheus for all the nodes.

We want to upgrade to 4.0. I’m assuming we need a clean repair before performing the upgrade? When upgrading, we go to 3.11.x first, then 4.0.x, then 4.1.x, is that correct?

Jeff

From: Jon Haddad <j...@rustyrazorblade.com>
Reply-To: <user@cassandra.apache.org>
Date: Friday, September 27, 2024 at 3:16 PM
To: <user@cassandra.apache.org>
Subject: Re: Recommend Cassandra consultant

Thank you both for the recommendation!

Jon

On Fri, Sep 27, 2024 at 5:12 AM Aaron Ploetz <aaronplo...@gmail.com> wrote:

Casting a second vote for Jon Haddad. You can reach out to him on LinkedIn: https://www.linkedin.com/in/rustyrazorblade/

Thanks,
Aaron

On Fri, Sep 27, 2024 at 3:57 AM guo Maxwell <cclive1...@gmail.com> wrote:

I want to @ Jon Haddad 😁

On Fri, Sep 27, 2024 at 16:39, Bowen Song via user <user@cassandra.apache.org> wrote:

Hello Jeff,

I'm not a consultant, but I do have some experience troubleshooting this type of issue.

The first step in troubleshooting is gathering information; you don't want to troubleshoot blindly. Some (but not all) of the important information to collect: CPU usage, network IO, disk IO, JVM heap usage, Cassandra query latency, queries/s, dropped messages, pending compactions, GC logs, Cassandra logs and system logs.

Also, how is the repair run? Is it subrange repair? Is it incremental repair? On Cassandra 3.0.x and 3.x, it's recommended to do subrange full (non-incremental) repairs, because incremental repair before Cassandra 4.0 has known issues and can cause excessive anti-compaction. If the cluster has ever run an incremental repair, there are some extra steps needed to switch to full repairs. Skipping these extra steps will leave the previously repaired but now outdated data permanently on all nodes, which will not only waste disk space but also slow down queries and increase GC pressure.

Cheers,
Bowen

On 27/09/2024 01:33, Jeff Masud wrote:

I'm hoping someone can recommend a good Cassandra consultant.

We have a 12-node cluster spanning 2 data centers. When doing repairs, a node will spike and become unresponsive or die completely; I’m assuming it’s related to very high GC times.

We're currently running 3.0.30 and looking to upgrade to a newer version once we can get a repair to complete successfully.

Please reach out to me directly.

Thanks
Jeff

--
Jeff Masud
Deasil Works
818-945-0821 x107
310-918-5333 Mobile
jeff@deasil.works