I want to @ Jon Haddad here 😁

On Fri, Sep 27, 2024 at 16:39, Bowen Song via user <user@cassandra.apache.org> wrote:
> Hello Jeff,
>
> I'm not a consultant, but I do have some experience troubleshooting this
> type of issue.
>
> The first step in troubleshooting is gathering information. You don't
> want to troubleshoot issues blindly.
>
> Some (but not all) of the important information to collect: CPU usage,
> network IO, disk IO, JVM heap usage, Cassandra query latency, queries/s,
> dropped messages, pending compactions, GC logs, Cassandra logs and
> system logs.
>
> Also, how is the repair run? Is it a subrange repair? Is it an incremental
> repair? On Cassandra 3.0.x and 3.x, it's recommended to do subrange full
> (non-incremental) repairs, because incremental repair before Cassandra 4.0
> has known issues and can cause excessive anti-compaction. If the cluster
> has ever run an incremental repair, some extra steps are needed to
> switch to full repairs. Skipping these extra steps will leave the
> previously repaired but now outdated data permanently on all nodes,
> which will not only waste disk space, but also slow down queries and
> increase GC pressure.
>
> Cheers,
> Bowen
>
> On 27/09/2024 01:33, Jeff Masud wrote:
> > I'm hoping someone can recommend a good Cassandra consultant.
> >
> > We have a 12-node cluster spanning 2 data centers. When doing repairs,
> > a node will spike and become unresponsive or die completely. I'm
> > assuming it's related to very high GC times.
> >
> > We're currently running 3.0.30 and looking to upgrade to a newer version
> > once we can get a repair to complete successfully.
> >
> > Please reach out to me directly.
> >
> > Thanks
> > Jeff
> >
> > --
> > Jeff Masud
> > Deasil Works
> > 818-945-0821 x107
> > 310-918-5333 Mobile
> > jeff@deasil.works
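
For anyone following the thread, here is a rough sketch of what a subrange full repair can look like in practice. It is not taken from Bowen's message. It assumes you already know one token range owned by the node (for example from `nodetool describering`), splits it into smaller pieces, and builds one `nodetool repair -full -st <start> -et <end>` invocation per piece. The function names, the keyspace name `my_ks` and the token values are made up for illustration, and the sketch does not cover the extra steps needed after a past incremental repair.

```python
from typing import List, Tuple


def split_token_range(start: int, end: int, parts: int) -> List[Tuple[int, int]]:
    """Split the token range (start, end] into `parts` contiguous subranges.

    Assumption: start < end, i.e. the range does not wrap around the ring.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if end <= start:
        raise ValueError("end must be greater than start (no wrap-around)")
    width = end - start
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace: str, start: int, end: int, parts: int) -> List[List[str]]:
    """Build one full (non-incremental) subrange repair command per subrange."""
    return [
        ["nodetool", "repair", "-full", "-st", str(st), "-et", str(et), keyspace]
        for st, et in split_token_range(start, end, parts)
    ]


if __name__ == "__main__":
    # Hypothetical range owned by this node; replace with real values.
    for cmd in repair_commands("my_ks", 0, 1000, 4):
        print(" ".join(cmd))
```

Running the subranges one at a time, with a pause between them, keeps each repair session small, which may help with the GC spikes described in the original message. Each command could be passed to `subprocess.run` as-is; tools such as Cassandra Reaper automate the same idea.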