I strongly suggest moving to 4.0 and setting up Reaper. Managing repairs yourself is a waste of time, and you're almost certainly not doing it optimally.
Jon

On Tue, Dec 17, 2024 at 12:40 PM Miguel Santos-Lopez <mlo...@ims.tech> wrote:

> We haven’t had the chance to upgrade to 4, let alone 5. Has there been a
> big change with respect to repairs since the old days of 3.11? :-)
>
> In my experience the problems have been, on one hand, a performance and
> latency hit, but also a lack of flexibility in the tooling: I often had
> repairs failing, and the only option I know of using plain nodetool is to
> start the repair over. I ended up wrapping the call to nodetool in a
> bash script allowing only selected keyspaces and tables to be repaired.
> In this way I get a clear picture of what failed and can then do a
> reliable “resume” with very little extra effort.
>
> I would also add the time it takes. Afaik you don’t want to run more than
> two repairs at the same time. Depending on the load and number of nodes
> it easily becomes a tedious task.
>
> My view might well be biased by running that old version on a less than
> optimal cluster (improved only a couple of weeks ago), so I still have to
> see how it translates to repairs.
>
> *Miguel A. Santos*
> *Senior Platform Engineer*
>
> ------------------------------
> *From:* Josh McKenzie <jmcken...@apache.org>
> *Sent:* Tuesday, December 17, 2024 3:11:06 PM
> *To:* user@cassandra.apache.org <user@cassandra.apache.org>
> *Subject:* Re: Cassandra 5 Upgrade - Storage Compatibility Modes
>
> It's kind of a shame we don't have rolling restart functionality built in
> to the database / sidecar. I know we've discussed that in the past.
>
> +1 to Jon's question - clients (e.g. the Java driver) should be able to
> handle disconnects gracefully and route to other coordinators, leaving the
> application-facing symptom as a blip in latency. Are you seeing
> something else more painful, or is it more just not having the built-in
> tooling / instrumentation to make it a clean, reproducible operation?
>
> On Tue, Dec 17, 2024, at 2:24 PM, Jon Haddad wrote:
>
> Just curious, why is a rolling restart difficult? Is it a tooling issue,
> stability, just overall fear of messing with things?
>
> You *should* be able to do a rolling restart without it being an issue. I
> look at this as a fundamental workflow that every C* operator should have
> available, and you should be able to do it without there being any
> concern.
>
> Jon
>
> On 2024/12/17 16:01:06 Paul Chandler wrote:
> > All,
> >
> > We are getting a lot of push back on the three-stage process of going
> > through the three compatibility modes to upgrade to Cassandra 5. This
> > basically means 3 rolling restarts of a cluster, which will be difficult
> > for some of our large multi-DC clusters.
> >
> > Having researched this, it looks like, if you are not going to create
> > large TTLs, it would be possible to go straight from C*4 to C*5 with SCM
> > NONE. This seems to be the same as it would have been going from 4.0 -> 4.1.
> >
> > Is there any reason why this should not be done? Has anyone had
> > experience of upgrading in this way?
> >
> > Thanks
> >
> > Paul Chandler
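
For readers wondering what the per-table repair wrapper Miguel mentions might look like, here is a minimal sketch in Python rather than bash. It is not his script: the function names and the keyspace/table names in the usage note are hypothetical, and it assumes that the exit status of `nodetool repair <keyspace> <table>` reflects whether the repair succeeded.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple

# A repair target is a (keyspace, table) pair.
Target = Tuple[str, str]


def build_repair_command(keyspace: str, table: str) -> List[str]:
    """Build a plain `nodetool repair` call limited to a single table."""
    return ["nodetool", "repair", keyspace, table]


def run_command(command: List[str]) -> bool:
    """Run the command; treat a zero exit status as success (an assumption)."""
    return subprocess.run(command).returncode == 0


def repair_targets(
    targets: Iterable[Target],
    runner: Callable[[List[str]], bool] = run_command,
) -> List[Target]:
    """Repair each table in turn and return the targets that failed.

    Passing the returned list back in acts as a "resume": only the
    tables that failed are repaired again.
    """
    failed: List[Target] = []
    for keyspace, table in targets:
        if not runner(build_repair_command(keyspace, table)):
            failed.append((keyspace, table))
    return failed
```

Calling `repair_targets([("shop", "orders"), ("shop", "customers")])` on a node would repair the two (made-up) tables one after another and return whichever failed, so a later call with that list retries only those. Running one table at a time also keeps the number of concurrent repairs low, at the cost of a longer overall run.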