Implementation details aside, I'm firmly in the "it would be nice if C* could take care of it" camp. Reaper is pretty damn easy to use and people *still* don't put it in prod.
> On Apr 4, 2018, at 4:16 AM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
>
> I understand the merits of both approaches. In working with other DBs in the "old country" of SQL, we often had to write indexing sequences manually for important tables. It was "built into the product", but in order to leverage the maximum benefit of indices we had to have different indices other than the clustered (physical) index. The process still sucked. It's never perfect.
>
> The JVM is already fraught with GC issues, and having another process managed in the same heap space is what I'm worried about. Technically the process could be in the same binary but started as a sidecar or in the same main process.
>
> Consider a process called "cassandra-agent" that sits around with a scheduler based on config or a Cassandra table, distributed in the same release. Shell / service scripts would start it; the end user would know it only by examining the .sh files. This opens the possibility of including a GUI hosted in the same process without cluttering the core coolness of Cassandra.
>
> Best,
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Apr 4, 2018, 2:50 AM -0400, Dor Laor <d...@scylladb.com>, wrote:
>> We at Scylla implemented repair in a similar way to the Cassandra Reaper. We do that using an external application, written in Go, that manages repair for multiple clusters and saves the data in an external Scylla cluster. The logic resembles the Reaper one, with some specific internal sharding optimizations, and uses the Scylla REST API.
>>
>> However, I have doubts it's the ideal way. After playing a bit with CockroachDB, I realized it's super nice to have a single binary that repairs itself, provides a GUI and is the core DB.
>>
>> Even while distributed, you can elect a leader node to manage the repair in a consistent way, so the complexity can be reduced to a minimum. Repair can write its status to the system tables and provide an API for progress, rate control, etc.
>>
>> The big advantage of having repair embedded in the core is that there is no need to expose internal state to the repair logic, so an external program doesn't need to deal with different versions of Cassandra, different repair capabilities of the core (such as incremental on/off) and so forth. A good database should schedule its own repair; it knows whether the hinted handoff threshold was crossed, it knows whether nodes were replaced, etc.
>>
>> My 2 cents. Dor
>>
>> On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <dinesh.jo...@yahoo.com.invalid> wrote:
>>
>>> Simon,
>>>
>>> You could still do load-aware repair outside of the main process by reading Cassandra's metrics.
>>>
>>> In general, I don't think the maintenance tasks necessarily need to live in the main process, where they could negatively impact the read / write path. Unless strictly required by the serving path, they could live in a sidecar process. There are multiple benefits, including isolation, faster iteration and loose coupling. For example, this would mean that the maintenance tasks can have a different GC profile than the main process and it would be OK. Today that is not the case.
>>>
>>> The only issue I see is that the project does not provide an official sidecar. Perhaps there should be one; we probably wouldn't have had to have this discussion ;)
>>>
>>> Dinesh
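To make the sidecar idea concrete, here is a rough, untested sketch of what a "cassandra-agent" cycle could look like: read the compaction metrics over JMX as a load gate, then kick off repair through the StorageService MBean (the same repairAsync entry point nodetool uses in recent versions). The JMX port, keyspace name, and repair options are illustrative assumptions, not a worked-out design.

    import java.util.HashMap;
    import java.util.Map;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RepairAgent {
        public static void main(String[] args) throws Exception {
            // Assumes the default local JMX endpoint (port 7199).
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();

                // Load gate: skip this cycle if compactions are pending.
                ObjectName pending = new ObjectName(
                    "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
                int pendingTasks = (Integer) mbs.getAttribute(pending, "Value");
                if (pendingTasks > 0) {
                    System.out.println("Compactions pending, deferring repair");
                    return;
                }

                // Start an asynchronous repair of one keyspace ("my_keyspace"
                // is a placeholder) via the StorageService MBean.
                ObjectName ss = new ObjectName(
                    "org.apache.cassandra.db:type=StorageService");
                Map<String, String> options = new HashMap<>();
                options.put("incremental", "false");
                Object cmd = mbs.invoke(ss, "repairAsync",
                    new Object[] { "my_keyspace", options },
                    new String[] { "java.lang.String", "java.util.Map" });
                System.out.println("Started repair command #" + cmd);
            }
        }
    }

The point being: everything such an agent needs is already exposed over JMX, so it can live outside the daemon's heap, which speaks to both Rahul's GC worry and Dinesh's metrics-reading suggestion.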
>>>
>>> On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou <zhouqing...@gmail.com> wrote:
>>>
>>> Repair has been a problem for us at Uber. In general I'm in favor of including the scheduling logic in the Cassandra daemon. It has the benefit of enabling something like load-aware repair, e.g. only scheduling repair while there is no ongoing compaction, or while traffic is low, etc. As proposed by others, we can expose keyspace/table-level configurations so that users can opt in. Regarding the risk: yes, there will be problems at the beginning, but in the long run users will appreciate that repair works out of the box, just like compaction. We have large Cassandra deployments and can work with the Netflix folks on intensive testing to boost user confidence.
>>>
>>> On the other hand, have we looked into how other NoSQL databases do repair? Is there a sidecar process?
>>>
>>> On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
>>>
>>>> Repair is critical for running C*, and I agree with Roopa that it needs to be part of the offering. I think we should make it easy for new users to run C*.
>>>>
>>>> Can we have a sidecar process which we can add to the Apache Cassandra offering and put this repair there? I am also fine with putting it in C* if a sidecar is more long term.
>>>>
>>>> On Tue, Apr 3, 2018 at 6:20 PM, Roopa Tangirala <rtangir...@netflix.com.invalid> wrote:
>>>>
>>>>> Having seen so many companies grapple with running repairs successfully in production, and having seen the success of distributed scheduled repair here at Netflix, I strongly believe that adding this to Cassandra would be a great addition to the database. I am hoping we as a community will make it easy for teams to operate and run Cassandra by enhancing the core product, making maintenance tasks like repairs and compactions part of the database without external tooling. We can have an experimental flag for the feature, and only teams who are confident with the service would enable it, while others fall back to default repairs.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Roopa Tangirala
>>>>> Engineering Manager CDE
>>>>> (408) 438-3156 - mobile
>>>>>
>>>>> On Tue, Apr 3, 2018 at 4:19 PM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
>>>>>
>>>>>> Why not make it configurable?
>>>>>>
>>>>>> auto_manage_repair_consistency: true (default: false)
>>>>>>
>>>>>> Then users can use the built-in auto repair function that would be created, or continue to handle it as they do now. Default behavior would be "false", so nothing changes on its own. Just wondering why not have that option? It might accelerate progress, as others have already suggested.
>>>>>>
>>>>>> Kenneth Brotman
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Nate McCall [mailto:zznat...@gmail.com]
>>>>>> Sent: Tuesday, April 03, 2018 1:37 PM
>>>>>> To: dev
>>>>>> Subject: Re: Repair scheduling tools
>>>>>>
>>>>>> This document does a really good job of listing out some of the issues of coordinating repair scheduling. Regardless of which camp you fall into, it is certainly worth a read.
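If the scheduler did live inside the daemon behind Kenneth's proposed flag, the gating logic could be as simple as the sketch below. Every name in it (auto_manage_repair_consistency mapped to autoRepairEnabled, the NodeState accessors, the traffic threshold) is hypothetical, chosen only to illustrate the opt-in flag plus the load-aware checks Qingcun describes; none of this exists in Cassandra today.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class AutoRepairScheduler {
        // Stand-ins for daemon state; in a real patch these would come from
        // cassandra.yaml (e.g. auto_manage_repair_consistency: true) and the
        // existing compaction / request metrics.
        interface NodeState {
            boolean autoRepairEnabled();        // the proposed opt-in flag
            int pendingCompactions();           // load signal #1
            double readWriteOpsPerSecond();     // load signal #2
            void repairNextTableSlice();        // one unit of repair work
        }

        private static final double BUSY_OPS_THRESHOLD = 5_000.0; // illustrative

        public static void start(NodeState node) {
            ScheduledExecutorService exec =
                Executors.newSingleThreadScheduledExecutor();
            exec.scheduleWithFixedDelay(() -> {
                if (!node.autoRepairEnabled())
                    return; // default off: nothing changes unless users opt in
                // Load-aware gate: defer while compactions are running or
                // traffic is high, as suggested up-thread.
                if (node.pendingCompactions() > 0
                        || node.readWriteOpsPerSecond() > BUSY_OPS_THRESHOLD)
                    return;
                node.repairNextTableSlice();
            }, 1, 1, TimeUnit.MINUTES);
        }
    }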
>>>>>>
>>>>>> On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:
>>>>>>> I just want to say I think it would be great for our users if we moved repair scheduling into Cassandra itself. The team here at Netflix has opened the ticket <https://issues.apache.org/jira/browse/CASSANDRA-14346> and has written a detailed design document <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.iasguic42ger> that includes problem discussion and prior art, if anyone wants to contribute to that. We tried to fairly discuss existing solutions, what their drawbacks are, and a proposed solution.
>>>>>>>
>>>>>>> If we were to put this as part of the main Cassandra daemon, I think it should probably be marked experimental and of course be something that users opt into (table by table or cluster by cluster), with the understanding that it might not fully work out of the box the first time we ship it. We have to be willing to take risks, but we also have to be honest with our users. It may help build confidence if a few major deployments use it (such as Netflix), and we are of course happy to provide that QA as best we can.
>>>>>>>
>>>>>>> -Joey
>>>>>>>
>>>>>>> On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston <beggles...@apple.com> wrote:
>>>>>>>
>>>>>>>> Hi dev@,
>>>>>>>>
>>>>>>>> The question of the best way to schedule repairs came up on CASSANDRA-14346, and I thought it would be good to bring up the idea of an external tool on the dev list.
>>>>>>>>
>>>>>>>> Cassandra lacks any sort of tooling for automating routine tasks that are required for running clusters, specifically repair. Regular repair is a must for most clusters, like compaction. This means that, especially as far as eventual consistency is concerned, Cassandra isn't totally functional out of the box. Operators either need to find a 3rd-party solution or implement one themselves. Adding this to Cassandra would make it easier to use.
>>>>>>>>
>>>>>>>> Is this something we should be doing? If so, what should it look like?
>>>>>>>>
>>>>>>>> Personally, I feel like this is a pretty big gap in the project and would like to see an out-of-process tool offered. Ideally, Cassandra would just take care of itself, but writing a distributed repair scheduler that you trust to run in production is a lot harder than writing a single-process management application that can fail over.
>>>>>>>>
>>>>>>>> Any thoughts on this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Blake
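On Blake's "single process management application that can fail over" point, one plausible reading is a lease-based active/standby pair, and Cassandra's own lightweight transactions are enough to build the lease. Below is a sketch using the DataStax Java driver (3.x API); the repair_admin.leader table, the 60-second TTL, and all names are assumptions for illustration, not an existing tool's schema.

    // Assumes a table like:
    //   CREATE TABLE repair_admin.leader (name text PRIMARY KEY, owner text);
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class LeaderLease {
        public static boolean tryAcquireOrRenew(Session session, String me) {
            // INSERT ... IF NOT EXISTS wins the lease; the TTL means a dead
            // scheduler's lease expires so a standby can claim it.
            Row r = session.execute(
                "INSERT INTO repair_admin.leader (name, owner) " +
                "VALUES ('repair_scheduler', ?) IF NOT EXISTS USING TTL 60",
                me).one();
            if (r.getBool("[applied]"))
                return true;
            // Not applied: someone holds the lease. If it's us, renew the TTL.
            if (me.equals(r.getString("owner"))) {
                Row renewed = session.execute(
                    "UPDATE repair_admin.leader USING TTL 60 SET owner = ? " +
                    "WHERE name = 'repair_scheduler' IF owner = ?",
                    me, me).one();
                return renewed.getBool("[applied]");
            }
            return false;
        }

        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                     .addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                String me = java.util.UUID.randomUUID().toString();
                if (tryAcquireOrRenew(session, me))
                    System.out.println("I am the active repair scheduler");
                else
                    System.out.println("Standing by");
            }
        }
    }

A standby running the same loop simply takes over once the active scheduler stops renewing its lease; a production version would of course need to think harder about clock skew and renewal races than this sketch does.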
>>>
>>> --
>>> Thank you & Best Regards,
>>> --Simon (Qingcun) Zhou