I can agree with that, Ben. Franck did a good job of outlining CassKop. Somebody from the cass-operator team will be posting something similar, and we can keep the discussion on the mailing list.
Patrick

On Sun, Sep 27, 2020 at 2:16 PM Ben Bromhead <b...@instaclustr.com> wrote:

> Thanks Franck and Stefan.
>
> @Patrick great suggestion and worthwhile getting everything on the table.
>
> One minor change I would advocate for. The SIG has been great to iterate and interact on the details, but I really think this conversation, given the nature of the content, needs to be on the mailing list. The mailing list is really our system of record and the most accessible.
>
> It gives folk time to think and digest, it's asynchronous, easily searchable and, let's be honest, the majority of stakeholders in this are not US based, so the timing issue then goes away and it becomes easier for people to participate. I feel like we've made a lot more progress by simply having this discussion here.
>
> So instead of a presentation, maybe just an email to the ML addressing the headings that Patrick identified?
>
> On Fri, Sep 25, 2020 at 7:55 AM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
>
> > Hi,
> >
> > Patrick's suggestion seems good to me.
> >
> > I won't go into specifics here as I need to genuinely prepare for this. It is quite hard to dig deep into the solutions of others and bring some constructive criticism because it takes a lot of time to study them and everybody has some "whys" behind their choices.
> >
> > To summarize my goals and concerns:
> >
> > 1) We should be as "Kubernetes operator idiomatic" as possible. Industry standards, no custom brain-child of this or that group because they think it is just cool or they just didn't know any better. I do NOT say it is like that right now, I just want to be as ruthless as possible here when it comes to functionality and why it is done like that. It is awesome that we already have something up to date (thanks to John) and that it adheres to the latest releases.
> > I personally had a hard time keeping up with all the releases: once I finished something and aligned it, after a week or two there was already another release where things were different. It is a very fast-moving space and I hope that by the time we develop something it will not be obsolete.
> >
> > 2) It may be easier said than done, but it is guaranteed that people get emotional, it's their precious etc., so please let's go into this with good intentions, not trying to push one solution over the other just because they would like to see it there ... I will have an equally hard time complying with this point. My plan is to explain what is _wrong_ with our solution: where we made mistakes and what should be done differently but it is "too late" etc. It is quite hard to describe your work and all your effort in this light, but without saying what is wrong we cannot decide what is good, imho.
> >
> > 3) We should put something together fast enough so we can call it a release. We can always iterate on it for eternity. But the foundations need to be there. Here I want to say that I especially like what John did. I looked through these specs and it was obvious they had been written with care and attention. It looked _solid_. I am not sure how hard it is to put all the other things on top of that, I truly am not, and here I think we would have to reinvent the wheel if we want to proceed, because I cannot imagine what it would take to retrofit e.g. CassKop on top of John's specs. It is just like putting round pegs into square holes: maybe some chunks would be reused easily, but otherwise I worry we will be back at square one.
> >
> > One specific feeling I have as I read this is that even if there is the will to create the fourth operator, the respective parties will not be able to drop their own repository.
> > The whole point behind this effort, to me, is to have a solid, community-driven, stable, modern and feature-complete operator people are truly using. I can see that once this is real, we will _really_ sunset our operator, redirecting people to the new operator in the main readme etc.; we truly mean it. Sure, if somebody comes and a bug fix is needed, we will fix it, but the whole point of doing this is to stop using what we have currently, over time, otherwise we are just splitting this space even more. If CassKop is not sure whether they will use it because they do not know if that operator will be "enough" for them, aren't we just doing it wrong? If I exaggerate, they should be fine with deleting the whole repository and using just this Cassandra one we are going to make, otherwise I don't see the point of working on this ...
> >
> > On Thu, 24 Sep 2020 at 20:45, Joshua McKenzie <jmcken...@apache.org> wrote:
> >
> > > - choose cass-operator: it is not on offer right now so let’s see if it does
> > >
> > > We should all talk a lot more, but this is 100% a mistake - I take the blame for that. The intention has long been to offer cass-operator for donation but it slipped through the cracks and your email yesterday made me double-take.
> > >
> > > We have since resolved this misalignment. DataStax would be happy to donate any and all of cass-operator to the ASF and C* project if it's what we all agree best serves our collective Cassandra users. I'm also cognizant that an immense amount of effort has gone into CassKop and we seem to have something of an embarrassment of riches.
> > >
> > > I'm given to understand (haven't dug in personally) that the two operators express pretty different opinions when it comes to frameworks, designs, supported versions, etc.
> > > I think a discrete enumeration of the feature set and "identities" of both could really help navigate this conversation going forward.
> > >
> > > Also - thanks for that context Franck. It's always helpful to know where other people are coming from when we're all working together towards a common goal.
> > >
> > > On Thu, Sep 24, 2020 at 12:23 PM, <franck.de...@orange.com> wrote:
> > >
> > > > I can share Orange’s view of the situation, sorry it is a long story!
> > > >
> > > > We started CassKop at the end of 2018 after betting on K8S, which was not so simple as far as C* was concerned. Lack of support for local storage, IPs that change all the time, different network plugins to try to implement a non-standard K8s way of having nodes see each other from different dcs… We hesitated between K8S and Mesos but could not have both, and K8S was already gaining so much traction that you could not not choose it.
> > > >
> > > > Anyway, we looked around and did not see anyone with such requirements so we said: why not try it ourselves, but on GitHub so that we may give it back to the community. We have used C* for quite a few years with great success in production with massive load and perfect availability. We love C* @ Orange :) Thanks!
> > > >
> > > > So we started writing support for mono-dc clusters (CassKop) and added the multi-dc support with MultiCassKop, which is another operator included in the CassKop repo. For more details, we tried to document our designs as much as possible here: https://orange-opensource.github.io/casskop/docs/1_concepts/3_design_principes#multi-site-management
> > > >
> > > > In the middle of last year we had some talks with Datastax about working together around their new management sidecar.
> > > > Their position on open source was not clear at that time so we said please come back when you have decided to go open source with it. Which they did at the beginning of this year. But at that time I guess work had started on cass-operator so we kept our separate ways.
> > > >
> > > > Since the beginning of the year, we have been working with our OPS team to have it in production. It is not simple as the team has to learn K8S and trust a newborn operator. This takes time, especially as our internal cluster has been tweaked for multi-tenancy with obscure options being set by our K8s team…
> > > >
> > > > We also developed the Backup & Restore functionality with Instaclustr (we have new CRDs (Custom Resource Definitions) for backup and restore and a reconcile loop that calls out to the Instaclustr sidecar for these operations). We now support multiple backups in parallel and can write to S3, Google or Azure (but Stefan could give more details here if needed).
> > > >
> > > > During the SIG calls we mentioned our desire to donate CassKop once it satisfies our basic requirements (v1 coming just now but I said it too many times already). I am actually not sure Datastax mentioned their desire to donate cass-operator, but we decided to compare the designs and the functionalities based on the respective CRDs. The CRD is the interface with the user as it is where you describe the cluster that you want to have. These talks were very interesting and we found out that the CassKop team had made good choices most of the time but was maybe too open. Indeed our intention was to give all the possibilities for our OPS team to work.
> > > > This includes:
> > > > - very open topology definition using any configuration of labels to map dcs / racks and nodes to labels on clusters (we have labels on dcs / rooms / rows and server racks so we can map C* racks to storage or network arrays internally)
> > > > - possibility to have multiple C* nodes on a single K8S host (because internal clouds are not really clouds, they have limited resources)
> > > > - custom C* image selection,
> > > > - custom bootstrap script that lets you configure C* as you want using ConfigMaps,
> > > > - the ability to mount different volumes wherever they wanted,
> > > > - the possibility to run any number of sidecars alongside C* for custom probes in our case
> > > >
> > > > This makes CassKop quite powerful and flexible.
> > > > We made sure that all those options are not enabled by default so one can just pop a simple 3 node cluster quickly.
> > > >
> > > > On the other hand cass-operator had an interesting way of configuring C* just inside the CRD using cass-config. This is simple and elegant so we are implementing it as well for the support of C* 4.
> > > >
> > > > Now for the future, there are 3 choices in my opinion:
> > > > - start from scratch (or John’s repo) by cherry picking bits from all operators. This is possible but will take some time / effort to have something usable. And then it will be compared to cass-operator and CassKop. I don’t see Orange contributing too much here as we believe CassKop to be a much better starting point
> > > > - choose cass-operator: it is not on offer right now so let’s see if it does. I think Orange could contribute some bits inherited from CassKop if it is agreed by the community. Not sure it would be enough for us to use it.
> > > > - choose CassKop: we would be delighted to donate it and contribute with some committers (including the original author who now works for AWS). It would then become the community operator but there would probably be cass-operator alongside. But cass-operator is made to make it easier for Datastax to manage customer clusters by imposing some configuration. It makes sense for their needs, so maybe 2 operators. We don’t know how backup/restore will be handled here with medusa being adapted to K8s
> > > >
> > > > Sorry again for being long but 2 years of work deserve some lines of text :)
> > > >
> > > > I just saw your message Patrick but this was written already so we gain a week.
> > > >
> > > > Franck
> > > >
> > > > On 24 Sep 2020, at 10:08, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> > > >
> > > > I realise there are meeting logs, but getting a wider discourse with non-stakeholder input might help to build a community consensus? It doesn't seem like it can hurt at this point, anyway.
> > > >
> > > > +1
> > > >
> > > > On Wed, Sep 23, 2020 at 9:21 PM Benedict Elliott Smith <bened...@apache.org> wrote:
> > > >
> > > > Perhaps it helps to widen the field of discussion to the dev list?
> > > >
> > > > It might help if each of the stakeholder organisations state their view on the situation, including why they would or would not support a given approach/operator, and what (preferably specific) circumstances might lead them to change their mind?
> > > >
> > > > I realise there are meeting logs, but getting a wider discourse with non-stakeholder input might help to build a community consensus? It doesn't seem like it can hurt at this point, anyway.
> > > >
> > > > On 23/09/2020, 17:13, "John Sanda" <john.sa...@gmail.com> wrote:
> > > >
> > > > I want to point out that pretty much everything being discussed in this thread has been discussed at length during the SIG meetings. I think it is worth noting because we are pretty much still having the same conversation.
> > > >
> > > > On Wed, Sep 23, 2020 at 12:03 PM Benedict Elliott Smith <bened...@apache.org> wrote:
> > > >
> > > > I don't think there's anything about a code drop that's not "The Apache Way"
> > > >
> > > > If there's a consensus (or even strong majority) amongst invested parties, I don't see why we could not adopt an operator directly into the project.
> > > >
> > > > It's possible a green field approach might lead to fewer hard feelings, as everyone is in the same boat. Perhaps all operators are also suboptimal and could be improved with a rewrite? But I think coordinating a lot of different entities around an empty codebase is particularly challenging. I actually think it could be better for cohesion and collaboration to have a suboptimal but substantive starting point.
> > > >
> > > > On 23/09/2020, 16:11, "Stefan Miklosovic" <stefan.mikloso...@instaclustr.com> wrote:
> > > >
> > > > I think that from Instaclustr it was stated quite clearly multiple times that we are "fine to throw it away" if there is something better and more widespread. Indeed, we have invested a lot of time in the operator but it was not useless at all; we gained a lot of quite unique knowledge of how to put all the pieces together.
> > > > However, I think that this space is going to be quite fragmented and "balkanized", which is not always a bad thing, but in an area as narrow as Kubernetes operators, I just do not see how 4 operators are going to be beneficial for ordinary people (the "official" one from the community, ours, the Datastax one and CassKop, in no particular order). Sure, innovation and healthy competition are important but to what extent ...
> > > > There are only so many different ways to start a Cassandra cluster on Kubernetes, and nobody really likes vendor lock-in. People wanting to run a cluster on K8S realise that there are three operators, each backed by a private business entity, and the community operator is not there ... Huh, interesting ... One may even start to question what is wrong with these folks that it takes three companies to build their own solution.
> > > >
> > > > Having said that, to my perception, the Cassandra community just does not have enough engineers or contributors to keep 4 operators alive at the same time (I wish I was wrong), so the idea of selecting the best one or merging obvious things and approaches together is understandable, even if it meant we eventually sunset ours. In addition, nobody from the big players is going to contribute to the code base of another one, for obvious reasons, so channeling and directing this effort into something common for the community seems to be the only reasonable way of cooperating.
> > > >
> > > > It is quite hard to bootstrap this if the donation of the code in big chunks / whole repo is out of the question as it is not the "Apache way" (there was some thread running here about this in more depth a while ago) and we basically need to start from scratch, which is quite demotivating: we are just reinventing the wheel and nobody is up for it. It is like people are waiting for that to happen so they can jump in "once it is the thing", but it will never materialise or at least the hurdle to kick it off is unnecessarily high. Nobody is going to invest in this heavily if there is already a working operator from the companies mentioned above. As I understood it, one reason for not choosing the way of donating it all is that "the learning and community building should happen in an organic manner and we just cannot accept the donation", but isn't it true that it is easier to build a community around something which is already there rather than trying to build it around an idea which is quite hard to commit to?
> > > >
> > > > On Wed, 23 Sep 2020 at 15:28, Joshua McKenzie <jmcken...@apache.org> wrote:
> > > >
> > > > I think there's significant value to the community in trying to coalesce on a single approach,
> > > >
> > > > I agree. Unfortunately in this case, the parties with a vested interest and written operators came to the table and couldn't agree to coalesce on a single approach. John Sanda attempted to start an initiative to write a best-of-breed combining choice parts of each operator, but that effort did not gain traction.
> > > >
> > > > Which is where my hypothesis comes from that if there were a clear "better fit" operator to start from we wouldn't be in a deadlock; the correct choice would be obvious.
> > > > Reasonably so, every engineer that's written something is going to want that something to be used and not thrown away in favor of another something without strong evidence as to why that's the better choice.
> > > >
> > > > As far as I know, nobody has made a clear case as to a more compelling place to start in terms of an operator donation the project then collaborates on. There's no mass adoption evidence nor feature enumeration that I know of for any of the approaches anyone's taken, so the discussions remain stalled.
> > > >
> > > > On Wed, Sep 23, 2020 at 7:18 AM, Benedict Elliott Smith <bened...@apache.org> wrote:
> > > >
> > > > I think there's significant value to the community in trying to coalesce on a single approach, earlier than later. This is an opportunity to expand the number of active organisations involved directly in the Apache Cassandra project, as well as to more quickly expand the project's functionality into an area we consider urgent and important. I think it would be a real shame to waste this opportunity. No doubt it will be hard, as organisations have certain built-in investments in their own approaches.
> > > >
> > > > I haven't participated in these calls as I do not consider myself to have the relevant experience and expertise, and have other focuses on the project. I just wanted to voice a vote in favour of trying to bring the different organisations together on a single approach if possible. Is there anything the project can do to help this happen?
> > > >
> > > > On 23/09/2020, 03:04, "Ben Bromhead" <b...@instaclustr.com> wrote:
> > > >
> > > > I think there is certainly an appetite to donate and standardise on a given operator (as mentioned in this thread).
> > > >
> > > > I personally found the SIG hard to participate in due to time zones and the synchronous nature of it.
> > > >
> > > > So while it was a great forum to dive into certain details for a subset of participants and a worthwhile endeavour, I wouldn't paint it as an accurate reflection of community intent.
> > > >
> > > > I don't think that any participants want to continue down the path of "let a thousand flowers bloom". That's why we are looking towards CassKop (as well as a number of technical reasons).
> > > >
> > > > Some of the recorded meetings and outputs can also be found if you are interested in some primary sources: https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG
> > > >
> > > > From what I understand second-hand from talking to people on the SIG calls, there was a general inability to agree on an existing operator as a starting point and not much engagement on taking best of breed from the various to combine them. Seems to leave us in the "let a thousand flowers bloom" stage of letting operators grow in the ecosystem and seeing which ones meet the needs of end users before talking about adopting one into the foundation.
> > > >
> > > > Great to hear that you folks are joining forces though! Bodes well for C* users that are wanting to run things on k8s.
> > > >
> > > > On Tue, Sep 22, 2020 at 4:26 AM, Ben Bromhead <b...@instaclustr.com> wrote:
> > > >
> > > > For what it's worth, a quick update from me:
> > > >
> > > > CassKop now has at least two organisations working on it substantially (Orange and Instaclustr) as well as the numerous other contributors.
> > > >
> > > > Internally we will also start pointing others towards CassKop once a few things get merged. While we are not sunsetting our operator yet, it is certainly looking that way.
> > > >
> > > > I'd love to see the community adopt it as a starting point for working towards whatever level of functionality is desired.
> > > >
> > > > Cheers
> > > >
> > > > Ben
> > > >
> > > > On Fri, Sep 11, 2020 at 2:37 PM John Sanda <john.sa...@gmail.com> wrote:
> > > >
> > > > On Thu, Sep 10, 2020 at 5:27 PM Josh McKenzie <jmcken...@apache.org> wrote:
> > > >
> > > > There's basically 1 Java driver in the C* ecosystem. We have 3? 4? or more operators in the ecosystem. Has one of them hit a clear supermajority of adoption that makes it the de facto default and makes sense to pull it into the project?
> > > >
> > > > We as a project community were pretty slow to move on building a PoV around kubernetes so we find ourselves in a situation with a bunch of contenders for inclusion in the project. It's not clear to me what heuristics we'd use to gauge which one would be the best fit for inclusion outside letting community adoption speak.
> > > >
> > > > ---
> > > > Josh McKenzie
> > > >
> > > > We actually talked a good bit on the SIG call earlier today about heuristics.
> > > > We need to document what functionality an operator should include at level 0, level 1, etc. We did discuss this a good bit during some of the initial SIG meetings, but I guess it wasn't really a focal point at the time. I think we should also provide references to existing operator projects and possibly other related projects. This would benefit both community users as well as people working on these projects.
> > > >
> > > > - John
> > > >
> > > > --
> > > > Ben Bromhead
> > > > Instaclustr | www.instaclustr.com | @instaclustr <http://twitter.com/instaclustr> | (650) 284 9692
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >
> > > > ________________________________________
> > > > This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.