I can share Orange’s view of the situation. Sorry, it is a long story!

We started CassKop at the end of 2018 after betting on K8S, which was not so simple as far as C* was concerned: lack of support for local storage, IPs that change all the time, different network plugins to try to implement a non-standard K8s way of having nodes see each other from different DCs… We hesitated over Mesos but could not have both, and K8S was already gaining so much traction that you could not not choose it.

Anyway, we looked around and did not see anyone with such requirements, so we said: why not try it ourselves, but on GitHub, so that we may give it back to the community. We have used C* for quite a few years with great success in production, with massive load and perfect availability. We love C* @ Orange :) Thanks!

So we started writing support for mono-DC clusters (CassKop) and added multi-DC support with MultiCassKop, which is another operator included in the CassKop repo. For more details, we tried to document our designs as much as possible here:
https://orange-opensource.github.io/casskop/docs/1_concepts/3_design_principes#multi-site-management

In the middle of last year we had some talks with Datastax about working together around their new management sidecar. Their position on open source was not clear at that time, so we said: please come back when you have decided to go open source with it. Which they did at the beginning of this year. But by that time I guess work had started on cass-operator, so we kept our separate ways.

Since the beginning of the year, we have been working with our OPS team to get it into production. It is not simple, as the team has to learn K8S and trust a newborn operator. This takes time, especially as our internal cluster has been tweaked for multi-tenancy, with obscure options being set by our K8s team…

We also developed the Backup & Restore functionality with Instaclustr (we have new CRDs (Custom Resource Definitions) for backup and restore, and a reconcile loop that calls out to the Instaclustr sidecar for these operations).
We now support multiple backups in parallel and can write to S3, Google or Azure (but Stefan could give more details here if needed).

During the SIG calls we mentioned our desire to donate CassKop once it satisfies our basic requirements (v1 is coming just now, but I have said that too many times already). I am actually not sure Datastax mentioned their desire to donate cass-operator, but we decided to compare the designs and the functionalities based on the respective CRDs. The CRD is the interface with the user, as it is where you describe the cluster that you want to have. These talks were very interesting, and we found out that the CassKop team had made good choices most of the time but was maybe too open. Indeed, our intention was to give our OPS team all the possibilities they needed to work. This includes:

- a very open topology definition using any configuration of labels to map DCs / racks and nodes to labels on clusters (we have labels on DCs / rooms / rows and server racks, so we can map C* racks to storage or network arrays internally)
- the possibility to have multiple C* nodes on a single K8S host (because internal clouds are not really clouds, they have limited resources)
- custom C* image selection
- a custom bootstrap script that lets you configure C* as you want using ConfigMaps
- the ability to mount different volumes wherever they want
- the possibility to run any number of sidecars alongside C* (for custom probes in our case)

This makes CassKop quite powerful and flexible. We made sure that none of those options is enabled by default, so one can just pop up a simple 3-node cluster quickly.

On the other hand, cass-operator had an interesting way of configuring C* directly inside the CRD using cass-config. This is simple and elegant, so we are implementing it as well for the support of C* 4.

Now for the future, there are 3 choices in my opinion:

- start from scratch (or John’s repo) by cherry-picking bits from all operators. This is possible but will take some time / effort to have something usable. And then it will be compared to cass-operator and CassKop. I don’t see Orange contributing too much here, as we believe CassKop to be a much better starting point.
- choose cass-operator: it is not on offer right now, so let’s see if that changes. I think Orange could contribute some bits inherited from CassKop if it is agreed by the community. Not sure it would be enough for us to use it.
- choose CassKop: we would be delighted to donate it and contribute with some committers (including the original author, who now works for AWS). It would then become the community operator, but there would probably be cass-operator alongside it. Cass-operator is made to make it easier for Datastax to manage customer clusters by imposing some configuration. It makes sense for their needs, so maybe there will be 2 operators. We don’t know how backup/restore will be handled here, with Medusa being adapted to K8s.

Sorry again for being long, but 2 years of work deserve some lines of text :)

I just saw your message Patrick, but this was written already, so we gain a week.

Franck

On 24 Sep 2020, at 10:08, Benjamin Lerer <benjamin.le...@datastax.com> wrote:

I realise there are meeting logs, but getting a wider discourse with non-stakeholder input might help to build a community consensus? It doesn't seem like it can hurt at this point, anyway.

+1

On Wed, Sep 23, 2020 at 9:21 PM Benedict Elliott Smith <bened...@apache.org> wrote:

Perhaps it helps to widen the field of discussion to the dev list?

It might help if each of the stakeholder organisations state their view on the situation, including why they would or would not support a given approach/operator, and what (preferably specific) circumstances might lead them to change their mind?

I realise there are meeting logs, but getting a wider discourse with non-stakeholder input might help to build a community consensus?
It doesn't seem like it can hurt at this point, anyway.

On 23/09/2020, 17:13, "John Sanda" <john.sa...@gmail.com> wrote:

I want to point out that pretty much everything being discussed in this thread has been discussed at length during the SIG meetings. I think it is worth noting because we are pretty much still having the same conversation.

On Wed, Sep 23, 2020 at 12:03 PM Benedict Elliott Smith <bened...@apache.org> wrote:

I don't think there's anything about a code drop that's not "The Apache Way". If there's a consensus (or even strong majority) amongst invested parties, I don't see why we could not adopt an operator directly into the project.

It's possible a green field approach might lead to fewer hard feelings, as everyone is in the same boat. Perhaps all operators are also suboptimal and could be improved with a rewrite? But I think coordinating a lot of different entities around an empty codebase is particularly challenging. I actually think it could be better for cohesion and collaboration to have a suboptimal but substantive starting point.

On 23/09/2020, 16:11, "Stefan Miklosovic" <stefan.mikloso...@instaclustr.com> wrote:

I think that from Instaclustr it was stated quite clearly multiple times that we are "fine to throw it away" if there is something better and more widespread. Indeed, we have invested a lot of time in the operator, but it was not useless at all: we gained a lot of quite unique knowledge of how to put all the pieces together. However, I think that this space is going to be quite fragmented and "balkanized", which is not always a bad thing, but in an area as narrow as Kubernetes operators, I just do not see how 4 operators are going to be beneficial for ordinary people (the "official" one from the community, ours, the Datastax one and CassKop, in no particular order). Sure, innovation and healthy competition are important, but to what extent ...
There are only so many different ways to start a Cassandra cluster on Kubernetes, and nobody really likes vendor lock-in. People wanting to run a cluster on K8S realise that there are three operators, each backed by a private business entity, and the community operator is not there ... Huh, interesting ... One may even start to question what is wrong with these folks that it takes three companies to build their own solutions.

Having said that, as I perceive it, the Cassandra community just does not have enough engineers or contributors to keep 4 operators alive at the same time (I wish I was wrong), so the idea of selecting the best one, or of merging the obvious things and approaches together, is understandable, even if it meant we eventually sunset ours. In addition, nobody from the big players is going to contribute to the code base of another one, for obvious reasons, so channeling and directing this effort into something common for the community seems to be the only reasonable way of cooperating.

It is quite hard to bootstrap this if the donation of the code in big chunks / as a whole repo is out of the question because it is not the "Apache way" (there was a thread running here about this in more depth a while ago) and we basically need to start from scratch, which is quite demotivating: we are just reinventing the wheel and nobody is up for it. It is like people are waiting for that to happen so they can jump in "once it is the thing", but it will never materialise, or at least the hurdle to kick it off is unnecessarily high. Nobody is going to invest in this heavily if there is already a working operator from the companies mentioned above.
As I understood it, one reason for not choosing the way of donating it all is that "the learning and community building should happen in an organic manner and we just can not accept the donation", but isn't it true that it is easier to build a community around something which is already there than to try to build it around an idea which is quite hard to commit to?

On Wed, 23 Sep 2020 at 15:28, Joshua McKenzie <jmcken...@apache.org> wrote:

I think there's significant value to the community in trying to coalesce on a single approach, I agree. Unfortunately in this case, the parties with a vested interest and written operators came to the table and couldn't agree to coalesce on a single approach. John Sanda attempted to start an initiative to write a best-of-breed operator combining choice parts of each one, but that effort did not gain traction.

Which is where my hypothesis comes from: if there were a clear "better fit" operator to start from, we wouldn't be in a deadlock; the correct choice would be obvious. Reasonably so, every engineer that's written something is going to want that something to be used and not thrown away in favor of another something without strong evidence as to why that's the better choice.

As far as I know, nobody has made a clear case as to a more compelling place to start in terms of an operator donation that the project then collaborates on. There's no mass adoption evidence nor feature enumeration that I know of for any of the approaches anyone's taken, so the discussions remain stalled.

On Wed, Sep 23, 2020 at 7:18 AM, Benedict Elliott Smith <bened...@apache.org> wrote:

I think there's significant value to the community in trying to coalesce on a single approach, earlier than later.
This is an opportunity to expand the number of active organisations involved directly in the Apache Cassandra project, as well as to more quickly expand the project's functionality into an area we consider urgent and important. I think it would be a real shame to waste this opportunity. No doubt it will be hard, as organisations have certain built-in investments in their own approaches.

I haven't participated in these calls as I do not consider myself to have the relevant experience and expertise, and have other focuses on the project. I just wanted to voice a vote in favour of trying to bring the different organisations together on a single approach if possible. Is there anything the project can do to help this happen?

On 23/09/2020, 03:04, "Ben Bromhead" <b...@instaclustr.com> wrote:

I think there is certainly an appetite to donate and standardise on a given operator (as mentioned in this thread).

I personally found the SIG hard to participate in due to time zones and the synchronous nature of it. So while it was a great forum to dive into certain details for a subset of participants and a worthwhile endeavour, I wouldn't paint it as an accurate reflection of community intent.

I don't think that any participants want to continue down the path of "let a thousand flowers bloom". That's why we are looking towards CassKop (as well as for a number of technical reasons).

Some of the recorded meetings and outputs can also be found, if you are interested in some primary sources:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG

From what I understand second-hand from talking to people on the SIG calls, there was a general inability to agree on an existing operator as a starting point and not much engagement on taking the best of breed from the various operators to combine them.
Seems to leave us in the "let a thousand flowers bloom" stage of letting operators grow in the ecosystem and seeing which ones meet the needs of end users before talking about adopting one into the foundation.

Great to hear that you folks are joining forces though! Bodes well for C* users that are wanting to run things on k8s.

On Tue, Sep 22, 2020 at 4:26 AM, Ben Bromhead <b...@instaclustr.com> wrote:

For what it's worth, a quick update from me:

CassKop now has at least two organisations working on it substantially (Orange and Instaclustr) as well as the numerous other contributors.

Internally we will also start pointing others towards CassKop once a few things get merged. While we are not sunsetting our operator yet, it is certainly looking that way.

I'd love to see the community adopt it as a starting point for working towards whatever level of functionality is desired.

Cheers
Ben

On Fri, Sep 11, 2020 at 2:37 PM John Sanda <john.sa...@gmail.com> wrote:

On Thu, Sep 10, 2020 at 5:27 PM Josh McKenzie <jmcken...@apache.org> wrote:

There's basically 1 java driver in the C* ecosystem. We have 3? 4? or more operators in the ecosystem. Has one of them hit a clear supermajority of adoption that makes it the de facto default and makes sense to pull it into the project?

We as a project community were pretty slow to move on building a PoV around kubernetes, so we find ourselves in a situation with a bunch of contenders for inclusion in the project. It's not clear to me what heuristics we'd use to gauge which one would be the best fit for inclusion outside letting community adoption speak.

---
Josh McKenzie

We actually talked a good bit on the SIG call earlier today about heuristics. We need to document what functionality an operator should include at level 0, level 1, etc. We did discuss this a good bit during some of the initial SIG meetings, but I guess it wasn't really a focal point at the time.
I think we should also provide references to existing operator projects and possibly other related projects. This would benefit both community users as well as people working on these projects.

- John

--
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr <http://twitter.com/instaclustr> | (650) 284 9692

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

_________________________________________________________________________________________________________________________

This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.