Isn't ScyllaDB, which uses Cassandra but doesn't contribute back, one example of “abuse” of the Apache license? Happy to be proved wrong, as I am not a lawyer and don’t understand the various licenses.

> On Apr 23, 2018, at 16:55, Dor Laor <d...@scylladb.com> wrote:
>
>> On Mon, Apr 23, 2018 at 4:13 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>> From where I stand it looks like you've got only two options for any
>> feature that involves updating the protocol:
>>
>> 1. Don't build the feature
>> 2. Build it in Cassandra & ScyllaDB, update the drivers accordingly
>>
>> I don't think you have a third option, which is to build it only in
>> ScyllaDB, because that means you have to fork *all* the drivers and make
>> it work, then maintain them. Your business model appears to be built on
>> not doing any of the driver work yourself, and you certainly aren't giving
>> back to the open source community via a permissive license on ScyllaDB
>> itself, so I'm a bit lost here.
>>
>
> It's totally not about business model.
> Scylla itself is 99% open source with an AGPL license that prevents abuse
> and forces changes to be committed back to the project. We also have our
> core engine (Seastar) licensed as Apache since it needs to be integrated
> with the core application. Recently one of our community members even
> created a new Seastar-based C++ driver.
>
> Scylla chose to be compatible with the drivers in order to leverage the
> existing infrastructure and (let's be frank) in order to allow smooth
> migration.
> We would have loved to contribute more to the drivers but up to recently we:
> 1. Were busy over our heads with the server
> 2. Were happy with the existing drivers
> 3. Developed extensions - GoCQLX - our own contribution
>
> Now that we can finally contribute back to the same driver project, we want
> to do it the right way, without forking and without duplicated efforts.
>
> Many times, having a private fork is way easier than proper open source
> work, so from a pure business perspective, we don't select the shortest path.
>
>>
>> To me it looks like you're asking a bunch of volunteers that work on
>> Cassandra to accommodate you.
>> What exactly do we get out of this
>> relationship? What incentive do I or anyone else have to spend time
>> helping you instead of working on something that interests me?
>>
>
> Jon, this is certainly not the case.
> We genuinely wish to do true *open source* work on:
> a. Cassandra drivers
> b. Client protocol
> c. Scylla server side
> d. Cassandra community related work: mailing list, Jira, design
>
> But not
> e. Cassandra server side
>
> While I wouldn't mind doing the Cassandra server work, we don't have the
> resources or the expertise. The Cassandra _developer_ community is welcome
> to decide whether we get to contribute a/b/c/d. Avi has enumerated the
> options of cooperation, passive cooperation and zero cooperation (below).
>
> 1. The protocol change is developed using the Cassandra process in a JIRA
> ticket, culminating in a patch to doc/native_protocol*.spec when consensus
> is achieved.
> 2. The protocol change is developed outside the Cassandra process.
> 3. No cooperation.
>
> Look, I can understand the hostility and suspicion. However, from the C*
> project's POV, it makes no sense to ignore this; otherwise we'll fork the
> drivers and you won't get anything back. There is at least one other vendor
> today with their own server fork and driver fork, and it makes sense to keep
> the protocol unified in an extensible way and to discuss new features
> _together_.
>
>>
>> Jon
>>
>>
>> On Mon, Apr 23, 2018 at 7:59 AM Ben Bromhead <b...@instaclustr.com> wrote:
>>
>>>>>> This doesn't work without additional changes, for RF>1. The token ring
>>>>>> could place two replicas of the same token range on the same physical
>>>>>> server, even though those are two separate cores of the same server. You
>>>>>> could add another element to the hierarchy (cluster -> datacenter -> rack
>>>>>> -> node -> core/shard), but that generates unneeded range movements when a
>>>>>> node is added.
>>>>> I have seen rack awareness used/abused to solve this.
>>>>>
>>>>
>>>> But then you lose real rack awareness. It's fine for a quick hack, but
>>>> not a long-term solution.
>>>>
>>>> (it also creates a lot more tokens, something nobody needs)
>>>>
>>>
>>> I'm having trouble understanding how you lose "real" rack awareness, as
>>> these shards are in the same rack anyway, because the address and port are
>>> on the same server in the same rack. So it behaves as expected. Could you
>>> explain a situation where the shards on a single server would be in
>>> different racks (or fault domains)?
>>>
>>> If you wanted to support a situation where you have a single rack per DC
>>> for simple deployments, extending NetworkTopologyStrategy to behave the way
>>> it did before https://issues.apache.org/jira/browse/CASSANDRA-7544 with
>>> respect to treating InetAddresses as servers rather than the address and
>>> port would be simple. Both this implementation in Apache Cassandra and the
>>> respective load balancing classes in the drivers are explicitly designed to
>>> be pluggable so that would be an easier integration point for you.
>>>
>>> I'm not sure how it creates more tokens? If a server normally owns 256
>>> tokens, each shard on a different port would just advertise ownership of
>>> 256/# of cores (e.g. 4 tokens if you had 64 cores).
>>>
>>>
>>>>
>>>>> Regards,
>>>>> Ariel
>>>>>
>>>>>> On Apr 22, 2018, at 8:26 AM, Avi Kivity <a...@scylladb.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On 2018-04-19 21:15, Ben Bromhead wrote:
>>>>>>> Re #3:
>>>>>>>
>>>>>>> Yup I was thinking each shard/port would appear as a discrete server to
>>>>>>> the client.
>>>>>> This doesn't work without additional changes, for RF>1. The token ring
>>>>>> could place two replicas of the same token range on the same physical
>>>>>> server, even though those are two separate cores of the same server.
>>>>>> You could add another element to the hierarchy (cluster -> datacenter ->
>>>>>> rack -> node -> core/shard), but that generates unneeded range movements
>>>>>> when a node is added.
>>>>>>
>>>>>>> If the per port suggestion is unacceptable due to hardware requirements,
>>>>>>> remembering that Cassandra is built with the concept of scaling
>>>>>>> *commodity* hardware horizontally, you'll have to spend your time and
>>>>>>> energy convincing the community to support a protocol feature it has no
>>>>>>> (current) use for or find another interim solution.
>>>>>> Those servers are commodity servers (not x86, but still commodity). In
>>>>>> any case 60+ logical cores are common now (hello AWS i3.16xlarge or even
>>>>>> i3.metal), and we can only expect logical core count to continue to
>>>>>> increase (there are 48-core ARM processors now).
>>>>>>
>>>>>>> Another way would be to build support and consensus around a clear
>>>>>>> technical need in the Apache Cassandra project as it stands today.
>>>>>>>
>>>>>>> One way to build community support might be to contribute an Apache
>>>>>>> licensed thread per core implementation in Java that matches the protocol
>>>>>>> change and shard concept you are looking for ;P
>>>>>> I doubt I'll survive the egregious top-posting that is going on in this
>>>>>> list.
>>>>>>
>>>>>>>
>>>>>>>> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> So at a technical level I don't understand this yet.
>>>>>>>>
>>>>>>>> So you have a database consisting of single threaded shards and a socket
>>>>>>>> for accept that is generating TCP connections and in advance you don't
>>>>>>>> know which connection is going to send messages to which shard.
>>>>>>>>
>>>>>>>> What is the mechanism by which you get the packets for a given TCP
>>>>>>>> connection delivered to a specific core?
>>>>>>>> I know that a given TCP connection
>>>>>>>> will normally have all of its packets delivered to the same queue from
>>>>>>>> the NIC because the tuple of source address + port and destination
>>>>>>>> address + port is typically hashed to pick one of the queues the NIC
>>>>>>>> presents. I might have the contents of the tuple slightly wrong, but it
>>>>>>>> always includes a component you don't get to control.
>>>>>>>>
>>>>>>>> Since it's hashing how do you manipulate which queue packets for a TCP
>>>>>>>> connection go to and how is it made worse by having an accept socket per
>>>>>>>> shard?
>>>>>>>>
>>>>>>>> You also mention 160 ports as bad, but it doesn't sound like a big number
>>>>>>>> resource wise. Is it an operational headache?
>>>>>>>>
>>>>>>>> RE tokens distributed amongst shards. The way that would work right now
>>>>>>>> is that each port number appears to be a discrete instance of the server.
>>>>>>>> So you could have shards be actual shards that are simply colocated on
>>>>>>>> the same box, run in the same process, and share resources. I know this
>>>>>>>> pushes more of the complexity into the server vs the driver as the server
>>>>>>>> expects all shards to share some client-visible state like system tables
>>>>>>>> and certain identifiers.
>>>>>>>>
>>>>>>>> Ariel
>>>>>>>>> On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
>>>>>>>>> Port-per-shard is likely the easiest option but it's too ugly to
>>>>>>>>> contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
>>>>>>>>> IIRC), it will be just horrible to have 160 open ports.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It also doesn't fit well with the NIC's ability to automatically
>>>>>>>>> distribute packets among cores using multiple queues, so the kernel
>>>>>>>>> would have to shuffle those packets around. Much better to have those
>>>>>>>>> packets delivered directly to the core that will service them.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (also, some protocol changes are needed so the driver knows how tokens
>>>>>>>>> are distributed among shards)
>>>>>>>>>
>>>>>>>>>> On 2018-04-19 19:46, Ben Bromhead wrote:
>>>>>>>>>> WRT #3
>>>>>>>>>> To fit in the existing protocol, could you have each shard listen on a
>>>>>>>>>> different port? Drivers are likely going to support this due to
>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-7544 (
>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-11596). I'm not super
>>>>>>>>>> familiar with the ticket so there might be something I'm missing but it
>>>>>>>>>> sounds like a potential approach.
>>>>>>>>>>
>>>>>>>>>> This would give you a path forward at least for the short term.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I think that updating the protocol spec to Cassandra puts the onus on
>>>>>>>>>>> the party changing the protocol specification to have an implementation
>>>>>>>>>>> of the spec in Cassandra as well as the Java and Python drivers (those
>>>>>>>>>>> are both used in the Cassandra repo). Until it's implemented in
>>>>>>>>>>> Cassandra we haven't fully evaluated the specification change. There is
>>>>>>>>>>> no substitute for trying to make it work.
>>>>>>>>>>>
>>>>>>>>>>> There are also realities to consider as to what the maintainers of the
>>>>>>>>>>> drivers are willing to commit.
>>>>>>>>>>>
>>>>>>>>>>> RE #1,
>>>>>>>>>>>
>>>>>>>>>>> I am +1 on the fact that we shouldn't require an extra hop for range
>>>>>>>>>>> scans.
>>>>>>>>>>> In JIRA Jeremiah made the point that you can still do this from the
>>>>>>>>>>> client by breaking up the token ranges, but it's a leaky abstraction to
>>>>>>>>>>> have a paging interface that isn't a vanilla ResultSet interface.
>>>>>>>>>>> Serial vs. parallel is kind of orthogonal as the driver can do either.
>>>>>>>>>>>
>>>>>>>>>>> I agree it looks like the current specification doesn't make what
>>>>>>>>>>> should be simple as simple as it could be for driver implementers.
>>>>>>>>>>>
>>>>>>>>>>> RE #2,
>>>>>>>>>>>
>>>>>>>>>>> +1 on this change assuming an implementation in Cassandra and the Java
>>>>>>>>>>> and Python drivers.
>>>>>>>>>>>
>>>>>>>>>>> RE #3,
>>>>>>>>>>>
>>>>>>>>>>> It's hard to be +1 on this because we don't benefit by boxing ourselves
>>>>>>>>>>> in by defining a spec we haven't implemented, tested, and decided we are
>>>>>>>>>>> satisfied with. Having it in ScyllaDB de-risks it to a certain extent,
>>>>>>>>>>> but what if Cassandra decides to go a different direction in some way?
>>>>>>>>>>>
>>>>>>>>>>> I don't think there is much discussion to be had without an example of
>>>>>>>>>>> the changes to the CQL specification to look at, but even then if it
>>>>>>>>>>> looks risky I am not likely to be in favor of it.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ariel
>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 19, 2018, at 9:33 AM, glom...@scylladb.com wrote:
>>>>>>>>>>>> On 2018/04/19 07:19:27, kurt greaves <k...@instaclustr.com> wrote:
>>>>>>>>>>>>>> 1. The protocol change is developed using the Cassandra process in
>>>>>>>>>>>>>> a JIRA ticket, culminating in a patch to
>>>>>>>>>>>>>> doc/native_protocol*.spec when consensus is achieved.
>>>>>>>>>>>>> I don't think forking would be desirable (for anyone) so this seems
>>>>>>>>>>>>> the most reasonable to me.
>>>>>>>>>>>>> For 1 and 2 it certainly makes sense but
>>>>>>>>>>>>> can't say I know enough about sharding to comment on 3 - seems to me
>>>>>>>>>>>>> like it could be locking in a design before anyone truly knows what
>>>>>>>>>>>>> sharding in C* looks like. But hopefully I'm wrong and there are
>>>>>>>>>>>>> devs out there that have already thought that through.
>>>>>>>>>>>> Thanks. That is our view and is great to hear.
>>>>>>>>>>>>
>>>>>>>>>>>> About our proposal number 3: In my view, good protocol designs are
>>>>>>>>>>>> future proof and flexible. We certainly don't want to propose a
>>>>>>>>>>>> design that works just for Scylla, but one that would support
>>>>>>>>>>>> reasonable implementations regardless of what they may look like.
>>>>>>>>>>>>
>>>>>>>>>>>>> Do we have driver authors who wish to support both projects?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Surely, but I imagine it would be a minority.
>>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>> Ben Bromhead
>>>>>>>>>> CTO | Instaclustr <https://www.instaclustr.com/>
>>>>>>>>>> +1 650 284 9692
>>>>>>>>>> Reliability at Scale
>>>>>>>>>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>>>>>>>>>>
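
For anyone following the port-per-shard part of the thread, here is a minimal Python sketch of the two points argued above: the token arithmetic (256 tokens split over 64 cores) and the RF>1 concern that two replicas can land on shards of the same physical server when every (address, port) pair is treated as a discrete server. Everything here is an assumption made for illustration: the toy ring, the addresses and ports, and the helper names tokens_per_shard and pick_replicas are hypothetical, and this is not Cassandra's NetworkTopologyStrategy or any driver's code.

```python
from typing import List, Tuple

# Hypothetical endpoint type: (host address, port), one port per shard.
Endpoint = Tuple[str, int]


def tokens_per_shard(tokens_per_server: int, cores: int) -> int:
    """Tokens each shard would advertise if the server's tokens are split evenly."""
    return tokens_per_server // cores


def pick_replicas(ring: List[Endpoint], start: int, rf: int, by_host: bool) -> List[Endpoint]:
    """Walk a toy ring clockwise from `start` and collect `rf` replicas.

    by_host=False treats every (address, port) pair as a discrete server.
    by_host=True treats the address alone as the server, so two shards of
    one machine never both hold a replica of the same range.
    """
    chosen: List[Endpoint] = []
    seen = set()
    for i in range(len(ring)):
        endpoint = ring[(start + i) % len(ring)]
        key = endpoint[0] if by_host else endpoint
        if key in seen:
            continue
        seen.add(key)
        chosen.append(endpoint)
        if len(chosen) == rf:
            break
    return chosen


# Toy ring: two machines, two shards (ports) each, in ring order.
ring = [("10.0.0.1", 9042), ("10.0.0.1", 9043),
        ("10.0.0.2", 9042), ("10.0.0.2", 9043)]

naive = pick_replicas(ring, start=0, rf=2, by_host=False)
host_aware = pick_replicas(ring, start=0, rf=2, by_host=True)
```

With this toy ring, the naive walk returns two shards of 10.0.0.1, which is the RF>1 problem described above, while deduplicating by address spreads the two replicas over both machines.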