Dor, setting the Thread Per Core code aside, will your developers commit to contributing back both https://issues.apache.org/jira/browse/CASSANDRA-2848 and https://issues.apache.org/jira/browse/CASSANDRA-14311?
It looks like CASSANDRA-2848 has stalled even though some respectable work was done, and CASSANDRA-14311 hasn't been started yet. Some material contributions from your team in these two areas would be appreciated.

On Mon, Apr 23, 2018 at 6:17 PM, Dor Laor <d...@scylladb.com> wrote:

> On Mon, Apr 23, 2018 at 5:03 PM, Sankalp Kohli <kohlisank...@gmail.com> wrote:
>
>> Is one of the “abuses” of the Apache license ScyllaDB, which is using Cassandra but not contributing back?
>
> It's not that we have a private version of Cassandra and don't release all or some of it back. We didn't contribute because we have a different server base; we always contribute where it makes sense. I'll be happy to have several beers or emails about the pros and cons of open source licensing, but I don't think this is the case. The discussion is about whether the community wishes to accept our contributions; we initiated it, didn't we?
>
> Let's be practical: it's not reasonable to commit C* protocol changes that the community doesn't intend to implement in C* in the short term (thread-per-core-like), and it's not reasonable to expect Scylla to contribute such a huge effort to the C* server. It is reasonable to collaborate on protocol enhancements that are acceptable, even without coding, and to make sure the protocol is enhanceable in a forward-compatible way.
>
>> Happy to be proved wrong, as I am not a lawyer and don't understand the various licenses ...
>>
>> On Apr 23, 2018, at 16:55, Dor Laor <d...@scylladb.com> wrote:
>>
>>> On Mon, Apr 23, 2018 at 4:13 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>
>>>> From where I stand it looks like you've got only two options for any feature that involves updating the protocol:
>>>>
>>>> 1. Don't build the feature.
>>>> 2. Build it in Cassandra & ScyllaDB, and update the drivers accordingly.
>>>>
>>>> I don't think you have a third option, which is to build it only in ScyllaDB, because that means you have to fork *all* the drivers, make them work, and then maintain them. Your business model appears to be built on not doing any of the driver work yourself, and you certainly aren't giving back to the open source community via a permissive license on ScyllaDB itself, so I'm a bit lost here.
>>>
>>> It's totally not about the business model. Scylla itself is 99% open source, under an AGPL license that prevents abuse and forces changes to be contributed back to the project. We also have our core engine (Seastar) licensed as Apache, since it needs to be integrated with the core application. Recently one of our community members even created a new Seastar-based C++ driver.
>>>
>>> Scylla chose to be compatible with the drivers in order to leverage the existing infrastructure and (let's be frank) in order to allow smooth migration. We would have loved to contribute more to the drivers, but until recently we:
>>> 1. Were up to our necks in work on the server
>>> 2. Were happy with the existing drivers
>>> 3. Developed extensions (GoCQLX), our own contribution
>>>
>>> Now that we can finally contribute back to the same driver projects, we want to do it the right way, without forking and without duplicated effort.
>>>
>>> Many times, having a private fork is way easier than proper open source work, so from a pure business perspective we are not choosing the shortest path.
>>>
>>>> To me it looks like you're asking a bunch of volunteers that work on Cassandra to accommodate you. What exactly do we get out of this relationship? What incentive do I or anyone else have to spend time helping you instead of working on something that interests me?
>>>
>>> Jon, this is certainly not the case. We genuinely wish to do true *open source* work on:
>>> a. Cassandra drivers
>>> b. Client protocol
>>> c. Scylla server side
>>> d. Cassandra community-related work: mailing list, Jira, design
>>>
>>> But not:
>>> e. Cassandra server side
>>>
>>> While I wouldn't mind doing the Cassandra server work, we don't have the resources or the expertise. The Cassandra _developer_ community is welcome to decide whether we get to contribute a/b/c/d. Avi has enumerated the options of cooperation, passive cooperation and zero cooperation (below):
>>>
>>> 1. The protocol change is developed using the Cassandra process in a JIRA ticket, culminating in a patch to doc/native_protocol*.spec when consensus is achieved.
>>> 2. The protocol change is developed outside the Cassandra process.
>>> 3. No cooperation.
>>>
>>> Look, I can understand the hostility and suspicion. However, from the C* project's POV it makes no sense to ignore us; otherwise we'll fork the drivers and you won't get anything back. There is at least one other vendor today with a server fork and a driver fork, and it makes sense to keep the protocol unified in an extensible way and to discuss new features _together_.
>>>
>>>> Jon
>>>>
>>>> On Mon, Apr 23, 2018 at 7:59 AM Ben Bromhead <b...@instaclustr.com> wrote:
>>>>
>>>>>>>> This doesn't work without additional changes, for RF>1. The token ring could place two replicas of the same token range on the same physical server, even though those are two separate cores of the same server. You could add another element to the hierarchy (cluster -> datacenter -> rack -> node -> core/shard), but that generates unneeded range movements when a node is added.
>>>>>>>
>>>>>>> I have seen rack awareness used/abused to solve this.
>>>>>>
>>>>>> But then you lose real rack awareness. It's fine for a quick hack, but not a long-term solution.
>>>>>>
>>>>>> (It also creates a lot more tokens, something nobody needs.)
>>>>>
>>>>> I'm having trouble understanding how you lose "real" rack awareness, as these shards are in the same rack anyway: the address and port are on the same server, in the same rack. So it behaves as expected. Could you explain a situation where the shards on a single server would be in different racks (or fault domains)?
>>>>>
>>>>> If you wanted to support a situation where you have a single rack per DC for simple deployments, extending NetworkTopologyStrategy to behave the way it did before https://issues.apache.org/jira/browse/CASSANDRA-7544 with respect to treating InetAddresses as servers, rather than address-and-port pairs, would be simple. Both this implementation in Apache Cassandra and the respective load balancing classes in the drivers are explicitly designed to be pluggable, so that would be an easier integration point for you.
>>>>>
>>>>> I'm not sure how it creates more tokens. If a server normally owns 256 tokens, each shard on a different port would just advertise ownership of 256 divided by the number of cores (e.g. 4 tokens per shard if you had 64 cores).
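As a concrete illustration of the arithmetic Ben describes above, here is a sketch of dividing a node's vnode tokens evenly across shards. Everything below is invented for illustration; it is not code from Cassandra, Scylla, or any driver.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: round-robin a node's vnode tokens across its shards, so a
    // 64-core server that owns 256 tokens advertises 4 tokens per shard/port.
    public class ShardTokenSplit {
        static List<List<Long>> splitTokens(List<Long> nodeTokens, int shards) {
            List<List<Long>> perShard = new ArrayList<>();
            for (int s = 0; s < shards; s++) perShard.add(new ArrayList<>());
            for (int i = 0; i < nodeTokens.size(); i++) {
                perShard.get(i % shards).add(nodeTokens.get(i));
            }
            return perShard;
        }

        public static void main(String[] args) {
            List<Long> tokens = new ArrayList<>();
            for (int i = 0; i < 256; i++) tokens.add(i * 1_000_000L); // fake vnode tokens
            List<List<Long>> shards = splitTokens(tokens, 64);
            System.out.println("tokens per shard: " + shards.get(0).size()); // 4
        }
    }

Note that the ring-wide token count stays at 256 per server: the tokens are divided among the shards, not multiplied by them, which is Ben's point.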
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ariel
>>>>>>>
>>>>>>> On Apr 22, 2018, at 8:26 AM, Avi Kivity <a...@scylladb.com> wrote:
>>>>>>>
>>>>>>>> On 2018-04-19 21:15, Ben Bromhead wrote:
>>>>>>>>
>>>>>>>>> Re #3: Yup, I was thinking each shard/port would appear as a discrete server to the client.
>>>>>>>>
>>>>>>>> This doesn't work without additional changes, for RF>1. The token ring could place two replicas of the same token range on the same physical server, even though those are two separate cores of the same server. You could add another element to the hierarchy (cluster -> datacenter -> rack -> node -> core/shard), but that generates unneeded range movements when a node is added.
>>>>>>>>
>>>>>>>>> If the per-port suggestion is unacceptable due to hardware requirements, remembering that Cassandra is built around the concept of scaling *commodity* hardware horizontally, you'll have to spend your time and energy convincing the community to support a protocol feature it has no (current) use for, or find another interim solution.
>>>>>>>>
>>>>>>>> Those servers are commodity servers (not x86, but still commodity). In any case, 60+ logical cores are common now (hello AWS i3.16xlarge, or even i3.metal), and we can only expect logical core counts to continue to increase (there are 48-core ARM processors now).
>>>>>>>>
>>>>>>>>> Another way would be to build support and consensus around a clear technical need in the Apache Cassandra project as it stands today. One way to build community support might be to contribute an Apache-licensed thread-per-core implementation in Java that matches the protocol change and shard concept you are looking for ;P
>>>>>>>>
>>>>>>>> I doubt I'll survive the egregious top-posting that is going on in this list.
>>>>>>>>
>>>>>>>>> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> So at a technical level I don't understand this yet. You have a database consisting of single-threaded shards and an accept socket that is generating TCP connections, and in advance you don't know which connection is going to send messages to which shard.
>>>>>>>>>>
>>>>>>>>>> What is the mechanism by which you get the packets for a given TCP connection delivered to a specific core? I know that a given TCP connection will normally have all of its packets delivered to the same queue from the NIC, because the tuple of source address + port and destination address + port is typically hashed to pick one of the queues the NIC presents. I might have the contents of the tuple slightly wrong, but it always includes a component you don't get to control.
>>>>>>>>>>
>>>>>>>>>> Since it's hashing, how do you manipulate which queue the packets for a TCP connection go to, and how is it made worse by having an accept socket per shard?
>>>>>>>>>>
>>>>>>>>>> You also mention 160 ports as bad, but it doesn't sound like a big number resource-wise. Is it an operational headache?
>>>>>>>>>>
>>>>>>>>>> RE tokens distributed amongst shards: the way that would work right now is that each port number appears to be a discrete instance of the server. So you could have shards be actual shards that are simply colocated on the same box, run in the same process, and share resources. I know this pushes more of the complexity into the server vs. the driver, as the server expects all shards to share some client-visible state, like system tables and certain identifiers.
>>>>>>>>>>
>>>>>>>>>> Ariel
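A toy model of the queue selection Ariel is describing; real NICs typically use a Toeplitz hash keyed by the driver, so the simple hash below is a stand-in for the idea, not the actual algorithm.

    import java.util.Objects;

    // Toy stand-in for NIC receive-side scaling (RSS): hash the connection
    // 4-tuple to pick one of N receive queues. The client-chosen ephemeral
    // source port is part of the hash input, so the server cannot steer a
    // new connection to a particular queue (and hence core) in advance.
    public class RssToy {
        static int pickQueue(String srcIp, int srcPort, String dstIp, int dstPort, int queues) {
            int h = Objects.hash(srcIp, srcPort, dstIp, dstPort);
            return Math.floorMod(h, queues); // floorMod: the hash may be negative
        }

        public static void main(String[] args) {
            // Two connections from the same client to the same server port can
            // land on different queues purely because of the source port.
            System.out.println(pickQueue("10.0.0.5", 51000, "10.0.0.9", 9042, 16));
            System.out.println(pickQueue("10.0.0.5", 51001, "10.0.0.9", 9042, 16));
        }
    }

With a listening port per shard, the destination port becomes a per-shard constant in that tuple, but the hash still doesn't deliver each shard's connections to that shard's own queue; that is the mismatch Avi raises below.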
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
>>>>>>>>>>
>>>>>>>>>>> Port-per-shard is likely the easiest option, but it's too ugly to contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t, IIRC); it would be just horrible to have 160 open ports.
>>>>>>>>>>>
>>>>>>>>>>> It also doesn't fit well with the NIC's ability to automatically distribute packets among cores using multiple queues, so the kernel would have to shuffle those packets around. Much better to have those packets delivered directly to the core that will service them.
>>>>>>>>>>>
>>>>>>>>>>> (Also, some protocol changes are needed so the driver knows how tokens are distributed among shards.)
>>>>>>>>>>>
>>>>>>>>>>> On 2018-04-19 19:46, Ben Bromhead wrote:
>>>>>>>>>>>
>>>>>>>>>>>> WRT #3: To fit in the existing protocol, could you have each shard listen on a different port? Drivers are likely going to support this due to https://issues.apache.org/jira/browse/CASSANDRA-7544 (https://issues.apache.org/jira/browse/CASSANDRA-11596). I'm not super familiar with the ticket, so there might be something I'm missing, but it sounds like a potential approach.
>>>>>>>>>>>>
>>>>>>>>>>>> This would give you a path forward, at least for the short term.
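A minimal sketch of the port-per-shard idea Ben floats; the base port, shard count, and hand-off are all made up for illustration, and none of this is Cassandra or Scylla code.

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Sketch: each shard accepts on its own port (basePort + shardId), so an
    // existing driver can treat every shard as a distinct "node" behind one IP.
    public class PortPerShard {
        public static void main(String[] args) throws IOException {
            int basePort = 19042; // hypothetical; not a real default port
            int shards = 4;       // imagine 160 on the POWER box Avi mentions
            for (int shardId = 0; shardId < shards; shardId++) {
                ServerSocket listener = new ServerSocket(basePort + shardId);
                int id = shardId;
                new Thread(() -> {
                    while (true) {
                        try {
                            Socket conn = listener.accept();
                            // Every connection accepted here belongs to shard `id`;
                            // a real server would hand it to that shard's event loop.
                            conn.close(); // placeholder for the hand-off
                        } catch (IOException e) {
                            return; // listener closed; shard shutting down
                        }
                    }
                }, "shard-" + id).start();
            }
        }
    }

This shows how little the wire protocol itself would need to change; Avi's objections above are operational (160 open ports) and about performance (the kernel, not the NIC, would end up shuffling packets to the right core).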
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that updating the Cassandra protocol spec puts the onus on the party changing the specification to have an implementation of the spec in Cassandra, as well as in the Java and Python drivers (those are both used in the Cassandra repo). Until it's implemented in Cassandra we haven't fully evaluated the specification change. There is no substitute for trying to make it work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are also realities to consider as to what the maintainers of the drivers are willing to commit to.
>>>>>>>>>>>>>
>>>>>>>>>>>>> RE #1, I am +1 on the fact that we shouldn't require an extra hop for range scans. In JIRA, Jeremiah made the point that you can still do this from the client by breaking up the token ranges, but it's a leaky abstraction to have a paging interface that isn't a vanilla ResultSet interface. Serial vs. parallel is kind of orthogonal, as the driver can do either.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree it looks like the current specification doesn't make what should be simple as simple as it could be for driver implementers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> RE #2, +1 on this change, assuming an implementation in Cassandra and the Java and Python drivers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> RE #3, it's hard to be +1 on this because we don't benefit by boxing ourselves in by defining a spec we haven't implemented, tested, and decided we are satisfied with. Having it in ScyllaDB de-risks it to a certain extent, but what if Cassandra decides to go a different direction in some way?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think there is much discussion to be had without an example of the changes to the CQL specification to look at, but even then, if it looks risky I am not likely to be in favor of it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Ariel
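The client-side alternative Ariel mentions under #1, breaking the full ring into token ranges that can be scanned in parallel, looks roughly like this. It assumes the Murmur3 partitioner's token space; the helper below is illustrative, not a real driver API.

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch: split the Murmur3 token space [-2^63, 2^63 - 1] into N contiguous
    // subranges a client could scan in parallel with queries of the form
    // "... WHERE token(pk) > ? AND token(pk) <= ?".
    public class TokenRangeSplit {
        static List<long[]> split(int parts) {
            BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
            BigInteger span = BigInteger.valueOf(Long.MAX_VALUE).subtract(min);
            List<long[]> ranges = new ArrayList<>();
            BigInteger start = min;
            for (int i = 1; i <= parts; i++) {
                BigInteger end = min.add(span.multiply(BigInteger.valueOf(i))
                                             .divide(BigInteger.valueOf(parts)));
                ranges.add(new long[] { start.longValueExact(), end.longValueExact() });
                start = end;
            }
            return ranges;
        }

        public static void main(String[] args) {
            for (long[] r : split(4)) {
                System.out.printf("token(pk) > %d AND token(pk) <= %d%n", r[0], r[1]);
            }
        }
    }

Each subrange can then be routed to a replica that owns it, which is how a driver can avoid the extra coordinator hop; Ariel's objection is not that this is impossible but that it leaks through the paging abstraction.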
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Apr 19, 2018, at 9:33 AM, glom...@scylladb.com wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2018/04/19 07:19:27, kurt greaves <k...@instaclustr.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. The protocol change is developed using the Cassandra process in a JIRA ticket, culminating in a patch to doc/native_protocol*.spec when consensus is achieved.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think forking would be desirable (for anyone), so this seems the most reasonable to me. For 1 and 2 it certainly makes sense, but I can't say I know enough about sharding to comment on 3; it seems to me like it could be locking in a design before anyone truly knows what sharding in C* looks like. But hopefully I'm wrong and there are devs out there that have already thought that through.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks. That is our view as well, and it is great to hear.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> About our proposal number 3: in my view, good protocol designs are future proof and flexible. We certainly don't want to propose a design that works just for Scylla; it should support reasonable implementations regardless of what they may look like.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do we have driver authors who wish to support both projects?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Surely, but I imagine it would be a minority.
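On the point above about future-proof protocol design: one pattern the native protocol already uses is the OPTIONS/SUPPORTED exchange, where features travel in a string multimap and unknown keys are simply ignored. A sketch of that negotiation shape follows; the SHARDING key is invented for illustration and is not in any protocol spec.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch: intersect the client's requested optional features with what the
    // server advertises. Unknown keys are ignored rather than rejected, which
    // is what keeps the protocol forward compatible.
    public class FeatureNegotiation {
        static Map<String, List<String>> negotiate(Map<String, List<String>> supported,
                                                   Map<String, List<String>> requested) {
            Map<String, List<String>> agreed = new HashMap<>();
            for (Map.Entry<String, List<String>> e : requested.entrySet()) {
                List<String> serverValues = supported.get(e.getKey());
                if (serverValues == null) continue; // feature unknown to the server
                List<String> common = new ArrayList<>(e.getValue());
                common.retainAll(serverValues);
                if (!common.isEmpty()) agreed.put(e.getKey(), common);
            }
            return agreed;
        }

        public static void main(String[] args) {
            Map<String, List<String>> supported = new HashMap<>();
            supported.put("CQL_VERSION", List.of("3.4.5"));
            supported.put("SHARDING", List.of("v1"));       // hypothetical extension
            Map<String, List<String>> requested = new HashMap<>();
            requested.put("CQL_VERSION", List.of("3.4.5"));
            requested.put("SHARDING", List.of("v1", "v2")); // v2 unknown to this server
            System.out.println(negotiate(supported, requested)); // agrees on SHARDING=v1
        }
    }

A driver that has never heard of SHARDING keeps working untouched, and a server that doesn't implement it simply never advertises it; that property is what would let one spec serve implementations that evolve at different speeds.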