Sounds interesting. Could 80% of what we'd gain with a "shard" approach be
achieved by using Mesos to wrap a stateful service? Technically that is
"sharding" the whole machine and making better use of its resources.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 19, 2018, 1:23 PM -0500, sankalp kohli <kohlisank...@gmail.com>, wrote:
> If you donate a thread-per-core implementation to C*, I am sure someone can
> help you review it and get it committed.
>
> On Thu, Apr 19, 2018 at 11:15 AM, Ben Bromhead <b...@instaclustr.com> wrote:
>
> > Re #3:
> >
> > Yup I was thinking each shard/port would appear as a discrete server to the
> > client.
> >
> > If the per-port suggestion is unacceptable due to hardware requirements,
> > remembering that Cassandra is built around scaling *commodity* hardware
> > horizontally, you'll have to spend your time and energy convincing the
> > community to support a protocol feature it has no (current) use for, or
> > find another interim solution.
> >
> > Another way would be to build support and consensus around a clear
> > technical need in the Apache Cassandra project as it stands today.
> >
> > One way to build community support might be to contribute an
> > Apache-licensed thread-per-core implementation in Java that matches the
> > protocol change and shard concept you are looking for ;P
> >
> >
> > On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> > > Hi,
> > >
> > > So at a technical level I don't understand this yet.
> > >
> > > You have a database consisting of single-threaded shards and an accept
> > > socket that is generating TCP connections, and in advance you don't know
> > > which connection is going to send messages to which shard.
> > >
> > > What is the mechanism by which you get the packets for a given TCP
> > > connection delivered to a specific core? I know that a given TCP
> > > connection will normally have all of its packets delivered to the same
> > > queue from the NIC, because the tuple of source address + port and
> > > destination address + port is typically hashed to pick one of the queues
> > > the NIC presents. I might have the contents of the tuple slightly wrong,
> > > but it always includes a component you don't get to control.
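> > >
> > > (For the archives, a toy sketch of that mechanism; real NICs use a
> > > Toeplitz hash with a driver-supplied key, so the hash below is just a
> > > stand-in, and the class and method names are made up. The point is only
> > > that the RX queue is a pure function of the 4-tuple, which includes the
> > > client-chosen source port:)
> > >
> > > import java.util.Objects;
> > >
> > > final class RssSketch {
> > >     // Toy stand-in for NIC receive-side scaling (RSS): hash the
> > >     // connection 4-tuple and use the result to pick an RX queue.
> > >     // The source port is chosen by the client, so the server cannot
> > >     // steer a given connection to a particular queue this way.
> > >     static int rxQueueFor(int srcAddr, int srcPort,
> > >                           int dstAddr, int dstPort, int numQueues) {
> > >         int hash = Objects.hash(srcAddr, srcPort, dstAddr, dstPort);
> > >         return Math.floorMod(hash, numQueues);
> > >     }
> > > }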
> > >
> > > Since it's hashing, how do you manipulate which queue the packets for a
> > > TCP connection go to, and how is it made worse by having an accept
> > > socket per shard?
> > >
> > > You also mention 160 ports as bad, but that doesn't sound like a big
> > > number resource-wise. Is it an operational headache?
> > >
> > > RE tokens distributed amongst shards. The way that would work right now
> > > is that each port number appears to be a discrete instance of the
> > > server. So you could have shards be actual shards that are simply
> > > colocated on the same box, run in the same process, and share resources.
> > > I know this pushes more of the complexity into the server vs. the
> > > driver, as the server expects all shards to share some client-visible
> > > state like system tables and certain identifiers.
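> > >
> > > (A hypothetical sketch, not existing code, of what "one process, one
> > > accept socket per shard" could look like; the class name, base port, and
> > > shard count are made up:)
> > >
> > > import java.io.IOException;
> > > import java.net.ServerSocket;
> > >
> > > final class ShardPerPortSketch {
> > >     public static void main(String[] args) throws IOException {
> > >         int basePort = 9042;
> > >         int shards = Runtime.getRuntime().availableProcessors();
> > >         for (int i = 0; i < shards; i++) {
> > >             // One accept socket per shard; every connection made to this
> > >             // port is serviced by this shard's thread alone.
> > >             ServerSocket accept = new ServerSocket(basePort + i);
> > >             int shard = i;
> > >             new Thread(() -> serve(shard, accept), "shard-" + shard).start();
> > >         }
> > >     }
> > >
> > >     static void serve(int shard, ServerSocket accept) {
> > >         // Accept loop for this shard only; shard-local state needs no
> > >         // cross-shard synchronization, which is the appeal of the model.
> > >     }
> > > }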
> > >
> > > Ariel
> > > On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
> > > > Port-per-shard is likely the easiest option, but it's too ugly to
> > > > contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
> > > > IIRC); it would be just horrible to have 160 open ports.
> > > >
> > > >
> > > > It also doesn't fit well with the NIC's ability to automatically
> > > > distribute packets among cores using multiple queues, so the kernel
> > > > would have to shuffle those packets around. Much better to have those
> > > > packets delivered directly to the core that will service them.
> > > >
> > > >
> > > > (also, some protocol changes are needed so the driver knows how tokens
> > > > are distributed among shards)
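> > > >
> > > > (Purely hypothetical sketch of the driver side of such a change,
> > > > assuming the server advertised its shard count and a simple uniform
> > > > token-to-shard mapping; the real mapping would be whatever the spec
> > > > change ends up defining:)
> > > >
> > > > final class ShardRouting {
> > > >     // Map a signed 64-bit Murmur3 token onto [0, nrShards) so the
> > > >     // driver can pick the connection that lands on the right shard.
> > > >     static int shardForToken(long token, int nrShards) {
> > > >         // Flip the sign bit so tokens order as unsigned values, then
> > > >         // take the unsigned remainder to choose a shard.
> > > >         return (int) Long.remainderUnsigned(token ^ Long.MIN_VALUE, nrShards);
> > > >     }
> > > > }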
> > > >
> > > > On 2018-04-19 19:46, Ben Bromhead wrote:
> > > > > WRT #3:
> > > > > To fit in the existing protocol, could you have each shard listen on
> > > > > a different port? Drivers are likely going to support this due to
> > > > > https://issues.apache.org/jira/browse/CASSANDRA-7544
> > > > > (https://issues.apache.org/jira/browse/CASSANDRA-11596). I'm not
> > > > > super familiar with the ticket, so there might be something I'm
> > > > > missing, but it sounds like a potential approach.
> > > > >
> > > > > This would give you a path forward at least for the short term.
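> > > > >
> > > > > (A sketch of what that could look like from the client, assuming a
> > > > > driver that supports a distinct port per contact point, which is what
> > > > > the tickets above enable; this uses the DataStax Java driver
> > > > > 4.x-style API, and the address, ports, and datacenter name are
> > > > > placeholders:)
> > > > >
> > > > > import java.net.InetSocketAddress;
> > > > > import com.datastax.oss.driver.api.core.CqlSession;
> > > > >
> > > > > // Each shard's port is listed as its own contact point, so the
> > > > > // driver treats every shard as a discrete server.
> > > > > CqlSession session = CqlSession.builder()
> > > > >         .addContactPoint(new InetSocketAddress("10.0.0.1", 9042)) // shard 0
> > > > >         .addContactPoint(new InetSocketAddress("10.0.0.1", 9043)) // shard 1
> > > > >         .withLocalDatacenter("dc1")
> > > > >         .build();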
> > > > >
> > > > >
> > > > > On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <ar...@weisberg.ws>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I think that updating Cassandra's protocol spec puts the onus on
> > > > > > the party changing the protocol specification to have an
> > > > > > implementation of the spec in Cassandra as well as in the Java and
> > > > > > Python drivers (those are both used in the Cassandra repo). Until
> > > > > > it's implemented in Cassandra we haven't fully evaluated the
> > > > > > specification change. There is no substitute for trying to make it
> > > > > > work.
> > > > > >
> > > > > > There are also realities to consider as to what the maintainers of
> > > > > > the drivers are willing to commit.
> > > > > >
> > > > > > RE #1,
> > > > > >
> > > > > > I am +1 on the fact that we shouldn't require an extra hop for
> > > > > > range scans.
> > > > > >
> > > > > > In JIRA Jeremiah made the point that you can still do this from the
> > > > > > client by breaking up the token ranges, but it's a leaky
> > > > > > abstraction to have a paging interface that isn't a vanilla
> > > > > > ResultSet interface. Serial vs. parallel is kind of orthogonal, as
> > > > > > the driver can do either.
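> > > > > >
> > > > > > (For concreteness, a sketch of that client-side workaround,
> > > > > > assuming the Murmur3 partitioner and a session with an
> > > > > > execute(String, Object...) overload; the keyspace, table, and
> > > > > > column names are placeholders:)
> > > > > >
> > > > > > import java.math.BigInteger;
> > > > > >
> > > > > > // Split the full Murmur3 token range into equal slices; each slice
> > > > > > // becomes an independent query the client can run serially or in
> > > > > > // parallel. (Edge case ignored: token(pk) == Long.MIN_VALUE.)
> > > > > > BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
> > > > > > BigInteger span = BigInteger.valueOf(Long.MAX_VALUE).subtract(min);
> > > > > > int splits = 16;
> > > > > > for (int i = 0; i < splits; i++) {
> > > > > >     long start = min.add(span.multiply(BigInteger.valueOf(i))
> > > > > >                              .divide(BigInteger.valueOf(splits))).longValue();
> > > > > >     long end = min.add(span.multiply(BigInteger.valueOf(i + 1))
> > > > > >                            .divide(BigInteger.valueOf(splits))).longValue();
> > > > > >     session.execute(
> > > > > >         "SELECT pk, v FROM ks.tbl WHERE token(pk) > ? AND token(pk) <= ?",
> > > > > >         start, end);
> > > > > > }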
> > > > > >
> > > > > > I agree it looks like the current specification doesn't make what
> > > > > > should be simple as simple as it could be for driver implementers.
> > > > > >
> > > > > > RE #2,
> > > > > >
> > > > > > +1 on this change, assuming an implementation in Cassandra and the
> > > > > > Java and Python drivers.
> > > > > >
> > > > > > RE #3,
> > > > > >
> > > > > > It's hard to be +1 on this, because we don't benefit by boxing
> > > > > > ourselves in by defining a spec we haven't implemented, tested, and
> > > > > > decided we are satisfied with. Having it in ScyllaDB de-risks it to
> > > > > > a certain extent, but what if Cassandra decides to go a different
> > > > > > direction in some way?
> > > > > >
> > > > > > I don't think there is much discussion to be had without an example
> > > > > > of the changes to the CQL specification to look at, but even then,
> > > > > > if it looks risky I am not likely to be in favor of it.
> > > > > >
> > > > > > Regards,
> > > > > > Ariel
> > > > > >
> > > > > > On Thu, Apr 19, 2018, at 9:33 AM, glom...@scylladb.com wrote:
> > > > > > >
> > > > > > > On 2018/04/19 07:19:27, kurt greaves <k...@instaclustr.com> wrote:
> > > > > > > > > 1. The protocol change is developed using the Cassandra
> > > > > > > > > process in a JIRA ticket, culminating in a patch to
> > > > > > > > > doc/native_protocol*.spec when consensus is achieved.
> > > > > > > >
> > > > > > > > I don't think forking would be desirable (for anyone), so this
> > > > > > > > seems the most reasonable to me. For 1 and 2 it certainly makes
> > > > > > > > sense, but I can't say I know enough about sharding to comment
> > > > > > > > on 3 - it seems to me like it could be locking in a design
> > > > > > > > before anyone truly knows what sharding in C* looks like. But
> > > > > > > > hopefully I'm wrong and there are devs out there who have
> > > > > > > > already thought that through.
> > > > > > > Thanks. That is our view as well, and it is great to hear.
> > > > > > >
> > > > > > > About our proposal number 3: in my view, good protocol designs
> > > > > > > are future-proof and flexible. We certainly don't want to propose
> > > > > > > a design that works just for Scylla; it should support reasonable
> > > > > > > implementations regardless of what they look like.
> > > > > > >
> > > > > > > > > Do we have driver authors who wish to support both projects?
> > > > > > > >
> > > > > > > > Surely, but I imagine it would be a minority.
> > > > > > > >
> > > > > >
> > > > > --
> > > > > Ben Bromhead
> > > > > CTO | Instaclustr <https://www.instaclustr.com/>
> > > > > +1 650 284 9692
> > > > > Reliability at Scale
> > > > > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > --
> > Ben Bromhead
> > CTO | Instaclustr <https://www.instaclustr.com/>
> > +1 650 284 9692
> > Reliability at Scale
> > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> >
