Last Thursday, DataStax put together a meeting of the active Cassandra committers in San Mateo. Dave Brosius was unable to make it to the West Coast, but Brandon, Eric, Gary, Jason, Pavel, Sylvain, Vijay, Yuki, and I attended in person, with Aleksey and Jake joining part of the time over Google Hangout.
We started by asking each committer to outline his top 3 priorities for 2.0. There was pretty broad consensus around the following big items, which I will break out into separate threads:

* Streaming and repair
* Counters

There was also a lot of consensus that we'll be able to ship some form of Triggers [1] in 2.0. Gary's suggestion was to focus on nailing down the functionality first, then worry about the classloader voodoo needed to allow live reloading. There was also general agreement that we need to split jar loading from trigger definition, to allow a single trigger to be applied to multiple tables. (A sketch of the trigger interface follows at the end of this message.)

There was less consensus around CAS [2], primarily because of implementation difficulties. (I've since read up some more on Paxos and Spinnaker and posted my thoughts to the ticket; a sketch of the intended semantics is below.)

Other subjects discussed:

* A single Cassandra process does not scale well beyond 12 physical cores, and further research is needed to understand why. One possibility is GC overhead; Vijay is going to test Azul's Zing VM to confirm or refute this.
* Server-side aggregation functions [3]. These would remove the need to pull a lot of data over the wire to a client unnecessarily. There was some unease about moving beyond the relatively simple queries we've traditionally supported, but I think there was general agreement that this can be addressed by fencing aggregation to a single partition unless explicitly allowed otherwise, a la ALLOW FILTERING [4]. (Sketch below.)
* Extending cross-datacenter forwarding [5] to a "star" model. That is, with three or more datacenters, instead of the original coordinator in DC A sending to replicas in both DC B and DC C, A would forward to B, which would forward on to C. Thus the cross-DC bandwidth required of any one DC stays constant as more datacenters are added. (Back-of-envelope below.)
* Vnode improvements, such as a vnode-aware replication strategy [6].
* Cluster merging and splitting -- if I have multiple applications sharing a single Cassandra cluster, and one gets a lot more traffic than the others, I may want to split it out into its own cluster. I think there was a concrete proposal for how this could work, but someone else will have to fill that in because I didn't write it down.
* Auto-paging of SELECT queries for CQL [7] -- put another way, transparent cursors for the native CQL driver. (Sketch below.)
* Making the storage engine more CQL-aware. Low-hanging fruit here includes a prefix dictionary for all the composite cell names [8]. (Sketch below.)
* Resurrecting the StorageProxy API, aka the Fat Client. ("Does it even work right now?" "Not really.")
* Reducing context switches and increasing fairness in client connections. HSHA prefers accepting new connections over servicing existing ones, so overload situations are problematic.
* "Gossip is unreliable at 100s of nodes." Here again I missed any concrete proposal to address this.

Rough sketches for a few of the above follow. The names and syntax in them are my own paraphrase, not committed APIs.
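First, triggers. This is my paraphrase of the direction of Vijay's proof of concept in [1]; the class and method names are illustrative, not a committed API:

    import java.nio.ByteBuffer;
    import java.util.Collection;
    import java.util.Collections;
    import org.apache.cassandra.db.ColumnFamily;
    import org.apache.cassandra.db.RowMutation;

    // A trigger sees each update before it is applied and may return extra
    // mutations to apply along with it. Splitting jar loading from trigger
    // definition means one class like this could be attached to any number
    // of tables, e.g. with something like
    //   CREATE TRIGGER audit ON ks.tbl USING 'AuditTrigger'
    // (syntax illustrative, not settled).
    interface ITrigger
    {
        Collection<RowMutation> augment(ByteBuffer partitionKey, ColumnFamily update);
    }

    // Trivial example: mirror every incoming update into an audit keyspace.
    class AuditTrigger implements ITrigger
    {
        public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily update)
        {
            RowMutation audit = new RowMutation("audit", key);
            audit.add(update); // copy the incoming columns as-is (paraphrased)
            return Collections.singletonList(audit);
        }
    }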
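Next, the CAS semantics I have in mind, independent of whether we implement it with Paxos, Spinnaker-style replication, or something else. The names here are hypothetical:

    import java.nio.ByteBuffer;
    import org.apache.cassandra.db.ColumnFamily;

    // Hypothetical coordinator-level contract for [2]: a linearizable
    // compare-and-set on a single row. If the row's current contents match
    // 'expected', apply 'update' and return true; otherwise apply nothing
    // and return false. The hard part is guaranteeing that no other write
    // lands between the read and the write -- hence Paxos/Spinnaker.
    interface CasSupport
    {
        boolean compareAndSet(ByteBuffer key, ColumnFamily expected, ColumnFamily update);
    }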
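The aggregation fencing could mirror the existing ALLOW FILTERING check [4]. All names below are hypothetical; the point is only that an aggregate spanning more than one partition is refused unless the user opts in:

    import org.apache.cassandra.exceptions.InvalidRequestException;

    // Hypothetical validation on the coordinator, run when the statement
    // is prepared. 'SelectStatement' stands in for whatever the parsed
    // query object ends up being.
    class AggregationGuard
    {
        static void validate(SelectStatement select) throws InvalidRequestException
        {
            if (select.hasAggregates()
                && !select.restrictedToSinglePartition()
                && !select.allowsFiltering())
            {
                throw new InvalidRequestException(
                    "Aggregation across partitions requires ALLOW FILTERING");
            }
        }
    }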
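The bandwidth claim in the forwarding item, as back-of-envelope arithmetic:

    class ForwardingMath
    {
        // Today: the coordinator's DC sends one cross-DC copy of each
        // mutation per remote DC, so its WAN traffic grows linearly with
        // the number of datacenters.
        static int copiesSentToday(int dcs) { return dcs - 1; }

        // Relay model: each DC forwards at most one copy onward, so any
        // single DC's WAN traffic is constant no matter how many DCs exist.
        static int copiesSentRelay(int dcs) { return Math.min(1, dcs - 1); }
    }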
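Here is what auto-paging [7] would look like from the application side, with hypothetical driver names. Today you would hand-roll this with token()-range queries; with cursors in the native protocol the driver can fetch pages transparently:

    // Hypothetical client code: the driver holds a server-side cursor and
    // pulls 1000 rows at a time; the application just iterates. None of
    // these classes exist yet in this form.
    Statement select = new Statement("SELECT * FROM events WHERE day = '2013-01-10'");
    select.setPageSize(1000);               // a page size, not a LIMIT
    for (Row row : session.execute(select)) // further pages fetched under the hood
        process(row);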
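And the prefix dictionary from [8], sketched. Every cell in a CQL row repeats the same clustering prefix in its composite name; storing each distinct prefix once (per sstable, say) and referencing it by a small id saves that repetition. Illustrative code only:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Maps each distinct composite-name prefix to a compact integer id so
    // cells can store the id instead of repeating the prefix bytes.
    class PrefixDictionary
    {
        private final Map<ByteBuffer, Integer> ids = new HashMap<ByteBuffer, Integer>();
        private final List<ByteBuffer> prefixes = new ArrayList<ByteBuffer>();

        // Return the id for a prefix, assigning one on first sight.
        int idFor(ByteBuffer prefix)
        {
            Integer id = ids.get(prefix);
            if (id == null)
            {
                id = prefixes.size();
                ids.put(prefix, id);
                prefixes.add(prefix);
            }
            return id;
        }

        // Recover the prefix bytes for an id when reading cells back.
        ByteBuffer prefixFor(int id)
        {
            return prefixes.get(id);
        }
    }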
[1] https://issues.apache.org/jira/browse/CASSANDRA-1311. Start with https://issues.apache.org/jira/browse/CASSANDRA-1311?focusedCommentId=13492827&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13492827 for the parts relevant to Vijay's proof-of-concept patch.
[2] https://issues.apache.org/jira/browse/CASSANDRA-5062
[3] https://issues.apache.org/jira/browse/CASSANDRA-4914
[4] https://issues.apache.org/jira/browse/CASSANDRA-4915
[5] https://issues.apache.org/jira/browse/CASSANDRA-3577
[6] https://issues.apache.org/jira/browse/CASSANDRA-4123
[7] https://issues.apache.org/jira/browse/CASSANDRA-4415
[8] https://issues.apache.org/jira/browse/CASSANDRA-4175

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced