Re: Second Cassandra users survey

Todd Burruss Wed, 09 Nov 2011 09:53:47 -0800

Thx Jake for the JIRA, but someone at the conference had already
implemented what I mentioned.  It didn't offer any atomicity, just co-location
of a family of data on the same node.


From: Jake Luciani <jak...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wed, 9 Nov 2011 02:53:20 -0800
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Second Cassandra users survey

Hi Todd,

Entity Groups : https://issues.apache.org/jira/browse/CASSANDRA-1684

-Jake

On Wed, Nov 9, 2011 at 6:44 AM, Todd Burruss <bburr...@expedia.com> wrote:
I believe I heard someone talk at the Cassandra SF conference about creating
a partitioner derived from RandomPartitioner.  It would look for keys that
follow a certain pattern, like <key>:<subkey>.  Only the <key> portion would
be used to determine the location on the ring, while the full <key>:<subkey>
would be used for actually storing the row.  This would allow groups of data
(all having the same <key>) to reside on the same node, while still
maintaining uniqueness across the entire keyspace.

Unbalanced nodes could still occur, but I don't think any worse than
wide/large rows can cause.
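
The partitioner idea described above can be sketched roughly as follows.
This is an illustration only, not the implementation shown at the conference
and not Cassandra code: the names placement_token, owner, TOKEN_SPACE and the
four-node ring are all hypothetical, and the MD5-based token only loosely
imitates what RandomPartitioner does.

```python
import hashlib
from bisect import bisect_left

TOKEN_SPACE = 2 ** 127  # assumed token range for this sketch


def placement_token(row_key, sep=":"):
    """Hash only the part of the key before the first separator."""
    group = row_key.split(sep, 1)[0]  # whole key if there is no separator
    digest = hashlib.md5(group.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def owner(row_key, ring, sep=":"):
    """Return the node whose token is the first one >= the key's token.

    ring is a list of (token, node_name) pairs sorted by token; lookups
    past the highest token wrap around to the first node.
    """
    tokens = [token for token, _ in ring]
    index = bisect_left(tokens, placement_token(row_key, sep))
    return ring[index % len(ring)][1]


# Hypothetical four-node ring with evenly spaced tokens.
ring = sorted((i * (TOKEN_SPACE // 4), "node%d" % i) for i in range(4))

# Rows are still stored under the full key, so they stay distinct...
store = {"user42:profile": {"name": "x"}, "user42:settings": {"tz": "UTC"}}
# ...but every key in the "user42" group maps to the same node.
nodes = {owner(key, ring) for key in store}
```

Because every key in a group shares one token, a very large group lands
entirely on one node, which is the same imbalance risk as a wide row.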


On 11/8/11 1:29 AM, "Daniel Doubleday" <daniel.double...@gmx.net> wrote:

>Ah cool - thanks for the pointer!
>
>On Nov 7, 2011, at 5:25 PM, Ed Anuff wrote:
>
>> This is basically what entity groups are about -
>> https://issues.apache.org/jira/browse/CASSANDRA-1684
>>
>> On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin <wool...@gmail.com> wrote:
>>> This feature interests me, so I thought I'd add some comments.
>>>
>>> Having used partition features in existing databases like DB2, Oracle
>>> and manual partitioning, one of the biggest challenges is keeping the
>>> partitions balanced. What I've seen with manual partitioning is that
>>> often the partitions get unbalanced. Usually the developers take a
>>> best guess and hope it ends up balanced.
>>>
>>> Some of the approaches I've used in the past were zip code, area code,
>>> state and some kind of hash.
>>>
>>> So my question related to deterministic sharding is this, "what rebalance
>>> feature(s) would be useful or needed once the partitions get
>>> unbalanced?"
>>>
>>> Without a decent plan for rebalancing, it often ends up being a very
>>> painful problem to solve in production. Back when I worked mobile
>>> apps, we saw issues with how OpenWave WAP servers partitioned the
>>> accounts. The early versions randomly assigned a phone to a server
>>> when it was provisioned the first time. Once the phone was associated
>>> to that server, it was stuck on that server. If the load on that
>>> server was heavier than the others, the only choice was to "scale up"
>>> the hardware.
>>>
>>> My understanding is that Cassandra's current sharding is consistent and
>>> random. Does the new feature sit somewhere in between? Are you
>>> thinking of a pluggable API so that you can provide your own hash
>>> algorithm for Cassandra to use?
>>>
>>>
>>>
>>> On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday
>>> <daniel.double...@gmx.net> wrote:
>>>> Allow for deterministic / manual sharding of rows.
>>>>
>>>> Right now it seems that there is no way to force rows with different
>>>> row keys to be stored on the same nodes in the ring.
>>>> This is our number one reason why we get data inconsistencies when
>>>> nodes fail.
>>>>
>>>> Sometimes a logical transaction requires writing rows with different
>>>> row keys. If we could use something like this:
>>>>
>>>> prefix.uniquekey and let the partitioner use only the prefix, the
>>>> probability that only part of the transaction would be written could
>>>> be reduced considerably.
>>>>
>>>>
>>>>
>>>> On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Two years ago I asked for Cassandra use cases and feature requests.
>>>>> [1]  The results [2] have been extremely useful in setting and
>>>>> prioritizing goals for Cassandra development.  But with the release of
>>>>> 1.0 we've accomplished basically everything from our original wish
>>>>> list. [3]
>>>>>
>>>>> I'd love to hear from modern Cassandra users again, especially if
>>>>> you're usually a quiet lurker.  What does Cassandra do well?  What are
>>>>> your pain points?  What's your feature wish list?
>>>>>
>>>>> As before, if you're in stealth mode or don't want to say anything in
>>>>> public, feel free to reply to me privately and I will keep it off the
>>>>> record.
>>>>>
>>>>> [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
>>>>> [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
>>>>> [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
>>>>>
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of DataStax, the source for professional Cassandra support
>>>>> http://www.datastax.com
>>>>
>>>>
>>>
>




--
http://twitter.com/tjake
