Re: Regarding Cassandra Scalability

Mason Hale Sun, 18 Apr 2010 06:53:59 -0700

On Sun, Apr 18, 2010 at 8:26 AM, Brandon Williams <dri...@gmail.com> wrote:

>
> On Sun, Apr 18, 2010 at 8:00 AM, Mason Hale <ma...@onespot.com> wrote:
>
>> This is a statement I wish I had run across sooner. Our first
>> implementation (which we're changing now) included some very big rows. We
>> ran into trouble with compaction and during hinted hand-off operations
>> (which also deal with data a full row at a time) because these rows would
>> not fit into available memory.
>>
>> I think that as long as there are lurking gotchas like compaction and
>> hinted hand-off, where a full row must fit in memory, we should not be
>> making misleading statements like "Cassandra has the advantage of a more
>> advanced datamodel, allowing for a single row to contain billions of
>> column/value pairs: enough to fill a machine." (from:
>> http://gigaom.com/2010/03/11/digg-cassandara/ ,
>> http://spyced.blogspot.com/2010/03/cassandra-in-action.html). A statement
>> like that should have some caveats, otherwise it reads as an endorsement, a
>> suggestion even, to build a data model with massively wide rows. In
>> practice, it is not feasible to have billions of columns in a single row
>> because it will lead to problems with compaction and hinted hand-off, and
>> possibly elsewhere.
>>
>> Mason
>>
>
> http://wiki.apache.org/cassandra/CassandraLimitations
>
> We aren't hiding anything from the user who wishes to educate themselves.
>
> -Brandon
>
>
I didn't mean to imply anyone was hiding information. I was pointing out
that there is conflicting information floating about. And if someone doesn't
read and internalize the entire wiki, they may not realize the quote they
read from Stu Hood about Cassandra describes something that is actually not
a good idea in practice.

In my case, I believe I first read the CassandraLimitations page *after* I
ran into the problem with rows not fitting into memory, and it was after
reading statements about storing billions of columns in a single row. Or
maybe I read the CassandraLimitations page earlier but everything was so new
at that point that it didn't occur to me that the information there negated
the information about storing billions of columns per row. Or maybe I read
both pieces of information, but the one I remembered was the one about
storing billions of columns per row. I don't recall exactly.

My point is -- conflicting information is bad. It relies on the consumer of
that information finding both the "correct" and "incorrect" information and
then accurately reconciling the two. There are many ways that won't work out
well. The best solution is to not have the conflicting information in the
first place.

I'm just offering up my experience as a lesson learned. It is an example of
where I think the Cassandra community can do a better job of communicating.
I just know I had a bit of a bumpy ride because of this issue, and so I'm
pointing it out constructively in hopes that someone after me will not stub
their toe on the same issue.

Mason
