On Sun, Apr 18, 2010 at 7:41 AM, Gary Dusbabek <gdusba...@gmail.com> wrote:

> On Sat, Apr 17, 2010 at 10:50, dir dir <sikerasa...@gmail.com> wrote:
> >
> > What problems can’t it solve?
> >
> > No flexible indices
> > No querying on non-PK values
> > Not good for binary data (>64 MB) unless you chunk
> > Row contents must fit in available memory
> >
> > Gary Dusbabek says: Row contents must fit in available memory. Honestly, I
> > do not understand the meaning of that statement. Thank you.
> >
> > Dir.
> >
>
> The main reason is that the compaction operation (removing deleted
> values) currently requires that an entire row be read into memory.
>
> Gary Dusbabek
>


This is a statement I wish I had run across sooner. Our first implementation
(which we're changing now) included some very big rows. We ran into trouble
during compaction and during hinted hand-off operations (which also deal with
data a full row at a time) because these rows would not fit into available
memory.
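
One common workaround (I won't claim it is exactly what our rework looks
like) is to bucket: spread one logical wide row across many physical rows so
that no single row grows unbounded. A rough sketch in Java -- the names and
the write() helper are made up for illustration, not any particular client
API:

    // Sketch only: spread one logical wide row across many physical rows by
    // appending a bucket number to the row key, so no single row grows unbounded.
    // write() is a hypothetical stand-in, not a real client call.
    public class BucketedRowSketch {
        static final int BUCKETS = 256;   // cap on how wide any one physical row gets

        static String bucketedKey(String logicalKey, String columnName) {
            // hash the column name so columns spread evenly across the buckets
            int bucket = (columnName.hashCode() & 0x7fffffff) % BUCKETS;
            return logicalKey + ":" + bucket;
        }

        static void write(String rowKey, String columnName, byte[] value) {
            // placeholder: do the actual insert with whatever client you use
            System.out.println("insert row=" + rowKey + "  column=" + columnName);
        }

        public static void main(String[] args) {
            byte[] value = "payload".getBytes();
            // columns that used to live under the single row "user:42" now land
            // in one of 256 smaller rows: user:42:0 .. user:42:255
            write(bucketedKey("user:42", "event-000001"), "event-000001", value);
            write(bucketedKey("user:42", "event-973112"), "event-973112", value);
        }
    }

Reads that need the whole logical row have to fan out over the bucket keys,
which is extra client work, but each physical row stays small enough to fit
in memory during compaction and hinted hand-off.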

I think that until these lurking gotcha spots, like compaction and hinted
hand-off, where a full row must fit in memory, are gone, we should not be
making misleading statements like "Cassandra has the advantage of a more
advanced datamodel, allowing for a single row to contain billions of
column/value pairs: enough to fill a machine." (from:
http://gigaom.com/2010/03/11/digg-cassandara/ ,
http://spyced.blogspot.com/2010/03/cassandra-in-action.html). A statement
like that should carry some caveats; otherwise it reads as an endorsement,
even a suggestion, to build a data model with massively wide rows. In
practice, it is not feasible to have billions of columns in a single row,
because it will lead to problems with compaction and hinted hand-off, and
maybe elsewhere.
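
The same principle covers the "chunk your binary data" item in the quoted
list: store a large value as many smaller columns, and for truly huge blobs
spread those chunks over more than one row key so the row itself stays small.
Another illustrative sketch (names are invented, only the splitting side is
shown):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Sketch of the chunking idea: break a large blob into fixed-size pieces,
    // each stored as its own column (or spread over several rows if very large).
    public class BlobChunkSketch {
        static final int CHUNK_SIZE = 1 << 20;   // 1 MB per chunk

        static List<byte[]> split(byte[] blob) {
            List<byte[]> chunks = new ArrayList<byte[]>();
            for (int off = 0; off < blob.length; off += CHUNK_SIZE) {
                int end = Math.min(off + CHUNK_SIZE, blob.length);
                chunks.add(Arrays.copyOfRange(blob, off, end));
            }
            return chunks;
        }

        public static void main(String[] args) {
            byte[] blob = new byte[5 * CHUNK_SIZE + 123];   // stand-in for a big file
            List<byte[]> chunks = split(blob);
            for (int i = 0; i < chunks.size(); i++) {
                // e.g. row key "file:abc123", column names chunk-00000, chunk-00001, ...
                System.out.println(String.format("chunk-%05d -> %d bytes", i, chunks.get(i).length));
            }
        }
    }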

Mason
