On Sun, Apr 18, 2010 at 7:41 AM, Gary Dusbabek <gdusba...@gmail.com> wrote:
> On Sat, Apr 17, 2010 at 10:50, dir dir <sikerasa...@gmail.com> wrote:
> >
> > What problems can't it solve?
> >
> > No flexible indices
> > No querying on non PK values
> > Not good for binary data (>64mb) unless you chunk
> > Row contents must fit in available memory
> >
> > Gary Dusbabek says: Row contents must fit in available memory. Honestly I do
> > not understand the meaning of that statement. Thank you.
> >
> > Dir.
>
> The main reason is that the compaction operation (removing deleted
> values) currently requires that an entire row be read into memory.
>
> Gary Dusbabek

This is a statement I wish I had run across sooner. Our first implementation (which we're changing now) included some very big rows. We ran into trouble with compaction and with hinted handoff (which also processes data a full row at a time) because those rows would not fit into available memory.

Until these lurking gotchas are gone, compaction and hinted handoff both requiring that a full row fit in memory, we should not be making misleading statements like "Cassandra has the advantage of a more advanced datamodel, allowing for a single row to contain billions of column/value pairs: enough to fill a machine." (from: http://gigaom.com/2010/03/11/digg-cassandara/ , http://spyced.blogspot.com/2010/03/cassandra-in-action.html). A statement like that should come with caveats; otherwise it reads as an endorsement, even a suggestion, to build a data model with massively wide rows. In practice it is not feasible to have billions of columns in a single row, because doing so leads to problems with compaction and hinted handoff, and possibly elsewhere.

Mason
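
P.S. For anyone who hits the same wall, below is a rough sketch (plain Python, no Cassandra client calls) of the kind of restructuring we're doing: splitting one unbounded row into fixed-size bucket rows, and chunking large binary values into smaller columns. The bucket size, key format, and helper names are all made up for illustration; they are not part of Cassandra or any client library.

# Illustrative sketch only: keep every physical row bounded in size so that
# no single row has to be read into memory whole during compaction or
# hinted handoff. The 10,000-column cap and key format are hypothetical.

COLUMNS_PER_BUCKET = 10000  # hypothetical cap on columns per physical row

def bucket_row_key(base_key, column_index):
    """Map a logical (row, column index) onto a bounded-size physical row key."""
    bucket = column_index // COLUMNS_PER_BUCKET
    return "%s:%06d" % (base_key, bucket)

def chunk_blob(data, chunk_size=4 * 1024 * 1024):
    """Split a large binary value into 4MB pieces, each stored as its own column."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

if __name__ == "__main__":
    # Column 2,500,000 of logical row "user123" lands in physical row "user123:000250"
    print(bucket_row_key("user123", 2500000))
    # A 10MB blob becomes 3 chunks that can be written as separate columns
    print(len(chunk_blob(b"\x00" * (10 * 1024 * 1024))))

The point is only that the application, not Cassandra, decides where one row ends and the next begins, so row size stays well under available memory.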