Wide rows or tons of rows?

Héctor Izquierdo Seliva Mon, 11 Oct 2010 07:13:34 -0700

Hi everyone.

I'm sure this question or a similar one has come up before, but I can't find a
clear answer. I have to store an unknown number of items in Cassandra,
which can vary from a few hundred to a few million per customer.


I read that in Cassandra wide rows are better than lots of rows, but
then I face two problems. The first is column distribution. The only way I can
think of to distribute items among a given set of rows is to hash the
item id to a row id, and then use the item id as the column name. This
way I can distribute data evenly among a few rows, but if there
are only a few items it's equivalent to one row per item plus extra
overhead, and if there are millions of items the rows are too big
and I have to turn off the row cache. Does anybody know a way around this?
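A minimal sketch of the bucketing scheme described above. It only computes where each item would be placed and does not talk to Cassandra. It assumes a fixed number of buckets per customer (NUM_BUCKETS = 16, chosen only for illustration) and an MD5-based hash. The names bucket_for, row_key and layout, and the "customer:bucket" row-key format, are hypothetical and are not part of any Cassandra API.

```python
import hashlib

# Hypothetical bucket count: the post does not say how many rows per customer it uses.
NUM_BUCKETS = 16


def bucket_for(item_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map an item id to a stable bucket number in [0, num_buckets)."""
    # MD5 is used only because it is stable across processes, unlike
    # Python's built-in hash() for str, which is randomized per process.
    digest = hashlib.md5(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def row_key(customer_id: str, item_id: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Build a row key from the customer id plus the item's bucket (hypothetical format)."""
    return f"{customer_id}:{bucket_for(item_id, num_buckets)}"


def layout(customer_id, item_ids, num_buckets=NUM_BUCKETS):
    """Group item ids by row key; each item id would become a column name in its row."""
    rows = {}
    for item_id in item_ids:
        rows.setdefault(row_key(customer_id, item_id, num_buckets), []).append(item_id)
    return rows


if __name__ == "__main__":
    small = layout("customer42", ["a", "b", "c"])
    large = layout("customer42", [f"item{i}" for i in range(10_000)])
    # Number of rows used, and the widest row, for each case.
    print(len(small), max(len(cols) for cols in small.values()))
    print(len(large), max(len(cols) for cols in large.values()))
```

With only a handful of items, most of them end up alone in their row, which is the "one row per item plus overhead" case. With a million items spread over 16 buckets, each row would average 62,500 columns, which is where the row-size and row-cache problem comes from.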

The second issue is that in my benchmarks, once the data is mmapped, one
item per row performs faster than wide rows by a significant margin. Is
this how it is supposed to be?

I can provide additional data if needed. English is not my first language,
so I apologize in advance if some of this doesn't make sense.

Thanks for your time
