Hi everyone. I'm sure this question or a similar one has come up before, but I can't find a clear answer. I have to store an unknown number of items in Cassandra, which can vary from a few hundred to a few million per customer.
I read that in Cassandra wide rows are better than a lot of rows, but then I face two problems.

First, column distribution. The only way I can think of to distribute items among a given set of rows is to hash the item id to a row id, and then use the item id as the column name. This way I can distribute data evenly among a few rows, but if there are only a few items it's equivalent to one row per item plus extra overhead, and if there are millions of items then the rows are too big and I have to turn off the row cache. Does anybody know a way around this?

The second issue is that in my benchmarks, once the data is mmapped, one item per row performs faster than wide rows by a significant margin. Is this how it is supposed to be? I can provide additional data if needed.

English is not my first language, so I apologize beforehand if some of this doesn't make sense. Thanks for your time.
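
To make the hashing scheme I described more concrete, here is a rough sketch in plain Python (no Cassandra client involved). The names `NUM_BUCKETS`, `row_key` and `place`, the `customer:bucket` key format, and the use of MD5 are only assumptions for illustration, not what my real code or Cassandra itself does.

```python
import hashlib

# Hypothetical fixed number of rows (buckets) per customer.
NUM_BUCKETS = 16


def row_key(customer_id: str, item_id: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Map an item to one of num_buckets rows by hashing its id."""
    digest = hashlib.md5(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_buckets
    # Assumed key layout: "<customer>:<bucket>"; the item id becomes the column name.
    return f"{customer_id}:{bucket}"


def place(customer_id: str, item_ids, num_buckets: int = NUM_BUCKETS) -> dict:
    """Group item ids (column names) under the row key they hash to."""
    rows = {}
    for item_id in item_ids:
        rows.setdefault(row_key(customer_id, item_id, num_buckets), []).append(item_id)
    return rows


if __name__ == "__main__":
    small = place("cust-a", [f"item-{i}" for i in range(5)])
    large = place("cust-b", [f"item-{i}" for i in range(100_000)])
    # Small customer: close to one row per item. Large customer: few, very wide rows.
    print(len(small), "rows for 5 items")
    print(len(large), "rows, widest has", max(len(c) for c in large.values()), "columns")
```

With a fixed bucket count, the small customer ends up with roughly one row per item, while the large customer gets rows with thousands of columns each, which is exactly the trade-off I am asking about.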