Short answer: no, there is no formula you can just plug numbers into. Longer answer: benchmark with a subset of your data and extrapolate; the closer the test data is to your real data, the more accurate the extrapolation will be. And yes, compaction is O(N) with respect to the total amount of data in the system, so don't trigger it more often than necessary (increase memtable flush thresholds, and go easy on nodetool compact).
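To make the benchmark-and-extrapolate approach concrete, here is a minimal sketch in plain Python with no Cassandra dependencies. All the measurements and parameter values in it are made-up illustrations: it fits a line to (hypothetical) measured disk usage at a few subset sizes, projects to the full row count, and then does a raw-payload sanity check built from the variables in your message. Treat it as a back-of-envelope aid, not a requirements formula.

# Rough capacity sketch: extrapolate benchmark measurements, then
# sanity-check against a raw-payload lower bound. Illustrative only --
# real usage depends on compaction overhead, indexes, caches, and
# workload, which is why benchmarking beats any closed-form formula.

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical measurements: rows loaded vs. on-disk bytes per node,
# taken from a test cluster after compaction has settled.
rows_loaded   = [10e6, 50e6, 100e6]
disk_per_node = [6e9, 30e9, 61e9]    # bytes; made-up numbers

slope, intercept = fit_line(rows_loaded, disk_per_node)
target_rows = 1e9                    # the billion-row target
projected = slope * target_rows + intercept
print(f"projected disk/node at {target_rows:.0e} rows: {projected / 1e9:.0f} GB")

# Sanity check: raw payload per node, using the variables from the
# question. This is a lower bound, not a requirements formula -- it
# ignores per-column and per-row overhead, SSTable duplication before
# compaction, commitlog, and so on. Values here are assumptions.
K, D, CR = 32, 100, 20               # key bytes, bytes/column, columns/row
NR, R, N = 1e9, 3, 10                # rows, replication factor, nodes
raw_per_node = NR * (K + CR * D) * R / N
print(f"raw payload/node: {raw_per_node / 1e9:.0f} GB")

If the projected number per node is far above what your disks (and compaction headroom, which can temporarily need roughly double the space) can handle, that tells you how many more nodes to budget before you hit the breaking point.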
On Mon, May 3, 2010 at 4:34 PM, Jon Graham <sjclou...@gmail.com> wrote:
> Hello Everyone,
>
> Is there a practical formula for determining Cassandra system requirements
> using OrderPreservingPartitioner?
>
> We have hundreds of millions of rows in a single column family with a
> potential target of maybe a billion rows.
>
> How can we estimate the Cassandra system requirements given factors such as:
>
> N = number of nodes
> M = memory allocated for Cassandra
> R = replication factor
> K = key size
> D = individual column data size
> CR = columns/row
> NR = number of rows (keys) in column family
>
> It seems like the compaction process gets more stressed as we add more data,
> but I have no idea how close we are to a breaking point.
>
> Thanks,
> Jon

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com