Hi Mark,

I'm a relative newcomer to Cassandra, but I believe the common experience is that you start seeing gains after 5 nodes in a column-oriented data store. It may also depend on your usage pattern.
Others may know better - hope this helps!

-- Jim R. Wilson (jimbojw)

On Wed, Apr 21, 2010 at 11:28 AM, Mark Jones <mjo...@imagehawk.com> wrote:
> I’m seeing a cluster of 4 (replication factor=2) to be about as slow overall
> as, or barely faster than, the slowest node in the group. When I run the 4
> nodes individually, I see:
>
> For inserts:
> Two nodes @ 12000/second
> 1 node @ 9000/second
> 1 node @ 7000/second
>
> For reads:
> Abysmal, less than 1000/second (not range slices, individual lookups). Disk
> util @ 88+%
>
> How many nodes are required before you see a net positive gain on inserts
> and reads (QUORUM consistency on both)?
>
> When I use my 2 fastest nodes as a pair, the thruput is around 9000
> inserts/second.
>
> What is a good to excellent hardware config for Cassandra? I have separate
> drives for data and commit log and 8GB in 3 machines (all dual core). My
> fastest insert node has 4GB and a triple core processor.
>
> I’ve run py_stress, and my C++ code beats it by several 1000 inserts/second
> toward the end of the runs, so I don’t think it is my app, and I’ve removed
> the super columns per some suggestions yesterday.
>
> When Cassandra is working, it performs well; the problem is that it
> frequently slows down to < 50% of its peaks and occasionally slows down to 0
> inserts/second, which greatly reduces aggregate thruput.
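One possible factor in the numbers quoted above is the combination of replication factor 2 and QUORUM: a quorum is floor(RF / 2) + 1 replicas, so with RF=2 every read and write has to wait for both replicas, and the slower of the two sets the pace. The Python sketch below only illustrates that arithmetic. The function names are hypothetical, the per-node rates are the standalone insert figures from the message, and the "pacing replica" model is a deliberate simplification that ignores network, coordinator, and compaction costs, so it is not a prediction of real cluster throughput.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def quorum_paced_rate(replica_rates, replication_factor):
    """Rate of the replica that paces a QUORUM operation.

    Simplifying assumption (not from the thread): the fastest replicas
    answer first, so the operation completes when the q-th fastest
    replica responds, where q is the quorum size.
    """
    if len(replica_rates) != replication_factor:
        raise ValueError("expected one rate per replica")
    q = quorum(replication_factor)
    # Sort fastest first and pick the q-th fastest replica.
    return sorted(replica_rates, reverse=True)[q - 1]


if __name__ == "__main__":
    # Standalone insert rates (ops/second) reported in the message.
    fast, slow = 12000, 7000

    # RF=2: quorum is 2, so the slower replica paces every operation.
    print(quorum(2), quorum_paced_rate([fast, slow], 2))

    # RF=3: quorum is still 2, so the slowest replica can lag behind.
    print(quorum(3), quorum_paced_rate([fast, 9000, slow], 3))
```

Under this model, RF=3 with QUORUM tolerates one slow or stalled replica per key, while RF=2 with QUORUM tolerates none, which would be consistent with a cluster that runs close to the speed of its slowest node.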