Then what would be the sweet spot for Cassandra? I am more interested in Cassandra because my application is write-heavy.
From what I have understood so far, Cassandra is not a good fit for SANs either?

P.S. MongoDB is also a NoSQL database designed for horizontal scaling, so how is it a good fit for the same hardware for which Cassandra is not a good candidate?

----- Original Message -----
From: Bill <b...@dehora.net>
To: user@cassandra.apache.org
Cc:
Sent: Sunday, September 4, 2011 4:34 AM
Subject: Re: commodity server spec

[100% agree with Chris]

China, the machines you're describing sound nice for MongoDB/Postgres/MySQL, but are probably not the sweet spot for Cassandra.

Obviously (well, depending on near-term load) you don't want to get burned on excess footprint. But a realistic, don't-lose-data, be-fairly-available deployment is going to span at least 2 racks/power supplies and have data replicated offsite (at least as passive for DR). So I would consider 6-9 relatively weaker servers rather than 3 scale-up joints. You'll save some capex, and the amount of opex overhead is probably worth it traded off against the operational risk.

3 is an awkward number to operate for anything that needs to be available (although many people seem to start with that, I am guessing because triplication is traditionally understood under failure), as it immediately puts 50% extra load on the remaining 2 when one node goes away. One will go away, even transiently, when it is upgraded, crashes, or gets into a funk due to compaction or garbage collection, and load will then be shunted onto the other 2 - remember Cassandra has no backoff/throttling in place. I'd allow for something breaking at some point (dbs, even the mature ones, fail from time to time) and 2 doesn't give you much room to maneuver in production.

Bill

On 03/09/11 23:05, Chris Goffinet wrote:
> It will also depend on how long a recovery time you can handle. So imagine
> this case:
>
> 3 nodes w/ RF of 3
> Each node has 30TB of space used (you never want to fill up an entire node).
> If one node fails and you must recover, that will take over 3.6 days
> just transferring data alone. That's with a sustained 800 Mbit/s
> (100 MB/s). In the real world it's going to fluctuate, so add some
> padding. Also, since you will be saturating one of the other nodes, now
> your network latency performance is suffering and you only have 1
> machine to handle the remaining traffic while you're recovering. And if
> you want to expand the cluster in the future (more nodes), the amount of
> data to transfer is going to be very large and it will most likely take
> days to add machines. From my experience it's much better to have a
> larger cluster set up upfront for future growth than getting by with
> 6-12 nodes at the start. You will feel less pain, and node failures (bad
> disks, mem, etc.) are easier to manage.
>
> 3 nodes with RF of 1 wouldn't make sense.
>
>
> On Sat, Sep 3, 2011 at 4:05 AM, China Stoffen <chinastof...@yahoo.com> wrote:
>
>     Many small servers would drive up the hosting cost way too high, so
>     we want to avoid this solution if we can.
>
>     ----- Original Message -----
>     From: Radim Kolar <h...@sendmail.cz>
>     To: user@cassandra.apache.org
>     Cc:
>     Sent: Saturday, September 3, 2011 9:37 AM
>     Subject: Re: commodity server spec
>
>     many smaller servers way better
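
For anyone who wants to check the two figures quoted in this thread (over 3.6 days to re-stream 30TB at a sustained 100MB/s, and 50% extra load on the survivors when one of 3 nodes drops out), here is a rough back-of-envelope sketch. It is not anything Cassandra provides. The function names are made up for illustration, and it assumes binary units (TiB, MiB), a perfectly sustained transfer rate, and load that spreads evenly across the surviving nodes. With decimal units the recovery figure comes out nearer 3.5 days.

```python
# Back-of-envelope sizing helpers. Hypothetical names, not a Cassandra API.

TIB = 2**40  # bytes in one tebibyte (assumption: "30TB" read as binary units)
MIB = 2**20  # bytes in one mebibyte (assumption: "100MB/s" read as binary units)
SECONDS_PER_DAY = 86_400


def recovery_days(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Days needed just to transfer data_bytes at a sustained rate."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_s / SECONDS_PER_DAY


def extra_load_fraction(nodes: int, failed: int = 1) -> float:
    """Extra load on each surviving node, as a fraction of its normal load.

    Assumes the total load stays constant and is spread evenly over
    the nodes that are still up.
    """
    if not 0 <= failed < nodes:
        raise ValueError("need 0 <= failed < nodes")
    return nodes / (nodes - failed) - 1.0


if __name__ == "__main__":
    days = recovery_days(30 * TIB, 100 * MIB)
    print(f"30 TiB at 100 MiB/s: {days:.2f} days")  # prints 3.64 days

    for n in (3, 6, 9):
        print(f"{n} nodes, 1 down: +{extra_load_fraction(n):.1%} load per survivor")
```

Under those assumptions, losing one node costs each survivor 50% extra load in a 3-node cluster, 20% in a 6-node cluster and 12.5% in a 9-node cluster, which is the arithmetic behind preferring 6-9 smaller servers over 3 large ones.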