Write-once-and-compact is generally a bad fit for very large datasets. It is like being able to jump 60 feet in the air when your legs cannot withstand a 10-foot drop.
http://wiki.apache.org/cassandra/LargeDataSetConsiderations

On Wed, Feb 20, 2013 at 3:33 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
> There seem to be some data structures in cassandra which scale with the
> number of rows stored and consume in-JVM memory without bound (other than
> the number of rows). Even with 1.2, I think that index samples are still
> kept in-JVM, so you may need to tune index_interval. Unfortunately that is
> a global value, so it will affect all CFs, not just the big ones that need
> it to be different.
>
> There may be other issues (like during compaction), but that one pops out.
> Prior to 1.2, bloom filters would be a big problem too.
>
> -Bryan
>
> On Wed, Feb 20, 2013 at 12:20 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
>>
>> Heh, we just discovered that mistake a few minutes ago… thanks though. I
>> am now wondering, and may run a separate 6-node test cluster to see how
>> compaction behaves on very large data sets. We have tons of research data
>> that just sits there, so I am wondering if 20T/node is now feasible with
>> cassandra (if mongodb can do 42T on a node, as 10gen was telling my
>> colleague, I would think we can with cassandra).
>>
>> Are there any reasons I should know up front that 20T per node won't
>> work? We have 20 disks per node, and this definitely has a different
>> profile than previous cassandra systems I have set up. We don't really
>> need any caching, as disk access is typically fine on reads.
>>
>> Thanks,
>> Dean
>>
>> From: Bryan Talbot <btal...@aeriagames.com>
>> Reply-To: user@cassandra.apache.org
>> Date: Wednesday, February 20, 2013 1:04 PM
>> To: user@cassandra.apache.org
>> Subject: Re: cassandra vs. mongodb quick question (good additional info)
>>
>> This calculation is incorrect, btw. 10,000 GB transferred at 1.25 GB/sec
>> would complete in about 8,000 seconds, which is just 2.2 hours, not 5.5
>> days. The error is in the conversion: (1 hr / 60 secs) is off by a factor
>> of 60, since (1 hr / 3600 secs) is the correct conversion.
>>
>> -Bryan
>>
>> On Mon, Feb 18, 2013 at 5:00 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
>> Googling "10 gigabit in gigabytes" gives me 1.25 gigabytes/second (yes, I
>> could have divided by 8 in my head, but eh… of course when I saw the
>> number, I went "duh").
>>
>> So transferring 10 terabytes (10,000 gigabytes) to a node that we are
>> bringing online to replace a dead node would take approximately 5 days???
>>
>> That assumes no one else is using the bandwidth, too ;). 10,000 gigabytes
>> * (1 second / 1.25 GB) * (1 hr / 60 secs) * (1 day / 24 hrs) = 5.555555
>> days. It is more likely 11 days if we can only use 50% of the network.
>>
>> So bringing a new node up to speed is more like 11 days once one has
>> crashed. I think this is the main reason the 1-terabyte-per-node
>> guideline exists to begin with, right?
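Bryan's correction is easy to verify with a few lines of Python; this is just the arithmetic from the thread (10 Gb/s link, 10,000 GB payload), nothing Cassandra-specific:

    # Recompute the node-replacement transfer time from the thread:
    # 10,000 GB streamed over a 10 gigabit link.
    link_gbps = 10                     # 10 gigabit ethernet
    link_gb_per_sec = link_gbps / 8.0  # = 1.25 gigabytes/second
    payload_gb = 10000                 # 10 TB expressed in GB

    seconds = payload_gb / link_gb_per_sec  # 8,000 seconds
    print(seconds / 3600)      # correct conversion: ~2.2 hours
    print(seconds / 60 / 24)   # erroneous (1 hr / 60 s): ~5.6 "days"
    print(2 * seconds / 3600)  # at 50% utilization: ~4.4 hours

So even at half the link, replacing a 10 TB node is measured in hours, not the 11 days the erroneous conversion suggested.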
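Bryan's index_interval point can be put in similar back-of-envelope terms for the proposed 20T node. This is a minimal sketch: the average row size and per-sample heap cost below are illustrative assumptions, not measured Cassandra figures.

    # Rough heap estimate for in-JVM index samples on a 20 TB node.
    # avg_row_bytes and bytes_per_sample are illustrative guesses.
    node_bytes = 20 * 1024**4     # 20 TB per node, as proposed
    avg_row_bytes = 10 * 1024     # assume 10 KB average row
    index_interval = 128          # cassandra.yaml default
    bytes_per_sample = 64         # assumed heap cost per sample

    rows = node_bytes // avg_row_bytes  # ~2.1 billion rows
    samples = rows // index_interval    # ~16.8 million samples
    heap_mb = samples * bytes_per_sample / 1024.0**2
    print("~%.0f MB of heap just for index samples" % heap_mb)  # ~1 GB

The estimate scales inversely with row size, so 1 KB rows would push this toward ~10 GB of heap, which is why a single global index_interval can become a problem at this density.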