Nicolas,

Were all of those super column writes going to the same row?
http://wiki.apache.org/cassandra/CassandraLimitations

Thanks,
Stu
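The limitations page Stu links is the reason the row question matters: on the 0.6 line, compaction deserializes each row in its entirety, so any single row has to fit in memory, and a million super columns piled under one key will exhaust a 1GB heap. A minimal sketch of the usual workaround, spreading writes across many row keys by bucketing on the document id; the class, helper, and bucket count are illustrative, not part of any client API:

    public class RowBucketing {
        // Illustrative bucket count; sized so no single row approaches the
        // 0.6-era "row must fit in memory during compaction" limit.
        private static final int BUCKETS = 256;

        // Derive a physical row key from the logical key plus a hash of the
        // document id, e.g. "docs:137" instead of one shared "docs" row.
        static String rowKeyFor(String logicalKey, String documentId) {
            int bucket = (documentId.hashCode() & 0x7fffffff) % BUCKETS;
            return logicalKey + ":" + bucket;
        }

        public static void main(String[] args) {
            // Two documents (usually) land in different rows, so no row
            // accumulates all 1,000,000 super columns.
            System.out.println(rowKeyFor("docs", "invoice-0001.xml"));
            System.out.println(rowKeyFor("docs", "invoice-0002.xml"));
        }
    }

With one SC per XML node, bucketing per document (or per document subtree) keeps each physical row at a size compaction can actually handle.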
-----Original Message-----
From: "Nicolas Labrot" <nith...@gmail.com>
Sent: Wednesday, April 21, 2010 11:54am
To: user@cassandra.apache.org
Subject: Re: Cassandra tuning for running test on a desktop

I don't have a website ;) I'm testing the viability of Cassandra for storing XML documents and making fast search queries. 4000 XML files (80MB of XML) mapped onto my data model (one SC per XML node) produce 1,000,000 SCs, which make Cassandra go OOM with Xmx 1GB. By contrast, an XML database like eXist handles the same 4000 XML documents without any problem and with an acceptable amount of memory.

What I like about Cassandra is its simplicity and its scalability. eXist is not able to scale with the data; the only viable alternative is MarkLogic, which costs an arm and a leg... :)

I will install Linux and buy some memory to continue my tests. Could a Cassandra developer give me the technical reason for this OOM?

On Wed, Apr 21, 2010 at 5:13 PM, Mark Greene <green...@gmail.com> wrote:
> Maybe, maybe not. Presumably if you are running an RDBMS with any reasonable
> amount of traffic nowadays, it's sitting on a machine with 4-8GB of memory
> at least.
>
> On Wed, Apr 21, 2010 at 10:48 AM, Nicolas Labrot <nith...@gmail.com> wrote:
>> Thanks Mark.
>>
>> Cassandra is maybe too much for my needs ;)
>>
>> On Wed, Apr 21, 2010 at 4:45 PM, Mark Greene <green...@gmail.com> wrote:
>>> Hit send too early....
>>>
>>> That being said, a lot of people running Cassandra in production are using
>>> 4-6GB max heaps on 8GB machines. Don't know if that helps, but hopefully it
>>> gives you some perspective.
>>>
>>> On Wed, Apr 21, 2010 at 10:39 AM, Mark Greene <green...@gmail.com> wrote:
>>>> RAM doesn't necessarily need to be proportional, but I would say the
>>>> number of nodes does. You can't just throw a bazillion inserts at one node.
>>>> The main benefit of Cassandra is that when you start hitting your
>>>> capacity, you add more machines and distribute the keys across more
>>>> machines.
>>>>
>>>> On Wed, Apr 21, 2010 at 9:07 AM, Nicolas Labrot <nith...@gmail.com> wrote:
>>>>> So does that mean the RAM needed is proportional to the data handled?
>>>>>
>>>>> Or does Cassandra need a minimum amount of RAM when the dataset is big?
>>>>>
>>>>> I must confess this OOM behaviour is strange.
>>>>>
>>>>> On Wed, Apr 21, 2010 at 2:54 PM, Mark Jones <mjo...@imagehawk.com> wrote:
>>>>>> On my 4GB machine I'm giving it 3GB and having no trouble with 60+
>>>>>> million 500-byte columns.
>>>>>>
>>>>>> From: Nicolas Labrot [mailto:nith...@gmail.com]
>>>>>> Sent: Wednesday, April 21, 2010 7:47 AM
>>>>>> To: user@cassandra.apache.org
>>>>>> Subject: Re: Cassandra tuning for running test on a desktop
>>>>>>
>>>>>> I have tried 1400M, and Cassandra OOMs too.
>>>>>>
>>>>>> Is there another solution? My data isn't very big.
>>>>>>
>>>>>> It seems to happen during the merge (compaction) of the db.
>>>>>>
>>>>>> On Wed, Apr 21, 2010 at 2:14 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>> Try increasing Xmx. 1G is probably not enough for the amount of
>>>>>> inserts you are doing.
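Before tuning further, it is worth confirming what the JVM actually received: on Windows the heap flags live in cassandra.bat, and an edit that silently fails to take effect looks exactly like this OOM. A plain-JDK check, nothing Cassandra-specific (class name made up):

    public class HeapCheck {
        public static void main(String[] args) {
            // maxMemory() reports (approximately) the -Xmx ceiling the
            // running JVM was actually granted.
            long max = Runtime.getRuntime().maxMemory();
            System.out.printf("max heap: %d MB%n", max / (1024 * 1024));
        }
    }

Running it with the same flags as the server (e.g. java -Xmx1400m HeapCheck) shows whether the intended ceiling is really in force.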
>>>>>>
>>>>>> On Wed, Apr 21, 2010 at 8:10 AM, Nicolas Labrot <nith...@gmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> For my first message I will first thank the Cassandra contributors for
>>>>>> their great work.
>>>>>>
>>>>>> I have a parameter issue with Cassandra (I hope it's just a parameter
>>>>>> issue). I'm using Cassandra 0.6.1 with the Hector client on my desktop. It's
>>>>>> a simple dual core with 4GB of RAM on WinXP. I have kept the default JVM
>>>>>> options inside cassandra.bat (Xmx1G).
>>>>>>
>>>>>> I'm trying to insert 3 million SCs, each with 6 columns, into 1 CF
>>>>>> (named Super1). The insertion gets to 1 million SCs (without slowdown) and
>>>>>> then Cassandra crashes because of an OOM. (I store an average of 100 bytes
>>>>>> per SC with a max of 10kB.)
>>>>>> I have aggressively decreased all the memory parameters without any
>>>>>> regard for consistency (my config is here [1]), and the cache is turned
>>>>>> off, but Cassandra still goes OOM. I have attached the last lines of the
>>>>>> Cassandra log [2].
>>>>>>
>>>>>> What can I do to fix my issue? Is there a solution other than
>>>>>> increasing the Xmx?
>>>>>>
>>>>>> Thanks for your help,
>>>>>>
>>>>>> Nicolas
>>>>>>
>>>>>> [1]
>>>>>> <Keyspaces>
>>>>>>   <Keyspace Name="Keyspace1">
>>>>>>     <ColumnFamily Name="Super1"
>>>>>>                   ColumnType="Super"
>>>>>>                   CompareWith="BytesType"
>>>>>>                   CompareSubcolumnsWith="BytesType" />
>>>>>>     <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
>>>>>>     <ReplicationFactor>1</ReplicationFactor>
>>>>>>     <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
>>>>>>   </Keyspace>
>>>>>> </Keyspaces>
>>>>>> <CommitLogRotationThresholdInMB>32</CommitLogRotationThresholdInMB>
>>>>>> <DiskAccessMode>auto</DiskAccessMode>
>>>>>> <RowWarningThresholdInMB>64</RowWarningThresholdInMB>
>>>>>> <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
>>>>>> <FlushDataBufferSizeInMB>16</FlushDataBufferSizeInMB>
>>>>>> <FlushIndexBufferSizeInMB>4</FlushIndexBufferSizeInMB>
>>>>>> <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
>>>>>> <MemtableThroughputInMB>16</MemtableThroughputInMB>
>>>>>> <BinaryMemtableThroughputInMB>32</BinaryMemtableThroughputInMB>
>>>>>> <MemtableOperationsInMillions>0.01</MemtableOperationsInMillions>
>>>>>> <MemtableObjectCountInMillions>0.01</MemtableObjectCountInMillions>
>>>>>> <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
>>>>>> <ConcurrentReads>4</ConcurrentReads>
>>>>>> <ConcurrentWrites>8</ConcurrentWrites>
>>>>>> </Storage>
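Reading [1] against the log in [2]: MemtableOperationsInMillions=0.01 triggers a flush every 10,000 operations, and at ~100 bytes per SC that limit binds long before the 16MB throughput threshold, so the memtables are tiny and flush every few seconds. Back-of-envelope arithmetic, under the assumption (not stated in the thread) that each of the 6 subcolumns counts as one operation:

    public class FlushMath {
        public static void main(String[] args) {
            // From [1]: MemtableOperationsInMillions = 0.01
            // -> flush every 10,000 operations.
            long opsPerFlush = 10_000;
            // Assumption: one operation per subcolumn, 6 per SC.
            long superColumns = 1_000_000;
            long columnsPerSC = 6;

            long flushes = superColumns * columnsPerSC / opsPerFlush;
            // Each flush writes one SSTable; ~600 is the same order of
            // magnitude as the Super1-713-Data.db seen in [2].
            System.out.println("estimated flushes/SSTables: " + flushes);
        }
    }

Hundreds of small SSTables mean near-continuous compaction, which is exactly where the single-large-row limitation discussed above would bite.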
>>>>>>
>>>>>> [2]
>>>>>> INFO 13:36:41,062 Super1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='d:/cassandra/commitlog\CommitLog-1271849783703.log', position=5417524)
>>>>>> INFO 13:36:41,062 Enqueuing flush of Memtable(Super1)@15385755
>>>>>> INFO 13:36:41,062 Writing Memtable(Super1)@15385755
>>>>>> INFO 13:36:42,062 Completed flushing d:\cassandra\data\Keyspace1\Super1-711-Data.db
>>>>>> INFO 13:36:45,781 Super1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='d:/cassandra/commitlog\CommitLog-1271849783703.log', position=6065637)
>>>>>> INFO 13:36:45,781 Enqueuing flush of Memtable(Super1)@15578910
>>>>>> INFO 13:36:45,796 Writing Memtable(Super1)@15578910
>>>>>> INFO 13:36:46,109 Completed flushing d:\cassandra\data\Keyspace1\Super1-712-Data.db
>>>>>> INFO 13:36:54,296 GC for ConcurrentMarkSweep: 7149 ms, 58337240 reclaimed leaving 922392600 used; max is 1174208512
>>>>>> INFO 13:36:54,593 Super1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='d:/cassandra/commitlog\CommitLog-1271849783703.log', position=6722241)
>>>>>> INFO 13:36:54,593 Enqueuing flush of Memtable(Super1)@24468872
>>>>>> INFO 13:36:54,593 Writing Memtable(Super1)@24468872
>>>>>> INFO 13:36:55,421 Completed flushing d:\cassandra\data\Keyspace1\Super1-713-Data.db
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>> INFO 13:37:08,281 GC for ConcurrentMarkSweep: 5561 ms, 9432 reclaimed leaving 971904520 used; max is 1174208512
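The two ConcurrentMarkSweep lines are the diagnostic heart of [2]: a 7.1-second full collection frees ~58MB yet leaves 922MB of the 1.17GB heap in use, and the next one frees only 9,432 bytes. Almost everything on the heap is live, so the node is genuinely retaining the data (consistent with one huge row, or with flush/compaction backing up), not merely collecting too slowly. The ratios, worked out:

    public class GcLogRatios {
        public static void main(String[] args) {
            // Numbers copied from the two CMS lines in [2].
            long max = 1_174_208_512L;           // heap max reported by the log
            long usedAfterFirst = 922_392_600L;  // live after the 7149 ms CMS
            long usedAfterSecond = 971_904_520L; // live after the 5561 ms CMS

            System.out.printf("after 1st CMS: %.0f%% of heap still live%n",
                    100.0 * usedAfterFirst / max);   // ~79%
            System.out.printf("after 2nd CMS: %.0f%% of heap still live%n",
                    100.0 * usedAfterSecond / max);  // ~83%
            // When a full collection frees ~9 KB, the heap is full of live
            // objects: the fix is a bigger Xmx or retaining less at once
            // (e.g. capping the size of any single row).
        }
    }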