On 06.09.2013, at 17:07, Jan Algermissen <jan.algermis...@nordsc.com> wrote:
>
> On 06.09.2013, at 13:12, Alex Major <al3...@gmail.com> wrote:
>
>> Have you changed the appropriate config settings so that Cassandra will run
>> with only 2GB RAM? You shouldn't find the nodes go down.
>>
>> Check out this blog post
>> http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
>> , it outlines the configuration settings needed to run Cassandra on 64MB
>> RAM and might give you some insights.
>
> Yes, I have my fingers on the knobs and have also seen the article you
> mention - very helpful indeed, as are the replies so far. Thanks very much.
>
> However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my
> data import :-(

The problem for me was

    in_memory_compaction_limit_in_mb: 1

It seems that the combination of my rather large rows (70 cols each) and the
slower two-pass compaction process mentioned in the comment on that config
switch caused the "java.lang.AssertionError: incorrect row data size"
exceptions.

After turning in_memory_compaction_limit_in_mb back up to 64, all I am getting
are write timeouts. AFAIU that is fine, because now C* is stable and all I have
left is a capacity problem, solvable with more nodes or more RAM (maybe -
depends on whether IO is an issue). Both settings, and a sketch of how the
client-side write rate could be bounded, are at the bottom of this mail for
reference.

Jan

>
> Now, while it would be easy to scale out and up a bit until the default
> config of C* is sufficient, I would really like to dive deep and try to
> understand why the thing is still going down, IOW, which of my config
> settings is so darn wrong that in most cases kill -9 remains the only way
> to shut down the Java process in the end.
>
> The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M" and
> HEAP_NEWSIZE="120M") in combination with some cassandra activity that
> demands too much heap, right?
>
> So how do I find out what activity this is, and how do I sufficiently
> reduce that activity?
>
> What bugs me in general is that AFAIU C* is so eager to give massive write
> speed that it sort of forgets to protect itself from client demand. I would
> very much like to understand why and how that happens. I mean: no matter
> how many clients are flooding the database, it should not die due to
> out-of-memory situations, regardless of any configuration specifics, or?
>
> tl;dr
>
> Currently my client side (with java-driver) after a while reports more and
> more timeouts and then the following exception:
>
> com.datastax.driver.core.exceptions.DriverInternalError: An unexpected
> error occured server side: java.lang.OutOfMemoryError: unable to create
> new native thread
>
> On the server side, my cluster remains more or less in this condition:
>
> DN  xxxxx  71,33 MB   256  34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> UN  xxxxx  189,38 MB  256  32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
> UN  xxxxx  198,49 MB  256  33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
>
> The host that is down (it is the seed host, if that matters) still shows
> the running java process, but I cannot shut down cassandra or connect with
> nodetool, hence kill -9 to the rescue.
>
> On that host, I still see a load of around 1.
>
> jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
>
> The system.log after a few seconds of import shows the following exception:
>
> java.lang.AssertionError: incorrect row data size 771030 written to
> /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db;
> correct is 771200
>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
>
> And then, after about 2 minutes, there are out-of-memory errors:
>
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java
> (line 192) Exception in thread Thread[CompactionExecutor:5,1,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:693)
>     at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
>     at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
>     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java
> (line 192) Exception in thread Thread[CompactionExecutor:
>
> On the other hosts the log looks similar, but these keep running, despite
> the OutOfMemory errors.
>
> Jan
>
>> On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen
>> <jan.algermis...@nordsc.com> wrote:
>> Hi,
>>
>> I have set up C* in a very limited environment: 3 VMs at digitalocean with
>> 2GB RAM and 40GB SSDs, so my expectations about overall performance are
>> low.
>>
>> Keyspace uses a replication factor of 2.
>>
>> I am loading 1.5 Mio rows (each 60 columns of a mix of numbers and small
>> texts, 300,000 wide rows effectively) in a quite 'aggressive' way, using
>> java-driver and async update statements.
>>
>> After a while of importing data, I start seeing timeouts reported by the
>> driver:
>>
>> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
>> timeout during write query at consistency ONE (1 replica were required but
>> only 0 acknowledged the write)
>>
>> and then later, host-unavailability exceptions:
>>
>> com.datastax.driver.core.exceptions.UnavailableException: Not enough
>> replica available for query at consistency ONE (1 required but only 0
>> alive).
>>
>> Looking at the 3 hosts, I see two C*s went down - which explains why I
>> still see some writes succeeding (that must be the one host left,
>> satisfying the consistency level ONE).
>>
>> The logs tell me, AFAIU, that the servers shut down due to reaching the
>> heap size limit.
>>
>> I am irritated by the fact that the instances (it seems) shut themselves
>> down instead of limiting their amount of work. I understand that I need to
>> tweak the configuration and likely get more RAM, but still, I would
>> actually be satisfied with reduced service (and likely more timeouts in
>> the client). Right now it looks as if I would have to slow down the client
>> 'artificially' to prevent the loss of hosts - does that make sense?
>>
>> Can anyone explain whether this is intended behavior, meaning I'll just
>> have to accept the self-shutdown of the hosts? Or alternatively, what data
>> should I collect to investigate the cause further?
>>
>> Jan
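
For reference, here is how the two settings discussed above look in
cassandra-env.sh and cassandra.yaml. The values are simply the ones from this
thread (640M/120M heap, and the compaction limit I first dropped to 1 and then
put back to 64) - a minimal sketch, not a recommendation:

    # cassandra-env.sh - explicit heap sizing instead of the auto-calculated default
    MAX_HEAP_SIZE="640M"
    HEAP_NEWSIZE="120M"

    # cassandra.yaml - rows larger than this limit go through the slower
    # two-pass compaction path; 1 MB forced that path for my wide rows,
    # 64 is the value I went back to
    in_memory_compaction_limit_in_mb: 64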
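
And regarding "slowing down the client artificially": below is a minimal
sketch of how the number of in-flight async writes could be bounded on the
java-driver side with a plain Semaphore. The keyspace/table names are taken
from the sstable path in the log above; the columns, values and the limit of
128 are made-up placeholders, not my actual import code:

    import java.util.concurrent.Semaphore;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;

    public class ThrottledImport {

        // Made-up cap on concurrent in-flight writes; tune to what the cluster absorbs.
        private static final int MAX_IN_FLIGHT = 128;

        public static void main(String[] args) throws Exception {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("products");

            final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

            for (int i = 0; i < 1500000; i++) {
                inFlight.acquire();   // producer blocks once MAX_IN_FLIGHT writes are pending
                ResultSetFuture f = session.executeAsync(
                        "INSERT INTO product (id, txt) VALUES (" + i + ", 'x')");  // placeholder statement
                Futures.addCallback(f, new FutureCallback<ResultSet>() {
                    public void onSuccess(ResultSet rs) { inFlight.release(); }
                    // on WriteTimeout/Unavailable: release the permit, then log or retry
                    public void onFailure(Throwable t) { inFlight.release(); }
                });
            }

            inFlight.acquire(MAX_IN_FLIGHT);   // wait for the last pending writes to finish
            cluster.shutdown();                // driver 1.x; newer versions use close()
        }
    }

This does not make an undersized cluster survive arbitrary load, of course; it
only keeps the client from queueing up more work than the three small nodes
can absorb at once.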