On 06.09.2013, at 17:07, Jan Algermissen <jan.algermis...@nordsc.com> wrote:
>
> On 06.09.2013, at 13:12, Alex Major <al3...@gmail.com> wrote:
>
>> Have you changed the appropriate config settings so that Cassandra will run
>> with only 2GB RAM? You shouldn't find the nodes go down.
>>
>> Check out this blog post
>> http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
>> , it outlines the configuration settings needed to run Cassandra on 64MB
>> RAM and might give you some insights.
>
> Yes, I have my fingers on the knobs and have also seen the article you
> mention - very helpful indeed, as are the replies so far. Thanks very much.
>
> However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my
> data import :-(

The problem for me was

    in_memory_compaction_limit_in_mb: 1

It seems that the combination of my rather large rows (70 cols each) and the
slower two-pass compaction process mentioned in the comment on that config
switch caused the "java.lang.AssertionError: incorrect row data size"
exceptions.

After turning in_memory_compaction_limit_in_mb back up to 64, all I am getting
are write timeouts. AFAIU that is fine, because now C* is stable and all I have
left is a capacity problem, solvable with more nodes or more RAM (maybe -
depends on whether IO is an issue). Both settings, and a sketch of how the
client-side write rate could be bounded, are at the bottom of this mail for
reference.

Jan

>
> Now, while it would be easy to scale out and up a bit until the default
> config of C* is sufficient, I would really like to dive deep and try to
> understand why the thing is still going down, IOW, which of my config
> settings is so darn wrong that in most cases kill -9 remains the only way
> to shut down the Java process in the end.
>
> The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M" and
> HEAP_NEWSIZE="120M") in combination with some cassandra activity that
> demands too much heap, right?
>
> So how do I find out what activity this is, and how do I sufficiently
> reduce that activity?
>
> What bugs me in general is that AFAIU C* is so eager to give massive write
> speed that it sort of forgets to protect itself from client demand. I would
> very much like to understand why and how that happens. I mean: no matter
> how many clients are flooding the database, it should not die due to
> out-of-memory situations, regardless of any configuration specifics, or?
>
> tl;dr
>
> Currently my client side (with java-driver) after a while reports more and
> more timeouts and then the following exception:
>
> com.datastax.driver.core.exceptions.DriverInternalError: An unexpected
> error occured server side: java.lang.OutOfMemoryError: unable to create
> new native thread
>
> On the server side, my cluster remains more or less in this condition:
>
> DN  xxxxx  71,33 MB   256  34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> UN  xxxxx  189,38 MB  256  32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
> UN  xxxxx  198,49 MB  256  33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
>
> The host that is down (it is the seed host, if that matters) still shows
> the running java process, but I cannot shut down cassandra or connect with
> nodetool, hence kill -9 to the rescue.
>
> On that host, I still see a load of around 1.
>
> jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
>
> The system.log after a few seconds of import shows the following exception:
>
> java.lang.AssertionError: incorrect row data size 771030 written to
> /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db;
> correct is 771200
>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
>
> And then, after about 2 minutes, there are out-of-memory errors:
>
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java
> (line 192) Exception in thread Thread[CompactionExecutor:5,1,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:693)
>     at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
>     at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
>     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java
> (line 192) Exception in thread Thread[CompactionExecutor:
>
> On the other hosts the log looks similar, but these keep running, despite
> the OutOfMemory errors.
>
> Jan
>
>> On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen
>> <jan.algermis...@nordsc.com> wrote:
>> Hi,
>>
>> I have set up C* in a very limited environment: 3 VMs at digitalocean with
>> 2GB RAM and 40GB SSDs, so my expectations about overall performance are
>> low.
>>
>> Keyspace uses a replication factor of 2.
>>
>> I am loading 1.5 Mio rows (each 60 columns of a mix of numbers and small
>> texts, 300,000 wide rows effectively) in a quite 'aggressive' way, using
>> java-driver and async update statements.
>>
>> After a while of importing data, I start seeing timeouts reported by the
>> driver:
>>
>> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
>> timeout during write query at consistency ONE (1 replica were required but
>> only 0 acknowledged the write)
>>
>> and then later, host-unavailability exceptions:
>>
>> com.datastax.driver.core.exceptions.UnavailableException: Not enough
>> replica available for query at consistency ONE (1 required but only 0
>> alive).
>>
>> Looking at the 3 hosts, I see two C*s went down - which explains why I
>> still see some writes succeeding (that must be the one host left,
>> satisfying the consistency level ONE).
>>
>> The logs tell me, AFAIU, that the servers shut down due to reaching the
>> heap size limit.
>>
>> I am irritated by the fact that the instances (it seems) shut themselves
>> down instead of limiting their amount of work. I understand that I need to
>> tweak the configuration and likely get more RAM, but still, I would
>> actually be satisfied with reduced service (and likely more timeouts in
>> the client). Right now it looks as if I would have to slow down the client
>> 'artificially' to prevent the loss of hosts - does that make sense?
>>
>> Can anyone explain whether this is intended behavior, meaning I'll just
>> have to accept the self-shutdown of the hosts? Or alternatively, what data
>> should I collect to investigate the cause further?
>>
>> Jan
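
For reference, here is how the two settings discussed above look in
cassandra-env.sh and cassandra.yaml. The values are simply the ones from this
thread (640M/120M heap, and the compaction limit I first dropped to 1 and then
put back to 64) - a minimal sketch, not a recommendation:

    # cassandra-env.sh - explicit heap sizing instead of the auto-calculated default
    MAX_HEAP_SIZE="640M"
    HEAP_NEWSIZE="120M"

    # cassandra.yaml - rows larger than this limit go through the slower
    # two-pass compaction path; 1 MB forced that path for my wide rows,
    # 64 is the value I went back to
    in_memory_compaction_limit_in_mb: 64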
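
And regarding "slowing down the client artificially": below is a minimal
sketch of how the number of in-flight async writes could be bounded on the
java-driver side with a plain Semaphore. The keyspace/table names are taken
from the sstable path in the log above; the columns, values and the limit of
128 are made-up placeholders, not my actual import code:

    import java.util.concurrent.Semaphore;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;

    public class ThrottledImport {

        // Made-up cap on concurrent in-flight writes; tune to what the cluster absorbs.
        private static final int MAX_IN_FLIGHT = 128;

        public static void main(String[] args) throws Exception {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("products");

            final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

            for (int i = 0; i < 1500000; i++) {
                inFlight.acquire();   // producer blocks once MAX_IN_FLIGHT writes are pending
                ResultSetFuture f = session.executeAsync(
                        "INSERT INTO product (id, txt) VALUES (" + i + ", 'x')");  // placeholder statement
                Futures.addCallback(f, new FutureCallback<ResultSet>() {
                    public void onSuccess(ResultSet rs) { inFlight.release(); }
                    // on WriteTimeout/Unavailable: release the permit, then log or retry
                    public void onFailure(Throwable t) { inFlight.release(); }
                });
            }

            inFlight.acquire(MAX_IN_FLIGHT);   // wait for the last pending writes to finish
            cluster.shutdown();                // driver 1.x; newer versions use close()
        }
    }

This does not make an undersized cluster survive arbitrary load, of course; it
only keeps the client from queueing up more work than the three small nodes
can absorb at once.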