It would help if you included the end of the Cassandra logs. When I read your first post, the OOM killer came to mind, but what you describe later does not look like that. Just to be sure, have you checked /var/log/messages?
Romain

From: Jan Algermissen <jan.algermis...@nordsc.com>
To: user@cassandra.apache.org
Date: 04/09/2013 10:52
Subject: Re: Cassandra shuts down; was:Cassandra crashes

The subject line isn't appropriate - the servers do not crash but shut down. Since the log messages appear several lines before the end of the log file, I only saw them afterwards. Excuse the confusion.

Jan

On 04.09.2013, at 10:44, Jan Algermissen <jan.algermis...@nordsc.com> wrote:

> Hi,
>
> I have set up C* in a very limited environment: 3 VMs at digitalocean with 2GB RAM and 40GB SSDs, so my expectations about overall performance are low.
>
> The keyspace uses a replication factor of 2.
>
> I am loading 1.5 million rows (each with 60 columns mixing numbers and short text; effectively 300,000 wide rows) in a quite 'aggressive' way, using java-driver and async update statements.
>
> After a while of importing data, I start seeing timeouts reported by the driver:
>
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write
>
> and then later, host-unavailability exceptions:
>
> com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive).
>
> Looking at the 3 hosts, I see that two C*s went down - which explains why I still see some writes succeeding (that must be the one host left, satisfying consistency level ONE).
>
> The logs tell me, AFAIU, that the servers shut down due to reaching the heap size limit.
>
> I am irritated by the fact that the instances (it seems) shut themselves down instead of limiting their amount of work. I understand that I need to tweak the configuration and likely get more RAM, but still, I would actually be satisfied with reduced service (and likely more timeouts in the client).
>
> Right now it looks as if I would have to slow down the client 'artificially' to prevent the loss of hosts - does that make sense?
>
> Can anyone explain whether this is intended behavior, meaning I'll just have to accept the self-shutdown of the hosts? Or alternatively, what data should I collect to investigate the cause further?
>
> Jan
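
On slowing the client down: instead of adding sleeps, one common approach is to cap the number of async writes that are in flight at any moment, so the loader only goes as fast as the cluster acknowledges. Below is a minimal sketch of that idea, not a tested recipe for your setup. The class BoundedWrites, the method writeAll and the writeAsync parameter are hypothetical names of mine; writeAsync stands in for whatever call issues one async update, and the future type returned by java-driver would have to be adapted to the CompletableFuture used here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: bounds the number of outstanding async writes.
public class BoundedWrites {

    /**
     * Sends every row through writeAsync (an assumed stand-in for the real
     * async update call) while keeping at most maxInFlight writes pending.
     * Returns the number of writes that failed.
     */
    public static <T> int writeAll(List<T> rows,
                                   Function<T, CompletableFuture<Void>> writeAsync,
                                   int maxInFlight) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger failures = new AtomicInteger();

        for (T row : rows) {
            // Blocks the producer while maxInFlight writes are still pending.
            permits.acquire();
            CompletableFuture<Void> pending;
            try {
                pending = writeAsync.apply(row);
            } catch (RuntimeException e) {
                // The write could not even be submitted: count it, free the slot.
                failures.incrementAndGet();
                permits.release();
                continue;
            }
            pending.whenComplete((ignored, error) -> {
                if (error != null) {
                    failures.incrementAndGet();
                }
                permits.release();
            });
        }

        // Drain: all permits are available again only once nothing is pending.
        permits.acquire(maxInFlight);
        permits.release(maxInFlight);
        return failures.get();
    }
}
```

The value of maxInFlight is something you would have to find by experiment on those 2GB nodes. Failed writes are only counted here; a real loader would probably retry them or log the affected rows.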