Hi all. Cassandra constantly OOMs during repair or compaction. Increasing memory (currently 6 GB) doesn't help; I can give it more, but I don't think this is a normal situation. The cluster has 4 nodes, RF=3, Cassandra version 0.8.1.
The ring looks like this:

Address         DC           Rack   Status  State   Load       Owns    Token
                                                                       127605887595351923798765477786913079296
xxx.xxx.xxx.66  datacenter1  rack1  Up      Normal  176.96 GB  25.00%  0
xxx.xxx.xxx.69  datacenter1  rack1  Up      Normal  178.19 GB  25.00%  42535295865117307932921825928971026432
xxx.xxx.xxx.67  datacenter1  rack1  Up      Normal  178.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.68  datacenter1  rack1  Up      Normal  175.2 GB   25.00%  127605887595351923798765477786913079296

About the schema: I have big rows (>100k, up to several million). But as far as I know, that is normal for Cassandra.

Everything works relatively well until I start long-running pre-production tests. I load data, and after a while (~4 hours) the cluster begins to time out, and then some nodes die with OOM. My app retries sending, so after a short period all nodes are down. Very nasty.

But now I can OOM nodes simply by calling nodetool repair. The logs (http://paste.kde.org/96811/) clearly show how the heap rockets up to its limit. cfstats output is here: http://paste.kde.org/96817/ and the config is here: http://paste.kde.org/96823/

The question is: does anybody know what this means? Why does Cassandra try to load something big into memory all at once?

A.
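One thing I have been looking at, in case it is relevant: in 0.8 there is a cassandra.yaml setting that decides whether a row is compacted entirely in memory or in a slower two-pass on-disk mode, so with rows this wide it might matter. A sketch of the relevant fragment (the value shown is just the shipped default, not a recommendation):

```yaml
# cassandra.yaml (0.8.x) — rows whose total size exceeds this limit are
# compacted incrementally on disk instead of being materialized in memory.
# With very wide rows, a limit larger than the heap can spare leads to
# big allocations during compaction/repair.
in_memory_compaction_limit_in_mb: 64
```

I am not sure whether repair's Merkle-tree validation respects the same limit, so this is a guess on my part.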