Hi all. Cassandra constantly OOMs during repair or compaction. Increasing memory (currently 6 GB) doesn't help; I can give it more, but I don't think this is a normal situation. The cluster has 4 nodes, RF=3, Cassandra version 0.8.1.
The ring looks like this:

Address         DC           Rack   Status  State   Load       Owns    Token
                                                                       127605887595351923798765477786913079296
xxx.xxx.xxx.66  datacenter1  rack1  Up      Normal  176.96 GB  25.00%  0
xxx.xxx.xxx.69  datacenter1  rack1  Up      Normal  178.19 GB  25.00%  42535295865117307932921825928971026432
xxx.xxx.xxx.67  datacenter1  rack1  Up      Normal  178.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.68  datacenter1  rack1  Up      Normal  175.2 GB   25.00%  127605887595351923798765477786913079296

About the schema: I have big rows (>100k, up to several million). But as far as I know, that is normal for Cassandra.

Everything works relatively well until I start long-running pre-production tests. I load data, and after a while (~4 hours) the cluster begins to time out, and then some nodes die with OOM. My app retries sending, so after a short period all nodes are down. Very nasty.

But now I can OOM nodes simply by calling nodetool repair. The logs (http://paste.kde.org/96811/) clearly show how the heap rockets up to its limit. cfstats output is here: http://paste.kde.org/96817/ and the config is here: http://paste.kde.org/96823/

The question is: does anybody know what this means? Why does Cassandra try to load something big into memory all at once?

A.
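One thing I have been looking at, in case it is relevant: in 0.8 there is a cassandra.yaml setting that decides whether a row is compacted entirely in memory or in a slower two-pass on-disk mode, so with rows this wide it might matter. A sketch of the relevant fragment (the value shown is just the shipped default, not a recommendation):

```yaml
# cassandra.yaml (0.8.x) — rows whose total size exceeds this limit are
# compacted incrementally on disk instead of being materialized in memory.
# With very wide rows, a limit larger than the heap can spare leads to
# big allocations during compaction/repair.
in_memory_compaction_limit_in_mb: 64
```

I am not sure whether repair's Merkle-tree validation respects the same limit, so this is a guess on my part.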