Hi,

We are running a small 9-node Cassandra v2.1.17 cluster. The cluster generally runs fine, but one table is causing OOMs because of an enormous number of tombstones. Looking at the data in the table (sstable2json), the oldest tombstones are almost a year old. The table was initially created with a gc_grace_seconds of 10 days, but I have now lowered it to 1 hour. I have run a full repair of the table across all nodes. I have forced several major compactions of the table using "nodetool compact", and also tried switching from LeveledCompactionStrategy to SizeTieredCompactionStrategy and back.

What could cause Cassandra to keep these tombstones?

sstable2json:

{"key": "foo",
 "cells": [["0000082f-25ef-4324-bb8a-8cf013c823c1:_","0000082f-25ef-4324-bb8a-8cf013c823c1:!",1507819135148000,"t",1507819135],
           ["000010f3-c05d-4ab9-9b8a-e6ebd8f5818a:_","000010f3-c05d-4ab9-9b8a-e6ebd8f5818a:!",1503661731697000,"t",1503661731],
           ["00001d7a-ce95-4c74-b67e-f8cdffec4f85:_","00001d7a-ce95-4c74-b67e-f8cdffec4f85:!",1509542102909000,"t",1509542102],
           ["00001dd3-ae22-4f6e-944a-8cfa147cde68:_","00001dd3-ae22-4f6e-944a-8cfa147cde68:!",1512418006838000,"t",1512418006],
           ["000022cc-d69c-4596-89e5-3e976c0cb9a8:_","000022cc-d69c-4596-89e5-3e976c0cb9a8:!",1497377448737001,"t",1497377448],
           ["00002777-4b1a-4267-8efc-c43054e63170:_","00002777-4b1a-4267-8efc-c43054e63170:!",1491014691515001,"t",1491014691],
           ["000061e8-f48b-4484-96f1-f8b6a3ed8f9f:_","000061e8-f48b-4484-96f1-f8b6a3ed8f9f:!",1500820300544000,"t",1500820300],
           ["000063da-f165-449b-b65d-2b7869368734:_","000063da-f165-449b-b65d-2b7869368734:!",1512806634968000,"t",1512806634],
           ["0000656f-f8b5-472b-93ed-1a893002f027:_","0000656f-f8b5-472b-93ed-1a893002f027:!",1514554716141000,"t",1514554716],
...
{"key": "bar",
 "metadata": {"deletionInfo": {"markedForDeleteAt":1517402198585982,"localDeletionTime":1517402198}},
 "cells": [["000af8c2-ffe9-4217-9032-61a1cd21781d:_","000af8c2-ffe9-4217-9032-61a1cd21781d:!",1495094965916000,"t",1495094965],
           ["005b96cb-7eb3-4ec3-bfa2-8573e46892f4:_","005b96cb-7eb3-4ec3-bfa2-8573e46892f4:!",1516360186865000,"t",1516360186],
           ["005ec167-aa61-4868-a3ae-a44b00099eb6:_","005ec167-aa61-4868-a3ae-a44b00099eb6:!",1516671840920002,"t",1516671840],
....
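
To put dates on the entries above, here is a minimal Python sketch that converts the last field of a few of these cells to UTC and checks it against the current gc_grace_seconds of 3600. It assumes the five fields are start bound, end bound, write timestamp in microseconds, a "t" marker, and local deletion time in seconds. It also assumes the sstable's maximum timestamp can stand in for "now". The names `CELLS`, `REFERENCE_NOW`, `past_gc_grace` and `describe` are made up for this illustration.

```python
from datetime import datetime, timezone

# Three of the "t" entries from the sstable2json output, copied verbatim.
# Assumed field order: start bound, end bound, write timestamp (microseconds),
# "t" marker, local deletion time (seconds).
CELLS = [
    ["0000082f-25ef-4324-bb8a-8cf013c823c1:_", "0000082f-25ef-4324-bb8a-8cf013c823c1:!",
     1507819135148000, "t", 1507819135],
    ["00002777-4b1a-4267-8efc-c43054e63170:_", "00002777-4b1a-4267-8efc-c43054e63170:!",
     1491014691515001, "t", 1491014691],
    ["0000656f-f8b5-472b-93ed-1a893002f027:_", "0000656f-f8b5-472b-93ed-1a893002f027:!",
     1514554716141000, "t", 1514554716],
]

GC_GRACE_SECONDS = 3600  # current table setting

# Assumption: the sstable's "Maximum timestamp" (microseconds) stands in for "now".
REFERENCE_NOW = 1517468644066000 // 1_000_000


def past_gc_grace(cell, now=REFERENCE_NOW, gc_grace=GC_GRACE_SECONDS):
    """True if the local deletion time is older than now - gc_grace."""
    return cell[4] < now - gc_grace


def describe(cell, now=REFERENCE_NOW):
    """Return (UTC deletion time as ISO string, age in whole days, past gc_grace?)."""
    deleted_at = datetime.fromtimestamp(cell[4], tz=timezone.utc)
    age_days = (now - cell[4]) // 86400
    return deleted_at.isoformat(), age_days, past_gc_grace(cell, now)


if __name__ == "__main__":
    for cell in CELLS:
        print(cell[0], *describe(cell))
```

This only checks the age condition. It says nothing about any other condition compaction may require before it actually purges a tombstone.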

sstablemetadata:

sstablemetadata /data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741-Data.db
SSTable: /data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.100000
Minimum timestamp: 1488976211688000
Maximum timestamp: 1517468644066000
SSTable max local deletion time: 2147483647
Compression ratio: 0.5121956624389545
Estimated droppable tombstones: 18.00161766553587
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1517168739626, position=22690189)
Estimated tombstone drop times:%n
1488976211: 1
1489906506: 4706
1490174752: 6111
1490449759: 6554
1490735410: 6559
1491016789: 6369
1491347982: 10216
1491680214: 13502
...

desc:

CREATE TABLE xxx.yyy (
    ti text,
    uuid text,
    json_data text,
    PRIMARY KEY (ti, uuid)
) WITH CLUSTERING ORDER BY (uuid ASC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 3600
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

jmx props (picture): [image: image.png]
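
As a rough cross-check of the "Estimated tombstone drop times" histogram above, the following Python sketch sums the visible buckets whose drop time is older than now minus gc_grace_seconds. It covers only the eight rows shown before the output is cut off. It assumes each row is "epoch seconds: count" and again uses the sstable's maximum timestamp as a stand-in for "now". The names `parse_drop_times` and `count_before` are hypothetical helpers, not part of any Cassandra tool.

```python
# Histogram rows copied from the sstablemetadata output in this post.
# The output is truncated there, so this is only the visible part.
HISTOGRAM_TEXT = """
1488976211: 1
1489906506: 4706
1490174752: 6111
1490449759: 6554
1490735410: 6559
1491016789: 6369
1491347982: 10216
1491680214: 13502
"""

GC_GRACE_SECONDS = 3600  # current table setting

# Assumption: the sstable's "Maximum timestamp" (microseconds) stands in for "now".
REFERENCE_NOW = 1517468644066000 // 1_000_000


def parse_drop_times(text):
    """Parse 'epoch_seconds: count' rows into a list of (int, int) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        drop_time, count = line.split(":")
        rows.append((int(drop_time), int(count)))
    return rows


def count_before(rows, gc_before):
    """Sum the counts of all buckets whose drop time is earlier than gc_before."""
    return sum(count for drop_time, count in rows if drop_time < gc_before)


if __name__ == "__main__":
    rows = parse_drop_times(HISTOGRAM_TEXT)
    gc_before = REFERENCE_NOW - GC_GRACE_SECONDS
    print(count_before(rows, gc_before), "tombstones in the visible buckets are older than gc_before")
```

With the rows shown, every visible bucket falls before the cutoff, both with the current 1 hour setting and with the original 10 days.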