When you forced the compaction, did you end up with 1 sstable or 2? If 2, did you ever run (incremental) repair on some of the data? If so, it moves the repaired sstables to a different compaction manager (repaired and unrepaired sstables are compacted separately), which means it won’t purge a tombstone if it shadows data in the unrepaired set.
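A minimal sketch of that rule in Python, as a simplified model rather than Cassandra's actual code. The names `Tombstone`, `is_purgeable` and `min_ts_outside_us` are made up for illustration, and using the sstable's maximum timestamp as "now" is an assumption; the tombstone values are taken from the sstable2json output quoted further down in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tombstone:
    write_timestamp_us: int   # cell write timestamp, microseconds since epoch
    local_deletion_time: int  # when the delete was applied, seconds since epoch


def is_purgeable(t: Tombstone, gc_grace_seconds: int, now: int,
                 min_ts_outside_us: Optional[int]) -> bool:
    """Simplified purge check (hypothetical helper, not a Cassandra API).

    min_ts_outside_us is the smallest write timestamp found in sstables that
    may hold the same partition but are NOT part of this compaction (for
    example the unrepaired set while the repaired set is compacted).
    None means there are no such sstables.
    """
    past_gc_grace = t.local_deletion_time + gc_grace_seconds < now
    # If data outside the compaction could be older than the tombstone, the
    # tombstone may still shadow it and has to be kept.
    shadows_nothing = (min_ts_outside_us is None
                       or t.write_timestamp_us < min_ts_outside_us)
    return past_gc_grace and shadows_nothing


# One tombstone from the "foo" partition, gc_grace_seconds = 3600.
old_tombstone = Tombstone(write_timestamp_us=1491014691515001,
                          local_deletion_time=1491014691)
now = 1517468644  # assumption: "now" is the sstable's maximum timestamp

# Everything in one compaction: nothing outside to shadow, so it can go.
print(is_purgeable(old_tombstone, 3600, now, None))
# Older data sits in a separate (unrepaired) set: the tombstone is kept.
print(is_purgeable(old_tombstone, 3600, now, 1488976211688000))
```

In this model, lowering gc_grace_seconds only affects the first condition; a tombstone that is long past gc_grace still survives a major compaction as long as older data lives in sstables that were not part of that compaction.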
-- Jeff Jirsa

> On Feb 12, 2018, at 12:46 AM, Bo Finnerup Madsen <bo.gunder...@gmail.com>
> wrote:
>
> Well for anyone having the same issue, I "fixed" it by dropping and
> re-creating the table.
>
>> On Fri, Feb 2, 2018 at 07:29, Steinmaurer, Thomas
>> <thomas.steinmau...@dynatrace.com> wrote:
>>
>> Right. In this case, cleanup should have done the necessary work here.
>>
>> Thomas
>>
>> From: Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
>> Sent: Friday, 02 February 2018 06:59
>> To: user@cassandra.apache.org
>> Subject: Re: Old tombstones not being cleaned up
>>
>> We did start with a 3 node cluster and a RF of 3, then added another 3 nodes
>> and again another 3 nodes. So it is a good guess :)
>>
>> But I have run both repair and cleanup against the table on all nodes, would
>> that not have removed any stray partitions?
>>
>> On Thu, Feb 1, 2018 at 22:31, Steinmaurer, Thomas
>> <thomas.steinmau...@dynatrace.com> wrote:
>>
>> Did you start with a 9 node cluster from the beginning or did you extend /
>> scale out your cluster (with vnodes) beyond the replication factor?
>>
>> If the latter applies and if you are deleting by explicit deletes and not via
>> TTL, then nodes might not see the deletes anymore, as a node might not own
>> the partition anymore after a topology change (e.g. scale out beyond the
>> keyspace RF).
>>
>> Just a very wild guess.
>>
>> Thomas
>>
>> From: Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
>> Sent: Thursday, 01 February 2018 22:14
>> To: user@cassandra.apache.org
>> Subject: Re: Old tombstones not being cleaned up
>>
>> We do not use TTL anywhere...records are inserted and deleted "manually" by
>> our software.
>>
>> On Thu, Feb 1, 2018 at 18:29, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>> Changing the default TTL doesn’t change the TTL on the existing data, only
>> new data. It’s only set if you don’t supply one yourself.
>>
>> On Wed, Jan 31, 2018 at 11:35 PM Bo Finnerup Madsen <bo.gunder...@gmail.com>
>> wrote:
>>
>> Hi,
>>
>> We are running a small 9 node Cassandra v2.1.17 cluster. The cluster
>> generally runs fine, but we have one table that is causing OOMs because of an
>> enormous number of tombstones.
>>
>> Looking at the data in the table (sstable2json), the first of the tombstones
>> are almost a year old. The table was initially created with a
>> gc_grace_seconds of 10 days, but I have now lowered it to 1 hour.
>>
>> I have run a full repair of the table across all nodes. I have forced
>> several major compactions of the table by using "nodetool compact", and also
>> tried to switch from LeveledCompactionStrategy to
>> SizeTieredCompactionStrategy and back.
>>
>> What could cause Cassandra to keep these tombstones?
>>
>> sstable2json:
>>
>> {"key": "foo",
>>  "cells":
>>  [["0000082f-25ef-4324-bb8a-8cf013c823c1:_","0000082f-25ef-4324-bb8a-8cf013c823c1:!",1507819135148000,"t",1507819135],
>>   ["000010f3-c05d-4ab9-9b8a-e6ebd8f5818a:_","000010f3-c05d-4ab9-9b8a-e6ebd8f5818a:!",1503661731697000,"t",1503661731],
>>   ["00001d7a-ce95-4c74-b67e-f8cdffec4f85:_","00001d7a-ce95-4c74-b67e-f8cdffec4f85:!",1509542102909000,"t",1509542102],
>>   ["00001dd3-ae22-4f6e-944a-8cfa147cde68:_","00001dd3-ae22-4f6e-944a-8cfa147cde68:!",1512418006838000,"t",1512418006],
>>   ["000022cc-d69c-4596-89e5-3e976c0cb9a8:_","000022cc-d69c-4596-89e5-3e976c0cb9a8:!",1497377448737001,"t",1497377448],
>>   ["00002777-4b1a-4267-8efc-c43054e63170:_","00002777-4b1a-4267-8efc-c43054e63170:!",1491014691515001,"t",1491014691],
>>   ["000061e8-f48b-4484-96f1-f8b6a3ed8f9f:_","000061e8-f48b-4484-96f1-f8b6a3ed8f9f:!",1500820300544000,"t",1500820300],
>>   ["000063da-f165-449b-b65d-2b7869368734:_","000063da-f165-449b-b65d-2b7869368734:!",1512806634968000,"t",1512806634],
>>   ["0000656f-f8b5-472b-93ed-1a893002f027:_","0000656f-f8b5-472b-93ed-1a893002f027:!",1514554716141000,"t",1514554716],
>> ...
>>
>> {"key": "bar",
>>  "metadata": {"deletionInfo":
>>  {"markedForDeleteAt":1517402198585982,"localDeletionTime":1517402198}},
>>  "cells":
>>  [["000af8c2-ffe9-4217-9032-61a1cd21781d:_","000af8c2-ffe9-4217-9032-61a1cd21781d:!",1495094965916000,"t",1495094965],
>>   ["005b96cb-7eb3-4ec3-bfa2-8573e46892f4:_","005b96cb-7eb3-4ec3-bfa2-8573e46892f4:!",1516360186865000,"t",1516360186],
>>   ["005ec167-aa61-4868-a3ae-a44b00099eb6:_","005ec167-aa61-4868-a3ae-a44b00099eb6:!",1516671840920002,"t",1516671840],
>> ....
>>
>> sstablemetadata:
>>
>> sstablemetadata
>> /data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741-Data.db
>>
>> SSTable:
>> /data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741
>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> Bloom Filter FP chance: 0.100000
>> Minimum timestamp: 1488976211688000
>> Maximum timestamp: 1517468644066000
>> SSTable max local deletion time: 2147483647
>> Compression ratio: 0.5121956624389545
>> Estimated droppable tombstones: 18.00161766553587
>> SSTable Level: 0
>> Repaired at: 0
>> ReplayPosition(segmentId=1517168739626, position=22690189)
>> Estimated tombstone drop times:%n
>> 1488976211: 1
>> 1489906506: 4706
>> 1490174752: 6111
>> 1490449759: 6554
>> 1490735410: 6559
>> 1491016789: 6369
>> 1491347982: 10216
>> 1491680214: 13502
>> ...
>>
>> desc:
>>
>> CREATE TABLE xxx.yyy (
>>     ti text,
>>     uuid text,
>>     json_data text,
>>     PRIMARY KEY (ti, uuid)
>> ) WITH CLUSTERING ORDER BY (uuid ASC)
>>     AND bloom_filter_fp_chance = 0.1
>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>     AND comment = ''
>>     AND compaction = {'class':
>>     'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>     AND compression = {'sstable_compression':
>>     'org.apache.cassandra.io.compress.LZ4Compressor'}
>>     AND dclocal_read_repair_chance = 0.1
>>     AND default_time_to_live = 0
>>     AND gc_grace_seconds = 3600
>>     AND max_index_interval = 2048
>>     AND memtable_flush_period_in_ms = 0
>>     AND min_index_interval = 128
>>     AND read_repair_chance = 0.0
>>     AND speculative_retry = '99.0PERCENTILE';
>>
>> jmx props(picture):
>>
>> The contents of this e-mail are intended for the named addressee only. It
>> contains information that may be confidential. Unless you are the named
>> addressee or an authorized designee, you may not copy or use it, or disclose
>> it to anyone else. If you received it in error please notify us immediately
>> and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h)
>> is a company registered in Linz whose registered office is at 4040 Linz,
>> Austria, Freistädterstraße 313