Typically you'll end up with an unrepaired SSTable and a repaired SSTable. You'll only end up with one if there's absolutely no unrepaired data (which is very unlikely).
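To see which side of that split each SSTable is on, you can look at the "Repaired at" line in the sstablemetadata output (0 means the SSTable was never marked repaired). Below is a minimal sketch that classifies saved sstablemetadata dumps. The function names and the second sample entry are my own, purely for illustration, and it assumes you have already captured the tool's text output for each SSTable.

```python
import re

# Matches the "Repaired at: <millis>" line printed by sstablemetadata.
REPAIRED_AT = re.compile(r"^\s*Repaired at:\s*(\d+)\s*$", re.MULTILINE)


def repair_state(metadata_text):
    """Return 'repaired' or 'unrepaired' for one sstablemetadata dump."""
    match = REPAIRED_AT.search(metadata_text)
    if match is None:
        raise ValueError("no 'Repaired at' line found")
    # A value of 0 means the SSTable has never been marked as repaired.
    return "unrepaired" if int(match.group(1)) == 0 else "repaired"


def group_by_repair_state(dumps):
    """Group SSTable names by repair state; `dumps` maps name -> dump text."""
    groups = {"repaired": [], "unrepaired": []}
    for name, text in sorted(dumps.items()):
        groups[repair_state(text)].append(name)
    return groups


if __name__ == "__main__":
    sample = {
        # Taken from the metadata quoted further down in this thread.
        "ddp-yyy-ka-2741": "SSTable Level: 0\nRepaired at: 0\n",
        # Hypothetical second SSTable with a non-zero repair timestamp.
        "hypothetical-repaired": "SSTable Level: 0\nRepaired at: 1517402198585\n",
    }
    print(group_by_repair_state(sample))
```

If both groups come back non-empty after a major compaction, that matches the two-SSTable outcome described above.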
On 13 February 2018 at 09:54, Bo Finnerup Madsen <bo.gunder...@gmail.com> wrote:

> Hi Eric,
>
> I had not seen your talk, it was very informative, thank you! :)
>
> Based on your talk, I can see how tombstones might not get removed
> during normal operations under certain conditions. But I am not sure our
> scenario fits those conditions.
>
> We have fewer than 100,000 live rows in the table in question, and when
> flushed the table is roughly 60 MB. Using "nodetool compact" I did several
> full compactions of the table. However, I always ended up with two
> sstables, as Jeff mentions, so perhaps there is some kind of issue with
> the incremental repair...
>
> On Mon, 12 Feb 2018 at 15:46, Eric Stevens <migh...@gmail.com> wrote:
>
>> Hi,
>>
>> Just in case you haven't seen it, I gave a talk last year at the summit.
>> In the first part of the talk I speak for a while about the lifecycle of a
>> tombstone, and how they don't always get cleaned up when you might expect.
>>
>> https://youtu.be/BhGkSnBZgJA
>>
>> It looks like you're deleting clustering keys on a partition that you
>> always append to? If so those tombstones can never be cleaned up - see the
>> talk. I don't know if this is what's affecting you or not, but it might be
>> worthwhile to consider.
>>
>> On Mon, Feb 12, 2018, 3:17 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> When you force compacted, did you end up with 1 sstable or 2?
>>>
>>> If 2, did you ever run (incremental) repair on some of the data? If so,
>>> it moves the repaired sstable to a different compaction manager, which
>>> means it won’t purge the tombstone if it shadows data in the unrepaired set
>>>
>>> --
>>> Jeff Jirsa
>>>
>>> On Feb 12, 2018, at 12:46 AM, Bo Finnerup Madsen <bo.gunder...@gmail.com>
>>> wrote:
>>>
>>> Well for anyone having the same issue, I "fixed" it by dropping and
>>> re-creating the table.
>>>
>>> On Fri, 2 Feb 2018 at 07:29, Steinmaurer, Thomas <
>>> thomas.steinmau...@dynatrace.com> wrote:
>>>
>>>> Right.
>>>> In this case, cleanup should have done the necessary work here.
>>>>
>>>> Thomas
>>>>
>>>> *From:* Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
>>>> *Sent:* Friday, 2 February 2018 06:59
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: Old tombstones not being cleaned up
>>>>
>>>> We did start with a 3 node cluster and an RF of 3, then added another 3
>>>> nodes and again another 3 nodes. So it is a good guess :)
>>>>
>>>> But I have run both repair and cleanup against the table on all nodes,
>>>> would that not have removed any stray partitions?
>>>>
>>>> On Thu, 1 Feb 2018 at 22:31, Steinmaurer, Thomas <
>>>> thomas.steinmau...@dynatrace.com> wrote:
>>>>
>>>> Did you start with a 9 node cluster from the beginning or did you
>>>> extend / scale out your cluster (with vnodes) beyond the replication
>>>> factor?
>>>>
>>>> If the latter applies and if you are deleting by explicit deletes and
>>>> not via TTL, then nodes might not see the deletes anymore, as a node might
>>>> not own the partition anymore after a topology change (e.g. scale out
>>>> beyond the keyspace RF).
>>>>
>>>> Just a very wild guess.
>>>>
>>>> Thomas
>>>>
>>>> *From:* Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
>>>> *Sent:* Thursday, 1 February 2018 22:14
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: Old tombstones not being cleaned up
>>>>
>>>> We do not use TTL anywhere... records are inserted and deleted
>>>> "manually" by our software.
>>>>
>>>> On Thu, 1 Feb 2018 at 18:29, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>> Changing the default TTL doesn’t change the TTL on the existing data,
>>>> only new data. It’s only set if you don’t supply one yourself.
>>>>
>>>> On Wed, Jan 31, 2018 at 11:35 PM Bo Finnerup Madsen <
>>>> bo.gunder...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We are running a small 9 node Cassandra v2.1.17 cluster.
>>>> The cluster generally runs fine, but we have one table that is causing
>>>> OOMs because of an enormous number of tombstones.
>>>>
>>>> Looking at the data in the table (sstable2json), the first of the
>>>> tombstones are almost a year old. The table was initially created with a
>>>> gc_grace_seconds of 10 days, but I have now lowered it to 1 hour.
>>>>
>>>> I have run a full repair of the table across all nodes. I have forced
>>>> several major compactions of the table by using "nodetool compact", and
>>>> also tried to switch from LeveledCompaction to SizeTieredCompaction and
>>>> back.
>>>>
>>>> What could cause Cassandra to keep these tombstones?
>>>>
>>>> sstable2json:
>>>>
>>>> {"key": "foo",
>>>> "cells": [["0000082f-25ef-4324-bb8a-8cf013c823c1:_","0000082f-25ef-4324-bb8a-8cf013c823c1:!",1507819135148000,"t",1507819135],
>>>> ["000010f3-c05d-4ab9-9b8a-e6ebd8f5818a:_","000010f3-c05d-4ab9-9b8a-e6ebd8f5818a:!",1503661731697000,"t",1503661731],
>>>> ["00001d7a-ce95-4c74-b67e-f8cdffec4f85:_","00001d7a-ce95-4c74-b67e-f8cdffec4f85:!",1509542102909000,"t",1509542102],
>>>> ["00001dd3-ae22-4f6e-944a-8cfa147cde68:_","00001dd3-ae22-4f6e-944a-8cfa147cde68:!",1512418006838000,"t",1512418006],
>>>> ["000022cc-d69c-4596-89e5-3e976c0cb9a8:_","000022cc-d69c-4596-89e5-3e976c0cb9a8:!",1497377448737001,"t",1497377448],
>>>> ["00002777-4b1a-4267-8efc-c43054e63170:_","00002777-4b1a-4267-8efc-c43054e63170:!",1491014691515001,"t",1491014691],
>>>> ["000061e8-f48b-4484-96f1-f8b6a3ed8f9f:_","000061e8-f48b-4484-96f1-f8b6a3ed8f9f:!",1500820300544000,"t",1500820300],
>>>> ["000063da-f165-449b-b65d-2b7869368734:_","000063da-f165-449b-b65d-2b7869368734:!",1512806634968000,"t",1512806634],
>>>> ["0000656f-f8b5-472b-93ed-1a893002f027:_","0000656f-f8b5-472b-93ed-1a893002f027:!",1514554716141000,"t",1514554716],
>>>> ...
>>>>
>>>> {"key": "bar",
>>>> "metadata": {"deletionInfo": {"markedForDeleteAt":1517402198585982,"localDeletionTime":1517402198}},
>>>> "cells": [["000af8c2-ffe9-4217-9032-61a1cd21781d:_","000af8c2-ffe9-4217-9032-61a1cd21781d:!",1495094965916000,"t",1495094965],
>>>> ["005b96cb-7eb3-4ec3-bfa2-8573e46892f4:_","005b96cb-7eb3-4ec3-bfa2-8573e46892f4:!",1516360186865000,"t",1516360186],
>>>> ["005ec167-aa61-4868-a3ae-a44b00099eb6:_","005ec167-aa61-4868-a3ae-a44b00099eb6:!",1516671840920002,"t",1516671840],
>>>> ....
>>>>
>>>> sstablemetadata:
>>>>
>>>> sstablemetadata /data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741-Data.db
>>>> SSTable: /data/cassandra/data/xxx/yyy-9ed502c0734011e6a128fdafd829b1c6/ddp-yyy-ka-2741
>>>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>>> Bloom Filter FP chance: 0.100000
>>>> Minimum timestamp: 1488976211688000
>>>> Maximum timestamp: 1517468644066000
>>>> SSTable max local deletion time: 2147483647
>>>> Compression ratio: 0.5121956624389545
>>>> Estimated droppable tombstones: 18.00161766553587
>>>> SSTable Level: 0
>>>> Repaired at: 0
>>>> ReplayPosition(segmentId=1517168739626, position=22690189)
>>>> Estimated tombstone drop times:%n
>>>> 1488976211: 1
>>>> 1489906506: 4706
>>>> 1490174752: 6111
>>>> 1490449759: 6554
>>>> 1490735410: 6559
>>>> 1491016789: 6369
>>>> 1491347982: 10216
>>>> 1491680214: 13502
>>>> ...
>>>>
>>>> desc:
>>>>
>>>> CREATE TABLE xxx.yyy (
>>>>     ti text,
>>>>     uuid text,
>>>>     json_data text,
>>>>     PRIMARY KEY (ti, uuid)
>>>> ) WITH CLUSTERING ORDER BY (uuid ASC)
>>>>     AND bloom_filter_fp_chance = 0.1
>>>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>>>     AND comment = ''
>>>>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>>>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>     AND dclocal_read_repair_chance = 0.1
>>>>     AND default_time_to_live = 0
>>>>     AND gc_grace_seconds = 3600
>>>>     AND max_index_interval = 2048
>>>>     AND memtable_flush_period_in_ms = 0
>>>>     AND min_index_interval = 128
>>>>     AND read_repair_chance = 0.0
>>>>     AND speculative_retry = '99.0PERCENTILE';
>>>>
>>>> jmx props(picture):
>>>>
>>>> [image: image001.png]
>>>>
>>>> The contents of this e-mail are intended for the named addressee only.
>>>> It contains information that may be confidential. Unless you are the named
>>>> addressee or an authorized designee, you may not copy or use it, or
>>>> disclose it to anyone else. If you received it in error please notify us
>>>> immediately and then destroy it. Dynatrace Austria GmbH (registration
>>>> number FN 91482h) is a company registered in Linz whose registered office
>>>> is at 4040 Linz, Austria, Freistädterstraße 313