I am not familiar with that part of the code yet. But what if gc_grace was changed to a lower value as part of a schema migration, after the hints had already been marked with TTLs based on the lowest gc_grace in effect before the migration? Those hints could then still be replayed after the tombstones have already been purged under the new, shorter gc_grace.
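To make the timing concrete, here is a rough sketch of the scenario I mean, in plain Python with made-up durations; it only illustrates the arithmetic, not Cassandra's actual hint code, and it assumes the hint's TTL is fixed at write time from the gc_grace in effect then:

```python
# Illustration only -- not Cassandra code. Assumes the hint's TTL is fixed at
# write time from the gc_grace in effect then, and that a schema migration
# later lowers gc_grace.

DAY = 86400  # seconds

old_gc_grace = 10 * DAY      # lowest gc_grace when the hint was stored
new_gc_grace = 1 * DAY       # gc_grace after the schema migration

hint_written_at  = 0                                # write hinted for a down node
hint_expires_at  = hint_written_at + old_gc_grace   # TTL based on the old gc_grace
delete_issued_at = hint_written_at + 1 * DAY        # row deleted; migration applied
tombstone_purge  = delete_issued_at + new_gc_grace  # tombstone purgeable from here on

# Window in which a replayed hint would re-insert the deleted data:
window = hint_expires_at - tombstone_purge
print("hint outlives the purgeable tombstone by %.0f days" % (window / DAY))
```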
From what you've described, I think this is not an issue for us, since we did not have a node down for a long period of time; I am just pointing out what I think could happen based on what you've described. (A couple of rough sketches of the reconciliation mechanics and a timestamp sanity check follow below the quoted thread.)

On Sun, Mar 24, 2013 at 10:03 AM, aaron morton <aa...@thelastpickle.com> wrote:

>> I could imagine a scenario where a hint was replayed to a replica after
>> all replicas had purged their tombstones
>
> Scratch that, the hints are TTL'd with the lowest gc_grace.
> Ticket closed: https://issues.apache.org/jira/browse/CASSANDRA-5379
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 24/03/2013, at 6:24 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Beside the joke, would hinted handoff really have any role in this issue?
>
> I could imagine a scenario where a hint was replayed to a replica after
> all replicas had purged their tombstones. That seems like a long shot: it
> would need one node to be down for the write, all nodes up for the delete,
> and all of them to have purged the tombstone. But maybe we should have a
> max age on hints so it cannot happen.
>
> Created https://issues.apache.org/jira/browse/CASSANDRA-5379
>
> Ensuring no hints are in place during an upgrade would work around it. I
> tend to make sure hints and the commit log are clear during an upgrade.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/03/2013, at 7:54 AM, Arya Goudarzi <gouda...@gmail.com> wrote:
>
> Beside the joke, would hinted handoff really have any role in this issue?
> I have been struggling to reproduce this issue using the snapshot data
> taken from our cluster and following the same upgrade process from 1.1.6
> to 1.1.10. I know snapshots only link to active SSTables. What if these
> returned rows belong to some inactive SSTables and some bug exposed
> itself and marked them as active? What are the possibilities that could
> lead to this? I am eager to find out, as this is blocking our upgrade.
>
> On Tue, Mar 19, 2013 at 2:11 AM, <moshe.kr...@barclays.com> wrote:
>
>> This obscure feature of Cassandra is called “haunted handoff”.
>>
>> Happy (early) April Fools :)
>>
>> From: aaron morton [mailto:aa...@thelastpickle.com]
>> Sent: Monday, March 18, 2013 7:45 PM
>> To: user@cassandra.apache.org
>> Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
>>
>>> As you see, this node thinks lots of ranges are out of sync, which
>>> shouldn't be the case, as successful repairs were done every night
>>> prior to the upgrade.
>>
>> Could this be explained by writes occurring during the upgrade process?
>>
>>> I found this bug which touches timestamps and tombstones and was fixed
>>> in 1.1.10, but I am not 100% sure if it could be related to this issue:
>>> https://issues.apache.org/jira/browse/CASSANDRA-5153
>>
>> Me neither, but the issue was fixed in 1.1.10.
>>
>>> It appears that the repair task that I executed after the upgrade
>>> brought lots of deleted rows back to life.
>>
>> Was it entire rows or columns in a row?
>> Do you know if row-level or column-level deletes were used?
>>
>> Can you look at the data in cassandra-cli and confirm the timestamps on
>> the columns make sense?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 16/03/2013, at 2:31 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by
>> running repairs. It appears that the repair task that I executed after
>> the upgrade brought lots of deleted rows back to life. Here are some
>> logistics:
>>
>> - The upgraded cluster went from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6
>> - Old cluster: 4 nodes, C* 1.1.6 with RF 3 using NetworkTopologyStrategy
>> - Upgraded to: 1.1.10 with all other settings the same
>> - Successful repairs were being done on this cluster every night
>> - Our clients use nanosecond-precision timestamps for Cassandra calls
>> - After the upgrade, while running repair, I saw some log messages like
>>   this on one node:
>>
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.207.56 have 2223 range(s) out of sync for App
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and /23.20.207.56 have 161 range(s) out of sync for App
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.250.43 have 2294 range(s) out of sync for App
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190 AntiEntropyService.java (line 789) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining column family to sync for this session)
>>
>> As you see, this node thinks lots of ranges are out of sync, which
>> shouldn't be the case, as successful repairs were done every night prior
>> to the upgrade.
>>
>> The App CF uses SizeTiered compaction with gc_grace of 10 days. It has
>> caching = 'ALL', and it is fairly small (11 MB on each node).
>>
>> I found this bug which touches timestamps and tombstones and was fixed
>> in 1.1.10, but I am not 100% sure if it could be related to this issue:
>> https://issues.apache.org/jira/browse/CASSANDRA-5153
>>
>> Any advice on how to dig deeper into this would be appreciated.
>>
>> Thanks,
>> -Arya
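P.S. To illustrate the resurrection mechanism Aaron describes above (a hint replay, or a repair stream, arriving after every replica has purged its tombstone), here is a minimal last-write-wins sketch in Python; the `Cell` type and `reconcile` function are toy stand-ins, not Cassandra's actual classes:

```python
# Toy model of timestamp reconciliation -- not Cassandra's real classes.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cell:
    value: Optional[str]   # None stands in for a tombstone (deletion marker)
    timestamp: int         # client-supplied write timestamp

def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Last-write-wins: the version with the higher timestamp survives."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a.timestamp >= b.timestamp else b

old_write = Cell(value="v1", timestamp=100)
tombstone = Cell(value=None, timestamp=200)   # the later delete

# While the tombstone still exists, it shadows the old write:
assert reconcile(old_write, tombstone).value is None

# After gc_grace the tombstone is purged, so the replica keeps nothing at all.
# If a stale hint (or a repair stream from a replica that missed the delete)
# now redelivers the old write, nothing is left to shadow it:
assert reconcile(old_write, None).value == "v1"   # the deleted row is live again
print("old write resurrected once the tombstone is gone")
```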
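And on Aaron's suggestion to confirm in cassandra-cli that the column timestamps make sense: since our clients write nanosecond-precision timestamps while the usual convention is microseconds since the epoch, an order-of-magnitude check on a timestamp value copied out of the cli output tells you which unit it is in. A rough sketch (the example value is hypothetical):

```python
import datetime

def guess_timestamp_unit(ts: int) -> str:
    """Rough order-of-magnitude guess for an epoch-based write timestamp."""
    # Around 2013: seconds ~1.4e9, millis ~1.4e12, micros ~1.4e15, nanos ~1.4e18.
    if ts < 10**11:
        return "seconds"
    if ts < 10**14:
        return "milliseconds"
    if ts < 10**17:
        return "microseconds (the usual convention)"
    return "nanoseconds"

# Hypothetical value copied from a column's 'timestamp=' field in cassandra-cli:
ts = 1363305354847000
print(guess_timestamp_unit(ts),
      datetime.datetime.utcfromtimestamp(ts / 10**6))  # assumes microseconds
```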