> But what if the gc_grace was changed to a lower value as part of a schema 
> migration, after the hints had already been marked with TTLs equal to the 
> old (higher) gc_grace in effect before the migration? 
There would be a chance then, if the tombstones had already been purged. 
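To make that window concrete, here is a rough timeline sketch; the dates, values and variable names are made up for illustration and this is not Cassandra code:

from datetime import datetime, timedelta

# Hint stored with a TTL equal to the gc_grace in effect at write time,
# then gc_grace lowered by a schema migration.
old_gc_grace = timedelta(days=10)   # gc_grace when the hint was stored (hint TTL)
new_gc_grace = timedelta(days=1)    # gc_grace after the migration

hint_stored_at = datetime(2013, 3, 1)
delete_issued_at = datetime(2013, 3, 1)

hint_expires_at = hint_stored_at + old_gc_grace           # hint replayable until here
tombstone_purgeable_at = delete_issued_at + new_gc_grace  # compaction may drop the tombstone here

if tombstone_purgeable_at < hint_expires_at:
    print("window exists: a hint could be replayed after the tombstone is purged")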
Want to raise a ticket? 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/03/2013, at 2:58 AM, Arya Goudarzi <gouda...@gmail.com> wrote:

> I am not familiar with that part of the code yet. But what if the gc_grace 
> was changed to a lower value as part of a schema migration, after the hints 
> had already been marked with TTLs equal to the old (higher) gc_grace in effect 
> before the migration? 
> 
> From what you've described, I think this is not an issue for us, as we did not 
> have a node down for a long period of time; I am just pointing out what I 
> think could happen in that scenario.
> 
> On Sun, Mar 24, 2013 at 10:03 AM, aaron morton <aa...@thelastpickle.com> 
> wrote:
>> I could imagine a scenario where a hint was replayed to a replica after all 
>> replicas had purged their tombstones.
> Scratch that, the hints are TTL'd with the lowest gc_grace. 
> Ticket closed https://issues.apache.org/jira/browse/CASSANDRA-5379
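A toy sketch of the behaviour described above, i.e. the hint being TTL'd with the smallest gc_grace among the column families the mutation touches; the function and values are illustrative only, not the actual Cassandra source:

# Illustration only: hint TTL derived from the smallest gc_grace of the CFs in the mutation.
def hint_ttl_seconds(gc_grace_by_cf):
    """gc_grace_by_cf maps column family name -> gc_grace_seconds."""
    return min(gc_grace_by_cf.values())

print(hint_ttl_seconds({"App": 864000, "Other": 86400}))  # -> 86400: the lowest gc_grace wins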
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 24/03/2013, at 6:24 AM, aaron morton <aa...@thelastpickle.com> wrote:
> 
>>> Beside the joke, would hinted handoff really have any role in this issue?
>> I could imagine a scenario where a hint was replayed to a replica after all 
>> replicas had purged their tombstones. That seems like a long shot: it would 
>> need one node to be down for the write, all nodes up for the delete, and all 
>> of them to have purged the tombstone. But maybe we should have a max age on 
>> hints so that it cannot happen. 
>> 
>> Created https://issues.apache.org/jira/browse/CASSANDRA-5379
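A tiny sketch of what such a "max age on hints" guard could look like, purely to illustrate the idea in that ticket; the constant and function are hypothetical, not existing Cassandra behaviour:

import time

MAX_HINT_AGE_SECONDS = 10 * 24 * 3600  # hypothetical cap on how old a hint may be

def should_replay(hint_created_at, now=None):
    """Skip replaying hints older than the cap, so a very old hint cannot resurrect purged data."""
    now = time.time() if now is None else now
    return (now - hint_created_at) <= MAX_HINT_AGE_SECONDS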
>> 
>> Ensuring no hints are in place during an upgrade would work around it. I tend 
>> to make sure the hints and commit log are clear during an upgrade. 
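For what it's worth, a quick way to sanity-check that before upgrading; the paths assume a default 1.1-era layout (hints live in the system keyspace's HintsColumnFamily), so adjust them to your install:

import glob

# Count hint SSTables and commit log segments before the upgrade.
hint_sstables = glob.glob("/var/lib/cassandra/data/system/HintsColumnFamily*-Data.db")
commitlog_segments = glob.glob("/var/lib/cassandra/commitlog/CommitLog*")

print("hint sstables:", len(hint_sstables))             # want this to be 0 before upgrading
print("commit log segments:", len(commitlog_segments))  # flush/drain first so nothing lives only here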
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 22/03/2013, at 7:54 AM, Arya Goudarzi <gouda...@gmail.com> wrote:
>> 
>>> Beside the joke, would hinted handoff really have any role in this issue? I 
>>> have been struggling to reproduce this issue using the snapshot data taken 
>>> from our cluster and following the same upgrade process from 1.1.6 to 
>>> 1.1.10. I know snapshots only link to active SSTables. What if these 
>>> returned rows belong to some inactive SSTables and some bug exposed itself 
>>> and marked them as active? What are the possibilities that could lead to 
>>> this? I am eager to find out, as this is blocking our upgrade.
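If it helps with the reproduction attempt: snapshots are hard links, so one way to see which snapshotted SSTables are still live is to compare inode numbers. The keyspace name and paths below are only examples for a default layout; adjust them to yours:

import glob, os

# Map inode -> path for the live SSTables, then check which snapshot files still share an inode.
live = {os.stat(p).st_ino: p for p in glob.glob("/var/lib/cassandra/data/MyKeyspace/*-Data.db")}
snapshot = glob.glob("/var/lib/cassandra/data/MyKeyspace/snapshots/*/*-Data.db")

for path in snapshot:
    ino = os.stat(path).st_ino
    print(path, "->", live.get(ino, "no longer live (compacted away)"))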
>>> 
>>> On Tue, Mar 19, 2013 at 2:11 AM, <moshe.kr...@barclays.com> wrote:
>>> This obscure feature of Cassandra is called “haunted handoff”.
>>> 
>>>  
>>> 
>>> Happy (early) April Fools :)
>>> 
>>>  
>>> 
>>> From: aaron morton [mailto:aa...@thelastpickle.com] 
>>> Sent: Monday, March 18, 2013 7:45 PM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
>>> 
>>>  
>>> 
>>> As you see, this node thinks lots of ranges are out of sync, which shouldn't 
>>> be the case as successful repairs were done every night prior to the 
>>> upgrade. 
>>> 
>>> Could this be explained by writes occurring during the upgrade process?
>>> 
>>>  
>>> 
>>> I found this bug, which touches timestamps and tombstones and was fixed in 
>>> 1.1.10, but I am not 100% sure if it could be related to this issue: 
>>> https://issues.apache.org/jira/browse/CASSANDRA-5153
>>> 
>>> Me neither, but the issue was fixed in 1.1.0
>>> 
>>>  
>>> 
>>> It appears that the repair task I executed after the upgrade brought 
>>> back lots of deleted rows to life.
>>> 
>>> Was it entire rows or columns in a row?
>>> 
>>> Do you know if row-level or column-level deletes were used? 
>>> 
>>>  
>>> 
>>> Can you look at the data in cassandra-cli and confirm the timestamps on the 
>>> columns make sense?  
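When eyeballing the cli output, a small helper for decoding column timestamps may help; by convention they are microseconds since the epoch, so a nanosecond-precision value will not decode to a sensible date. The example values below are made up:

from datetime import datetime, timezone

def decode_micros(ts):
    """Interpret a column timestamp as microseconds since the epoch."""
    try:
        return datetime.fromtimestamp(ts / 1_000_000, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        return "out of range -- probably not microseconds"

print(decode_micros(1363309200000000))     # microsecond value -> a plausible 2013 date
print(decode_micros(1363309200000000000))  # nanosecond value  -> out of range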
>>> 
>>>  
>>> 
>>> Cheers
>>> 
>>>  
>>> 
>>> -----------------
>>> 
>>> Aaron Morton
>>> 
>>> Freelance Cassandra Consultant
>>> 
>>> New Zealand
>>> 
>>>  
>>> 
>>> @aaronmorton
>>> 
>>> http://www.thelastpickle.com
>>> 
>>>  
>>> 
>>> On 16/03/2013, at 2:31 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>>  
>>> 
>>> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running 
>>> repairs. It appears that the repair task I executed after the upgrade brought 
>>> back lots of deleted rows to life. Here are some logistics:
>>> 
>>>  
>>> 
>>> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6 
>>> 
>>> - Old cluster: 4 nodes, C* 1.1.6 with RF 3 using NetworkTopologyStrategy;
>>> 
>>> - Upgraded to: 1.1.10 with all other settings the same;
>>> 
>>> - Successful repairs were being done on this cluster every night;
>>> 
>>> - Our clients use nanosecond-precision timestamps for Cassandra calls;
>>> 
>>> - After the upgrade, while running repair, I saw some log messages like this 
>>> on one node:
>>> 
>>>  
>>> 
>>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847 
>>> AntiEntropyService.java (line 1022) [repair 
>>> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and 
>>> /23.20.207.56 have 2223 range(s) out of sync for App
>>> 
>>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877 
>>> AntiEntropyService.java (line 1022) [repair 
>>> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and 
>>> /23.20.207.56 have 161 range(s) out of sync for App
>>> 
>>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097 
>>> AntiEntropyService.java (line 1022) [repair 
>>> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and 
>>> /23.20.250.43 have 2294 range(s) out of sync for App
>>> 
>>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190 
>>> AntiEntropyService.java (line 789) [repair 
>>> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining 
>>> column family to sync for this session)
>>> 
>>>  
>>> 
>>> As you see, this node thinks lots of ranges are out of sync, which shouldn't 
>>> be the case as successful repairs were done every night prior to the 
>>> upgrade. 
>>> 
>>>  
>>> 
>>> The App CF uses SizeTiered compaction with a gc_grace of 10 days. It has 
>>> caching = 'ALL', and it is fairly small (11 MB on each node).
>>> 
>>>  
>>> 
>>> I found this bug, which touches timestamps and tombstones and was fixed in 
>>> 1.1.10, but I am not 100% sure if it could be related to this issue: 
>>> https://issues.apache.org/jira/browse/CASSANDRA-5153
>>> 
>>>  
>>> 
>>> Any advice on how to dig deeper into this would be appreciated.
>>> 
>>>  
>>> 
>>> Thanks,
>>> 
>>> -Arya
>>> 
>>> 
>> 
> 
> 
