Re: CASSANDRA-8527

Aleksey Yeshchenko Fri, 05 Jan 2018 11:24:36 -0800

Counting the number of range tombstones - sure, that is reasonable.

Counting cells/rows shadowed by range tombstones toward the limit, as if they 
were regular tombstones - arguably not so much.


—
AY

On 5 January 2018 at 18:53:45, Jon Haddad (j...@jonhaddad.com) wrote:

I think it’s reasonable to count the number of range tombstones towards the 
total tombstone count / threshold.  

I agree the number of rows shadowed by the tombstones should be tracked 
separately, and we should probably think a bit more about how to add 
configuration / limits around this without burdening the user with a bunch of 
new flags they have to think about. I would prefer to avoid any more 
configuration settings as complex as back_pressure_strategy going forward.  

> On Jan 5, 2018, at 9:36 AM, Alexander Dejanovski <a...@thelastpickle.com> 
> wrote:  
>  
> Hi Aleksey,  
>  
> ok we'll split the work and only deal with row level tombstones in  
> CASSANDRA-8527 <https://issues.apache.org/jira/browse/CASSANDRA-8527> and  
> create a follow up ticket to work on the separate counts you're suggesting.  
> My understanding of what you say is that you would not include range  
> tombstones in the warn/fail threshold, but row level tombstones have  
> somewhat an impact that is similar to cell tombstones. They will be  
> retained in memory and will be sent to replicas.  
> If we don't count them in the thresholds (at least for warnings), people  
> will miss the fact that they may be reading a lot of tombstones.  
> Are you ok with including those row tombstones as part of the thresholds ?  
> This was the original intent for creating this ticket, which was a  
> follow-up to CASSANDRA-8477  
> <https://issues.apache.org/jira/browse/CASSANDRA-8477>.  
>  
> For the follow up ticket, we may want to move the discussion in JIRA once  
> we've create the ticket, but a merge listener seems like the right way to  
> detect rows shadowed by range tombstones. That would force to change the  
> UnfilteredRowIterator interface to include the tombstones/rows/cells stats  
> as this is what is returned from the lower levels of the read path.  
> Is there any easier way to achieve this that you can think of, as that  
> interface is used in many parts of the code ?  
>  
> On Wed, Dec 27, 2017 at 1:35 PM Aleksey Yeshchenko <alek...@apple.com>  
> wrote:  
>  
>> As Jeff says, the number of actual tombstones is no less relevant than the  
>> number of  
>> cells and rows shadowed by them. And the way row and cell tombstones  
>> affect a read  
>> query can be very different from the way a big range tombstone might: you  
>> potentially  
>> have to retain every (regular) tombstone in memory and have replicas ship  
>> them to  
>> a coordinator, while you can discard everything shadowed by a big RT and  
>> only serialize  
>> a tiny bit of the data between the replicas and the coordinator.  
>>  
>> So a mixed metric that just mixes up rows and cells shadowed by all three  
>> kinds of tombstones  
>> without any differentiation, while better than not having that visibility  
>> at all, is worse than having  
>> a detailed rundown. If possible, I’d love to see proper separate counts  
>> tracked: range, row, and single-cell tombstones encountered, and # of rows  
>> or cells obsoleted by each type.  
>>  
>> I know that this is a non-trivial change, however, but hey, it doesn’t  
>> have to be a trivial patch if it’s going into 4.0.  
>>  
>> In the meantime I think it’d be helpful to report that single count. But I  
>> don’t like the idea of redefining what  
>> tombstone_warn_threshold and tombstone_failure_threshold mean, even in a  
>> major release, as RTs are  
>> qualitatively different from other tombstones, and have a much lower  
>> impact per dead row.  
>>  
>> —  
>> AY  
>>  
>> On 22 December 2017 at 03:53:47, kurt greaves (k...@instaclustr.com)  
>> wrote:  
>>  
>> I think that's a good idea for 4.x, but not so for current branches. I  
>> think as long as we document it well for 4.0 upgrades it's not so much of a  
>> problem. Obviously there will be cases of queries failing that were  
>> previously succeeding but we can already use  
>> tombstone_failure|warn_threshold to tune around that already. I don't think  
>> we need another yaml setting to enable/disable counting deleted rows for  
>> these thresholds, especially because it's going into 4.0. It *might* be a  
>> good idea to bump the tombstone failure threshold default as a safety net  
>> though (as well as put something in the NEWS.txt).  
>>  
>> On 21 December 2017 at 20:11, Jon Haddad <j...@jonhaddad.com> wrote:  
>>  
>>> I had suggested to Alex we kick this discussion over to the ML because  
>> the  
>>> change will have a significant impact on the behavior of Cassandra when  
>>> doing reads with range tombstones that cover a lot of rows. The behavior  
>>> now is a little weird, a single tombstone could shadow hundreds of  
>>> thousands or even millions of rows, and the query would probably just  
>> time  
>>> out. Personally, I’m in favor of the change in behavior of this patch but  
>>> I wanted to get some more feedback before committing to it. Are there any  
>>> objections to what Alex described?  
>>>  
>>> Regarding Mockito, I’ve been meaning to bring this up for a while, and  
>> I’m  
>>> a solid +1 on introducing it to help with testing. In an ideal world we’d  
>>> have no singletons and could test everything in isolation, but  
>>> realistically that’s a multi year process and we just aren’t there.  
>>>  
>>>  
>>>> On Dec 19, 2017, at 11:07 PM, Alexander Dejanovski <  
>>> a...@thelastpickle.com> wrote:  
>>>>  
>>>> Hi folks,  
>>>>  
>>>> I'm working on CASSANDRA-8527  
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-8527> and would need  
>> to  
>>>> discuss a few things.  
>>>>  
>>>> The ticket makes it visible in tracing and metrics that rows shadowed  
>> by  
>>>> range tombstones were scanned during reads.  
>>>> Currently, scanned range tombstones aren't reported anywhere which  
>> hides  
>>>> the cause of performance issues during reads when the users perform  
>>> primary  
>>>> key deletes.  
>>>> As such, they do not count in the warn and failure thresholds.  
>>>>  
>>>> While the number of live rows and tombstone cells is counted in the  
>>>> ReadCommand class, it is currently not possible to count the number of  
>>>> range tombstones there as they are merged with the rows they shadow  
>>> before  
>>>> reaching the class.  
>>>> Instead, we can count the number of deleted rows that were read , which  
>>>> already improves diagnosis and show that range tombstones were scanned  
>> :  
>>>>  
>>>> if (row.hasLiveData(ReadCommand.this.nowInSec(),  
>> enforceStrictLiveness))  
>>>> ++liveRows;  
>>>> else if (!row.primaryKeyLivenessInfo().isLive(ReadCommand.this.  
>>> nowInSec()))  
>>>> {  
>>>> // We want to detect primary key deletions only.  
>>>> // If all cells have expired they will count as tombstones.  
>>>> ++deletedRows;  
>>>> }  
>>>>  
>>>> Deleted rows would be part of the warning threshold so that we can spot  
>>> the  
>>>> range tombstone scans in the logs and tracing would look like this :  
>>>>  
>>>> WARN [ReadStage-2] 2017-12-18 18:22:31,352 ReadCommand.java:491 -  
>>>> Read 2086 live rows, 2440 deleted rows and 0 tombstone cells for  
>>>> query..  
>>>>  
>>>>  
>>>> Are there any caveats to that approach ?  
>>>> Should we include the number of deleted rows in the failure threshold  
>> or  
>>>> make it optional, knowing that it could make some queries fail while  
>> they  
>>>> were passing before ?  
>>>>  
>>>> On a side note, is it ok to bring in Mockito in order to make it easier  
>>>> writing tests ? I would like to use a Spy in order to write some tests  
>>>> without starting the whole stack.  
>>>>  
>>>> Thanks,  
>>>>  
>>>>  
>>>> --  
>>>> -----------------  
>>>> Alexander Dejanovski  
>>>> France  
>>>> @alexanderdeja  
>>>>  
>>>> Consultant  
>>>> Apache Cassandra Consulting  
>>>> http://www.thelastpickle.com  
>>>  
>>>  
>>> ---------------------------------------------------------------------  
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org  
>>>  
>>>  
>>  
>  
>  
> --  
> -----------------  
> Alexander Dejanovski  
> France  
> @alexanderdeja  
>  
> Consultant  
> Apache Cassandra Consulting  
> http://www.thelastpickle.com  


---------------------------------------------------------------------  
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
For additional commands, e-mail: dev-h...@cassandra.apache.org

Re: CASSANDRA-8527

Reply via email to