Re: Merging compaction improvements to 5.0

Ariel Weisberg Wed, 19 Feb 2025 12:57:29 -0800

Hi,

This does not constitute a review, but I looked at both of them to convince 
myself how they go about solving their respective problems is a good idea. I am 
weakly +1. The risk reward is there, but 13 months since 5.0 was released feels 
a little late to trying to improve node density instead of just saying it is 
what it is.


The +1 stems from the fact that if you really operate this at scale and you are 
committed to node density then it's less of a nice to have and more of a "some 
cluster falls over periodically now".

Ariel

On Fri, Feb 14, 2025, at 5:23 PM, Jordan West wrote:
> Thanks for the write up Mick. I think its is a great evaluation of 15452. A 
> few notes below:
> 
> * CI links for 15452 might be burried and I may need to link the most recent 
> run (I’ve been using CircleCI since it’s what I’m familiar with — happy to 
> have runs on ASf hardware as well). 
> 
> * 15452 is configurable by YAML property as you mentioned. I’m traveling but 
> I’m 99% sure I made it a hot property as well (personally I believe you 
> shouldn’t have to restart a node to disable something like this, so if I 
> forgot to make it hot I will before merge). I’m traveling without my laptop 
> so I will double check next week (if someone has time to before then it’s 
> much appreciated) 
> 
>  * I think “safe-ish” is a fair description of 15452. The new code is fairly 
> isolated but it’s in a very critical path with several call paths leading to 
> it. I think the testing we’ve done (both manual and automated) as well as 
> finding and fixing several bugs with those tests is what gives me the 
> confidence. And as you mentioned we have the fail safe of turning it off 
> otherwise. 
> 
> Jordan 
> 
> On Fri, Feb 14, 2025 at 12:42 Mick Semb Wever <m...@apache.org> wrote:
>> Solid write up Jon!
>> 
>> Hoping the committers and PMC members are keeping in mind this (very)
>> recent thread: 
>> https://lists.apache.org/thread/h38g6q9d8h1q92h6qzs5tqdxpn2vmnyy
>> 
>> This thread needs to also be about evaluating the risk these commits
>> are to a patch version.
>> I'm +1 and here's my thinking over it…
>> 
>> Are the performance benefits clear ?
>> Yes (thank you Jon for doing a solid job at demonstrating just how
>> awesome this will be).
>> 
>> Do we want this in 5.0 ?
>> That's the dumb question :) of course.
>> 
>> Does it fix a bug or a regression, or is an operational improvement ?
>> Kinda, Branimir did mention a performance regression when the chunk
>> cache was disabled.
>> 
>> How are the patches with unforeseen risks ?
>> These patches are medium sized and low-level, though isolated to (and
>> creating a better isolation of) the compaction (scan) reader.  15452
>> looks safe-ish.  I can't speak to 20092, it would be nice to hear
>> Branimir and Sylvain chime in.  Are we confident that if any bugs do
>> arise in user's 5.0 production deployments we will be quick to provide
>> fixes and releases?  I was thinking it's worth putting in a system
>> property that can disable the scan reader, so if anything happens
>> folks can just restart the node with the system property to return to
>> pre-patch behaviour, but that's already there! :)
>> 
>> How well tested have they been ?
>> We have one (or a few) reports of it running in production, that's
>> super positive.  Reports of lots of manual testing, super positive
>> too.  But there's no CI results available for either patch.  (Even the
>> one committed to trunk doesn't have pre-commit CI results available.)
>> I think CI results attached to the ticket are a must – I'm working on
>> adding them.

Re: Merging compaction improvements to 5.0

Reply via email to