Re: Merging compaction improvements to 5.0

Ariel Weisberg Wed, 19 Feb 2025 13:33:03 -0800

Whoops, 5 months ago, not six months ago. Much more reasonable to be making 
this kind of fix.


On Wed, Feb 19, 2025, at 3:56 PM, Ariel Weisberg wrote:
> Hi,
> 
> This does not constitute a review, but I looked at both of them to convince 
> myself how they go about solving their respective problems is a good idea. I 
> am weakly +1. The risk reward is there, but 13 months since 5.0 was released 
> feels a little late to trying to improve node density instead of just saying 
> it is what it is.
> 
> The +1 stems from the fact that if you really operate this at scale and you 
> are committed to node density then it's less of a nice to have and more of a 
> "some cluster falls over periodically now".
> 
> Ariel
> 
> On Fri, Feb 14, 2025, at 5:23 PM, Jordan West wrote:
>> Thanks for the write up Mick. I think its is a great evaluation of 15452. A 
>> few notes below:
>> 
>> * CI links for 15452 might be burried and I may need to link the most recent 
>> run (I’ve been using CircleCI since it’s what I’m familiar with — happy to 
>> have runs on ASf hardware as well). 
>> 
>> * 15452 is configurable by YAML property as you mentioned. I’m traveling but 
>> I’m 99% sure I made it a hot property as well (personally I believe you 
>> shouldn’t have to restart a node to disable something like this, so if I 
>> forgot to make it hot I will before merge). I’m traveling without my laptop 
>> so I will double check next week (if someone has time to before then it’s 
>> much appreciated) 
>> 
>>  * I think “safe-ish” is a fair description of 15452. The new code is fairly 
>> isolated but it’s in a very critical path with several call paths leading to 
>> it. I think the testing we’ve done (both manual and automated) as well as 
>> finding and fixing several bugs with those tests is what gives me the 
>> confidence. And as you mentioned we have the fail safe of turning it off 
>> otherwise. 
>> 
>> Jordan 
>> 
>> On Fri, Feb 14, 2025 at 12:42 Mick Semb Wever <[email protected]> wrote:
>>> Solid write up Jon!
>>> 
>>> Hoping the committers and PMC members are keeping in mind this (very)
>>> recent thread: 
>>> https://lists.apache.org/thread/h38g6q9d8h1q92h6qzs5tqdxpn2vmnyy
>>> 
>>> This thread needs to also be about evaluating the risk these commits
>>> are to a patch version.
>>> I'm +1 and here's my thinking over it…
>>> 
>>> Are the performance benefits clear ?
>>> Yes (thank you Jon for doing a solid job at demonstrating just how
>>> awesome this will be).
>>> 
>>> Do we want this in 5.0 ?
>>> That's the dumb question :) of course.
>>> 
>>> Does it fix a bug or a regression, or is an operational improvement ?
>>> Kinda, Branimir did mention a performance regression when the chunk
>>> cache was disabled.
>>> 
>>> How are the patches with unforeseen risks ?
>>> These patches are medium sized and low-level, though isolated to (and
>>> creating a better isolation of) the compaction (scan) reader.  15452
>>> looks safe-ish.  I can't speak to 20092, it would be nice to hear
>>> Branimir and Sylvain chime in.  Are we confident that if any bugs do
>>> arise in user's 5.0 production deployments we will be quick to provide
>>> fixes and releases?  I was thinking it's worth putting in a system
>>> property that can disable the scan reader, so if anything happens
>>> folks can just restart the node with the system property to return to
>>> pre-patch behaviour, but that's already there! :)
>>> 
>>> How well tested have they been ?
>>> We have one (or a few) reports of it running in production, that's
>>> super positive.  Reports of lots of manual testing, super positive
>>> too.  But there's no CI results available for either patch.  (Even the
>>> one committed to trunk doesn't have pre-commit CI results available.)
>>> I think CI results attached to the ticket are a must – I'm working on
>>> adding them.
>

Re: Merging compaction improvements to 5.0

Reply via email to