Whoops, 5 months ago, not six months ago. Much more reasonable to be making this kind of fix.
On Wed, Feb 19, 2025, at 3:56 PM, Ariel Weisberg wrote: > Hi, > > This does not constitute a review, but I looked at both of them to convince > myself how they go about solving their respective problems is a good idea. I > am weakly +1. The risk reward is there, but 13 months since 5.0 was released > feels a little late to trying to improve node density instead of just saying > it is what it is. > > The +1 stems from the fact that if you really operate this at scale and you > are committed to node density then it's less of a nice to have and more of a > "some cluster falls over periodically now". > > Ariel > > On Fri, Feb 14, 2025, at 5:23 PM, Jordan West wrote: >> Thanks for the write up Mick. I think its is a great evaluation of 15452. A >> few notes below: >> >> * CI links for 15452 might be burried and I may need to link the most recent >> run (I’ve been using CircleCI since it’s what I’m familiar with — happy to >> have runs on ASf hardware as well). >> >> * 15452 is configurable by YAML property as you mentioned. I’m traveling but >> I’m 99% sure I made it a hot property as well (personally I believe you >> shouldn’t have to restart a node to disable something like this, so if I >> forgot to make it hot I will before merge). I’m traveling without my laptop >> so I will double check next week (if someone has time to before then it’s >> much appreciated) >> >> * I think “safe-ish” is a fair description of 15452. The new code is fairly >> isolated but it’s in a very critical path with several call paths leading to >> it. I think the testing we’ve done (both manual and automated) as well as >> finding and fixing several bugs with those tests is what gives me the >> confidence. And as you mentioned we have the fail safe of turning it off >> otherwise. >> >> Jordan >> >> On Fri, Feb 14, 2025 at 12:42 Mick Semb Wever <m...@apache.org> wrote: >>> Solid write up Jon! >>> >>> Hoping the committers and PMC members are keeping in mind this (very) >>> recent thread: >>> https://lists.apache.org/thread/h38g6q9d8h1q92h6qzs5tqdxpn2vmnyy >>> >>> This thread needs to also be about evaluating the risk these commits >>> are to a patch version. >>> I'm +1 and here's my thinking over it… >>> >>> Are the performance benefits clear ? >>> Yes (thank you Jon for doing a solid job at demonstrating just how >>> awesome this will be). >>> >>> Do we want this in 5.0 ? >>> That's the dumb question :) of course. >>> >>> Does it fix a bug or a regression, or is an operational improvement ? >>> Kinda, Branimir did mention a performance regression when the chunk >>> cache was disabled. >>> >>> How are the patches with unforeseen risks ? >>> These patches are medium sized and low-level, though isolated to (and >>> creating a better isolation of) the compaction (scan) reader. 15452 >>> looks safe-ish. I can't speak to 20092, it would be nice to hear >>> Branimir and Sylvain chime in. Are we confident that if any bugs do >>> arise in user's 5.0 production deployments we will be quick to provide >>> fixes and releases? I was thinking it's worth putting in a system >>> property that can disable the scan reader, so if anything happens >>> folks can just restart the node with the system property to return to >>> pre-patch behaviour, but that's already there! :) >>> >>> How well tested have they been ? >>> We have one (or a few) reports of it running in production, that's >>> super positive. Reports of lots of manual testing, super positive >>> too. But there's no CI results available for either patch. (Even the >>> one committed to trunk doesn't have pre-commit CI results available.) >>> I think CI results attached to the ticket are a must – I'm working on >>> adding them. >