Re: [DISCUSS] Experimental flagging (fork from Re-evaluate compaction defaults in 5.1/trunk)

Josh McKenzie Wed, 11 Dec 2024 10:51:10 -0800

A structured, disciplined approach to graduating something from [Optional] -> 
[Default] makes sense to me, similar to how we're talking about a structured 
flow of [Preview] -> [Beta] -> [GA]. Having those clear stages gives us a 
framework for defining the requirements for each stage transition, which will 
ideally lead to us producing higher quality, more predictable, more consistent 
results for our end users.


For instance, requirements from [Optional] -> [Default] could be higher level 
abstractions like:
 • Confidence in stability
 • Strong evidence to indicate superiority in the majority of workloads (by count 
or importance or size, etc)
These are all things we kind of do implicitly and ad-hoc on the mailing list, 
and I'm not looking to tie us down to any granular structure or specificity. 
I'm thinking more that it could be useful for someone who has worked on 
something and wonders "Huh. How do I take this from being optional to the 
default?" to have a better answer than "reinvent the wheel every time, fling 
spaghetti at the dev list, and pray".

:)


On Wed, Dec 11, 2024, at 1:04 PM, Paulo Motta wrote:
> Thanks for bringing up this topic, Josh. 
> 
> Outside of the major features (i.e. MV/SAI/TCM/Accord), one related discussion 
> on this topic is: how can we "promote" small improvements in existing 
> features from optional to default?
> 
> It makes sense to have optimizations launched behind a feature flag initially 
> (beta phase) while the improvement gets real world exposure, but I think we 
> need a better way to promote these optimizations to default behavior on a 
> regular cadence.
> 
> Take for example optimized repairs from CASSANDRA-16274. It was launched in 
> 4.x as an optional feature gated behind a flag, i.e. 
> auto_optimise_full_repair_streams: false. 
> 
> I could easily be missing something, but is there a world where non-optimized 
> repairs make sense once this optimization is proven to work? I agree this is 
> fine while the feature is maturing, but at some point we need to rip off the 
> bandaid, make the optimization the default, and clearly communicate that. 
> This would let us clean up the code and toil around default behavior that is 
> no longer used, because everyone enables the improvement during deployment.
> 
> This is just one example to demonstrate the issue and I don't want this 
> discussion to focus on this particular case, but I can think of other 
> improvements launched as optional that are never made default.
> 
> I don't know if this should continue to be addressed on an 
> improvement-by-improvement basis or if we could have a more streamlined 
> process to review and communicate these changes more consciously at every 
> major release.
> 
> In the same way we open a loop when adding an optimized behavior behind a 
> feature flag, I think we should have a process to close these loops by 
> promoting these optimizations to default when it makes sense.
> 
> On Tue, Dec 10, 2024 at 2:10 PM Josh McKenzie <[email protected]> wrote:
>> So some questions to test a world w/3 classifications (Preview, Beta, GA):
>> - What would we do with the current experimental features (MVs, JDK17, 
>> witnesses, etc)? Flag them as preview or beta as appropriate on a 
>> case-by-case basis and add runtime warnings / documentation where missing?
>> 
>> - What would we do in the future if a feature's GA and we discover a Very 
>> Big Problem with it that'll take some time to fix? Keep it GA but cut a 
>> hotfix release w/a bunch of warnings? Bounce it back to Preview? Leave it be 
>> and just feverishly try and fix it?
>> 
>>> for policy decisions like this (that don’t need to be agreed in advance) we 
>>> should try to legislate the minimum necessary policy to proceed today
>> Definitely agree; MVs being in limbo for years strains the "3-step 
>> classification" structure for me. If we want to avoid having a solution for 
>> the MV-shaped case on the grounds we won't allow ourselves to reach this 
>> state again in the future, that seems reasonable. With the caveat that we 
>> *might* be in a similar situation with vector search right now, etc.
>> 
>> 
>> On Tue, Dec 10, 2024, at 1:48 PM, Benedict Elliott Smith wrote:
>>> Yep, I agree with this - we can revisit if we ever absolutely feel the need 
>>> to add additional states for exceptional circumstances.
>>> 
>>> > On 10 Dec 2024, at 13:24, Patrick McFadin <[email protected]> wrote:
>>> > 
>>> > -1 on unstable. It's far more words than we need. Three is a
>>> > magic number and fits:
>>> > 
>>> > Preview
>>> > Beta
>>> > GA
>>> > 
>>> > As a matter of testing the process, any pending CEP should go through
>>> > this exercise so we can see how it will work.
>>> > 
>>> > PS
>>> > Got the actual numbers from Whimsy.
>>> > DEV - 1425 users
>>> > USER - 2650
>>> > 
>>> > This means that when features experience a state change, finding more
>>> > avenues to get the word out will be important.
>>> > 
>>> > On Tue, Dec 10, 2024 at 10:04 AM Benedict Elliott Smith
>>> > <[email protected]> wrote:
>>> >> 
>>> >> As an aside, it would be nice to admit we basically revisit everything 
>>> >> each time it becomes relevant again, and for policy decisions like this 
>>> >> (that don’t need to be agreed in advance) we should try to legislate the 
>>> >> minimum necessary policy to proceed today, and leave future refinements 
>>> >> for later when the relevant context arises.
>>> >> 
>>> >> On 10 Dec 2024, at 13:00, Benedict Elliott Smith <[email protected]> 
>>> >> wrote:
>>> >> 
>>> >> I agree with Aleksey that if we think something is broken, we shouldn’t 
>>> >> use euphemisms, and for this reason I don’t like unstable (this could 
>>> >> for instance simply mean API unstable). If we intend to never need this 
>>> >> descriptor, we should avoid bike-shedding and insert a “placeholder” for 
>>> >> now to be refined as and when we need it when we have the necessary 
>>> >> future context.
>>> >> 
>>> >> i.e.
>>> >> 
>>> >> preview -> beta -> [“has problems that will take time to resolve 
>>> >> placeholder” -> beta] -> GA
>>> >> 
>>> >> 
>>> >> 
>>> >> On 10 Dec 2024, at 12:39, Josh McKenzie <[email protected]> wrote:
>>> >> 
>>> >> +1 to this classification with one addition. I think we need to augment 
>>> >> this with formalization on what we do with features we don't recommend 
>>> >> people use (i.e. MVs in their current incarnation). For something 
>>> >> retroactively found to be unstable, we could add an "Unstable" 
>>> >> qualification for it, leaving us with:
>>> >> 
>>> >> Unstable: Warnings on use, clearly communicated as to why, either 
>>> >> on-track to be fixed or removed from the codebase. No lingering for 
>>> >> years in a fugue state. We should target never needing this 
>>> >> classification.
>>> >> Preview: Ready to be tried by end users but has caveats and most likely 
>>> >> is not API stable. Developer only documentation acceptable.
>>> >> Beta: Feature complete/API stable but has not had enough testing to be 
>>> >> considered rock solid. Developer and User documentation required.
>>> >> GA: Ready for use, no known issue, PMC is satisfied with the testing 
>>> >> that has been done
>>> >> 
>>> >> 
>>> >> To walk through how some of the flow might look to test the above:
>>> >> 
>>> >> Simple case:
>>> >> - Preview -> Beta -> GA
>>> >> 
>>> >> Late discovered defect case:
>>> >> - Preview -> Beta -> Unstable -> Beta -> GA
>>> >> 
>>> >> Pathological worst-case (i.e. MV):
>>> >> - Preview -> Beta -> GA -> Unstable -> [Preview|Removed]
>>> >> 
>>> >> On Tue, Dec 10, 2024, at 12:29 PM, Jeremiah Jordan wrote:
>>> >> 
>>> >> I agree with Aleksey and Patrick.  We should define terminology and then 
>>> >> stick to it.  My preferred list would be:
>>> >> 
>>> >> Preview - Ready to be tried by end users but has caveats and most likely 
>>> >> is not API stable.
>>> >> Beta - Feature complete/API stable but has not had enough testing to be 
>>> >> considered rock solid.
>>> >> GA - Ready for use, no known issue, PMC is satisfied with the testing 
>>> >> that has been done
>>> >> 
>>> >> 
>>> >> Whether or not something is enabled by default or the default 
>>> >> implementation is a separate axis from readiness. Though if we 
>>> >> are replacing an existing thing with a new default I would hope we apply 
>>> >> extra rigor to allowing that to happen.
>>> >> 
>>> >> -Jeremiah
>>> >> 
>>> >> On Dec 10, 2024 at 11:15:37 AM, Patrick McFadin <[email protected]> 
>>> >> wrote:
>>> >> 
>>> >> I'm going to try to pull this back from the inevitable bikeshedding
>>> >> and airing of grievances that happen. Rewind all the way back to
>>> >> Josh's original point, which is a defined process. What I really love
>>> >> about this being brought up is that it matures how we communicate with
>>> >> the larger user base. The dev list has very few participants: fewer than
>>> >> 1000 last I looked. Most users I talk to just want to know what they
>>> >> are getting. Well-formed, clear communication is how the PMC can let
>>> >> end users know that a new feature is in one of three states:
>>> >> 
>>> >> 1. Beta
>>> >> 2. Generally Available
>>> >> 3. Default (where appropriate)
>>> >> 
>>> >> Yes! The work is just sorting out what each level means and then
>>> >> codifying that in confluence. Then, we look at any features that are
>>> >> under question, assign a level, and determine what it takes to go from
>>> >> one state to another.
>>> >> 
>>> >> The CEPs need to reflect this change. What makes a Beta, GA, Default
>>> >> for new feature X. It makes it clear for implementers and end users,
>>> >> which is an important feature of project maturity.
>>> >> 
>>> >> Patrick
>>> >> 
>>> >> 
>>> >> 
>>> >> On Dec 10, 2024 at 5:46:38 AM, Aleksey Yeshchenko <[email protected]> 
>>> >> wrote:
>>> >> 
>>> >> What we’ve done is we’ve overloaded the term ‘experimental’ to mean too 
>>> >> many related but different ideas. We need additional, more specific 
>>> >> terminology to disambiguate.
>>> >> 
>>> >> 1. Labelling released features that were known to be unstable at release 
>>> >> as ‘experimental’ retroactively shouldn’t happen and AFAIK only 
>>> >> happened once, with MVs, and ‘experimental’ there was just a euphemism 
>>> >> for ‘broken’. Our practices are more mature now, I like to think, so a 
>>> >> situation like this would not arise in the future: the bar for 
>>> >> releasing a completed marketable feature is higher. So the label 
>>> >> ‘experimental’ should not be applied retroactively to anything.
>>> >> 
>>> >> 2. It’s possible that a released, once considered production-ready 
>>> >> feature, might be discovered to be deeply flawed after being released 
>>> >> already. We need to temporarily mark such a feature as ‘broken’ or 
>>> >> ‘flawed’. Not experimental, and not even ‘unstable’. Make sure we emit a 
>>> >> warning on its use everywhere, and, if possible, make it opt-in in the 
>>> >> next major, at the very least, to prevent new uses of it. Announce on 
>>> >> dev, add a note in NEWS.txt, etc. If the flaws are later addressed, 
>>> >> remove the label. Removing the feature itself might not be possible, but 
>>> >> should be considered, with heavy advance telegraphing to the community.
>>> >> 
>>> >> 3. There is probably room for genuine use of ‘experimental’ as a feature 
>>> >> label. For opt-in features that we commit with an understanding that 
>>> >> they might not make it at all. Unstable API is implied here, but a 
>>> >> feature can also have an unstable API without being experimental - so 
>>> >> ‘experimental’ doesn’t equal ‘api-unstable’. These should not be 
>>> >> relied on by any production code, they would be heavily gated by 
>>> >> unambiguous configuration flags, disabled by default, allowed to be 
>>> >> removed or changed in any version including a minor one.
>>> >> 
>>> >> 4. New features without known flaws, intended to be production-ready and 
>>> >> marketable eventually, that we may want to gain some real-world 
>>> >> confidence with before we are happy to market or make default. UCS, for 
>>> >> example, which seems to be in heavy use in Astra and doesn’t have any 
>>> >> known open issues (AFAIK). It’s not experimental, it’s not unstable, 
>>> >> it’s not ‘alpha’ or ‘beta’, it just hasn't been widely enough used to 
>>> >> have gained a lot of confidence. It’s just new. I’m not sure what label 
>>> >> even applies here. It’s just a regular feature that happens to be new, 
>>> >> doesn’t need a label, just needs to see some widespread use before we 
>>> >> can make it a default. No other limitation on its use.
>>> >> 
>>> >> 5. Early-integrated, not-yet fully-completed features that are NOT 
>>> >> experimental in nature. Isolated, gated behind deep configuration flags. 
>>> >> Have a CEP behind them, we trust that they will be eventually completed, 
>>> >> but for pragmatic reasons it just made sense to commit them at an 
>>> >> earlier stage. ‘Preview’, ‘alpha’, ‘beta’ are labels that could apply 
>>> >> here depending on current feature readiness status. API-instability is 
>>> >> implied. Once finished they just become a regular new feature, no flag 
>>> >> needed, no heavy config gating needed.
>>> >> 
>>> >> I might be missing some scenarios here.
>>> >> 
>>> >> 
>>> >> 
>>> 
>>> 
>>
