Re: [DISCUSS] Experimental flagging (fork from Re-evaluate compaction defaults in 5.1/trunk)

Benedict Thu, 12 Dec 2024 06:57:40 -0800

I think alpha is fine. It communicates fairly well that there’s no near term expectation they will be production capable.

There is (I think) still an intention to improve them, but they are janky. If we don’t intend to begin fixing the feature within the next year or so we should deprecate it entirely.

On 12 Dec 2024, at 14:46, Aleksey Yeshchenko <[email protected]> wrote:

But MVs are not alpha or preview, as they are not actively being worked on. They are currently broken. Calling them ‘alpha’ makes ‘alpha’ overloaded and less useful.

On 12 Dec 2024, at 14:00, Josh McKenzie <[email protected]> wrote:

> But we also need an approved non-euphemism for features like MVs (I suggest ‘broken’) and possibly a softer version of it ('dangerous') for our existing features that work fine in some narrow well-defined circumstances but will blow up in your face if you don’t know exactly what you are doing.

Feels like the real answer is:
- Endeavor to never get ourselves into this state
- Take immediate action if we discover we're there (fix feature if possible, deprecate and remove if not). Not "leave to fester for years"
I like the introduction of 'alpha' as an alias for 'Preview'; not sure why that wasn't what we immediately came up with collectively given how widespread its usage is. :)

What would demoting MV's to 'alpha' right now look like? We'd warn on their usage w/some different structure and verbiage, and it'd be pretty implicitly clear to people they shouldn't use it in production right?

It seems to me that the 3 categories would be sufficient even to handle our current scenario where we have some things in the system that are a Bad Idea to use in production.
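
As a rough illustration of "warn on their usage", here is a minimal Python sketch. It is not Cassandra code; the `AlphaFeatureWarning` category, the `warn_if_alpha` helper, and the message wording are all hypothetical names chosen for this example.

```python
import warnings


class AlphaFeatureWarning(UserWarning):
    """Hypothetical warning category for features not meant for production."""


def warn_if_alpha(feature: str, readiness: str) -> bool:
    # Emit a warning when an alpha feature is used; return True if one was emitted.
    if readiness != "alpha":
        return False
    warnings.warn(
        f"{feature} is alpha and should not be used in production",
        AlphaFeatureWarning,
        stacklevel=2,
    )
    return True
```

A dedicated warning category would let operators filter or escalate these warnings separately from other messages, which is one possible reading of "different structure and verbiage".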

On Thu, Dec 12, 2024, at 6:06 AM, Aleksey Yeshchenko wrote:
I don’t like ‘unstable’ either, albeit for a different reason, but I don’t think three is enough, or fits, as we already have some features that don’t fit into any of (preview, beta, GA): released but broken, released but dangerous, deprecated, removed.

For new features going forward, alpha (preview) -> beta -> GA works well enough.

But we also need an approved non-euphemism for features like MVs (I suggest ‘broken’) and possibly a softer version of it ('dangerous') for our existing features that work fine in some narrow well-defined circumstances but will blow up in your face if you don’t know exactly what you are doing.

These classifications are largely orthogonal.

Alpha(preview)->Beta->GA communicates readiness of a feature under development, with GA being the default final state for most features.

From there a feature can transition into ‘broken’ or ‘dangerous’ territory. Serious issues get uncovered (very) late sometimes. It is what it is.
And we do deprecate and remove functionality when it’s superseded.
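
The orthogonality described above can be sketched as two independent fields on a feature's status. This is a minimal Python illustration under stated assumptions, not project policy or Cassandra code: the `Readiness`, `Health`, and `FeatureStatus` names and the `promote` helper are hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Readiness(Enum):
    # Development maturity of a feature, using the names from this thread.
    ALPHA = 1  # also called "preview"
    BETA = 2
    GA = 3


class Health(Enum):
    # Hypothetical second axis for problems uncovered after release.
    OK = "ok"
    DANGEROUS = "dangerous"  # fine only in narrow, well-defined circumstances
    BROKEN = "broken"


@dataclass(frozen=True)
class FeatureStatus:
    name: str
    readiness: Readiness
    health: Health = Health.OK
    deprecated: bool = False

    def label(self) -> str:
        # Build a plain label that does not hide problems behind a euphemism.
        parts = [self.readiness.name]
        if self.health is not Health.OK:
            parts.append(self.health.value)
        if self.deprecated:
            parts.append("deprecated")
        return f"{self.name}: " + ", ".join(parts)


def promote(status: FeatureStatus) -> FeatureStatus:
    # Move one step along alpha -> beta -> GA; GA is the final readiness state.
    if status.readiness is Readiness.GA:
        return status
    return replace(status, readiness=Readiness(status.readiness.value + 1))
```

With this shape, a feature that reached GA and was later found to be flawed keeps its readiness and gains a health marker, for example `FeatureStatus("example", Readiness.GA, Health.BROKEN).label()` gives `"example: GA, broken"`.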

-1 on unstable. It's way more words than are needed. Three is a
magic number and fits:

Preview
Beta
GA

On 11 Dec 2024, at 18:50, Josh McKenzie <[email protected]> wrote:

A structured, disciplined approach to graduating something from [Optional] -> [Default] makes sense to me, similar to how we're talking about a structured flow of [Preview] -> [Beta] -> [GA]. Having those clear stages gives us a framework to define what requirements of stage transitions would be which'll ideally lead to us producing higher quality, more predictable, more consistent results for our end users.

For instance, requirements from [Optional] -> [Default] could be higher level abstractions like:
Confidence in stability
Strong evidence to indicate superiority in majority of workloads (by count or importance or size, etc)
These are all things we kind of do implicitly and ad-hoc on the mailing list, and I'm not looking to tie us down to any granular structure or specificity. More thinking it could be useful for someone that's worked on something who wonders "Huh. How do I take this from being optional to the default?" and having an answer better than "reinvent the wheel every time and fling spaghetti at the dev list and pray".

:)

On Wed, Dec 11, 2024, at 1:04 PM, Paulo Motta wrote:
Thanks for bringing up this topic, Josh.

Outside of the major features (i.e. MV/SAI/TCM/Accord), one related discussion in this topic is: how can we "promote" small improvements in existing features from optional to default?

It makes sense to have optimizations launched behind a feature flag initially (beta phase) while the improvement gets real world exposure, but I think we need a better way to promote these optimizations to default behavior on a regular cadence.

Take for example optimized repairs from CASSANDRA-16274. It was launched in 4.x as an optional feature gated behind a flag, i.e. auto_optimise_full_repair_streams: false.

I could easily be missing something, but is there a world where non-optimized repairs make sense once this optimization is proven to work? I agree this is fine while the feature is maturing, but at some point we need to rip off the band-aid and make the optimization default (and clearly communicate that). This would let us clean up the code for the old default behavior that is no longer being used, because everyone is enabling the improvement during deployment.

This is just one example to demonstrate the issue and I don't want this discussion to focus on this particular case, but I can think of other improvements launched as optional that are never made default.

I don't know if this should continue to be addressed on an improvement-by-improvement basis or if we could have a more streamlined process to review and communicate these changes more consciously at every major release.

In the same way we open a loop when adding an optimized behavior behind a feature flag, I think we should have a process to close these loops by promoting these optimizations to default when it makes sense.
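
One way to picture "closing the loop" is a small registry of flag-gated improvements that gets reviewed at each major release. The Python sketch below is only an illustration: the `OptionalImprovement` type, the `due_for_review` helper, the `grace_majors` threshold, and the `hypothetical_new_flag` entry are assumptions made for this example, and no such process exists in the project today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionalImprovement:
    flag: str               # configuration flag gating the improvement
    introduced_major: int   # major version where the flag first shipped
    default_enabled: bool   # current default value of the flag


def due_for_review(items, current_major, grace_majors=1):
    # Return flags still off by default after `grace_majors` major releases.
    return [
        item.flag
        for item in items
        if not item.default_enabled
        and current_major - item.introduced_major >= grace_majors
    ]


registry = [
    # Flag name taken from the example above; the version numbers are illustrative.
    OptionalImprovement("auto_optimise_full_repair_streams", 4, False),
    OptionalImprovement("hypothetical_new_flag", 5, False),
]
print(due_for_review(registry, current_major=5))
```

The list it produces is only a prompt for discussion; whether a given flag should actually become the default would still be decided case by case.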

On Tue, Dec 10, 2024 at 2:10 PM Josh McKenzie <[email protected]> wrote:

So some questions to test a world w/3 classifications (Preview, Beta, GA):
- What would we do with the current experimental features (MV's, JDK17, witnesses, etc)? Flag them as preview or beta as appropriate on a case-by-case basis and add runtime warnings / documentation where missing?

- What would we do in the future if a feature's GA and we discover a Very Big Problem with it that'll take some time to fix? Keep it GA but cut a hotfix release w/a bunch of warnings? Bounce it back to Preview? Leave it be and just feverishly try and fix it?

> for policy decisions like this (that don’t need to be agreed in advance) we should try to legislate the minimum necessary policy to proceed today
Definitely agree; MV's being in limbo for years strains the "3-step classification" structure for me. If we want to avoid having a solution for the MV-shaped case on the grounds we won't allow ourselves to reach this state again in the future, that seems reasonable. With the caveat that we might be in a similar situation with vector search right now, etc.

On Tue, Dec 10, 2024, at 1:48 PM, Benedict Elliott Smith wrote:
Yep, I agree with this - we can revisit if we ever absolutely feel the need to add additional states for exceptional circumstances.

> On 10 Dec 2024, at 13:24, Patrick McFadin <[email protected]> wrote:
>
> -1 on unstable. It's way more words than are needed. Three is a
> magic number and fits:
>
> Preview
> Beta
> GA
>
> As a matter of testing the process, any pending CEP should go through
> this exercise so we can see how it will work.
>
> PS
> Got the actual numbers from Whimsy.
> DEV - 1425 users
> USER - 2650
>
> This means that when features experience a state change, finding more
> avenues to get the word out will be important.
>
> On Tue, Dec 10, 2024 at 10:04 AM Benedict Elliott Smith
> <[email protected]> wrote:
>>
>> As an aside, it would be nice to admit we basically revisit everything each time it becomes relevant again, and for policy decisions like this (that don’t need to be agreed in advance) we should try to legislate the minimum necessary policy to proceed today, and leave future refinements for later when the relevant context arises.
>>
>> On 10 Dec 2024, at 13:00, Benedict Elliott Smith <[email protected]> wrote:
>>
>> I agree with Aleksey that if we think something is broken, we shouldn’t use euphemisms, and for this reason I don’t like unstable (this could for instance simply mean API unstable). If we intend to never need this descriptor, we should avoid bike-shedding and insert a “placeholder” for now, to be refined as and when we need it, once we have the necessary context.
>>
>> i.e.
>>
>> preview -> beta -> [“has problems that will take time to resolve placeholder” -> beta] -> GA
>>
>>
>>
>> On 10 Dec 2024, at 12:39, Josh McKenzie <[email protected]> wrote:
>>
>> +1 to this classification with one addition. I think we need to augment this with formalization on what we do with features we don't recommend people use (i.e. MV in their current incarnation). For something retroactively found to be unstable, we could add an "Unstable" qualification for it, leaving us with:
>>
>> Unstable: Warnings on use, clearly communicated as to why, either on-track to be fixed or removed from the codebase. No lingering for years in a fugue state. We should target never needing this classification.
>> Preview: Ready to be tried by end users but has caveats and most likely is not api stable. Developer only documentation acceptable.
>> Beta: Feature complete/API stable but has not had enough testing to be considered rock solid. Developer and User documentation required.
>> GA: Ready for use, no known issues, PMC is satisfied with the testing that has been done
>>
>>
>> To walk through how some of the flow might look to test the above:
>>
>> Simple case:
>> - Preview -> Beta -> GA
>>
>> Late discovered defect case:
>> - Preview -> Beta -> Unstable -> Beta -> GA
>>
>> Pathological worst-case (i.e. MV):
>> - Preview -> Beta -> GA -> Unstable -> [Preview|Removed]
>>
>> On Tue, Dec 10, 2024, at 12:29 PM, Jeremiah Jordan wrote:
>>
>> I agree with Aleksey and Patrick. We should define terminology and then stick to it. My preferred list would be:
>>
>> Preview - Ready to be tried by end users but has caveats and most likely is not api stable.
>> Beta - Feature complete/API stable but has not had enough testing to be considered rock solid.
>> GA - Ready for use, no known issues, PMC is satisfied with the testing that has been done
>>
>>
>> Whether or not something is enabled by default or is the default implementation is a separate axis from its readiness. Though if we are replacing an existing thing with a new default I would hope we apply extra rigor to allowing that to happen.
>>
>> -Jeremiah
>>
>> On Dec 10, 2024 at 11:15:37 AM, Patrick McFadin <[email protected]> wrote:
>>
>> I'm going to try to pull this back from the inevitable bikeshedding
>> and airing of grievances that happen. Rewind all the way back to
>> Josh's original point, which is a defined process. Why I really love
>> this being brought up is our maturing process of communicating to the
>> larger user base. The dev list has very few participants. Fewer than
>> 1000 last I looked. Most users I talk to just want to know what they
>> are getting. Well-formed, clear communication is how the PMC can let
>> end users know that a new feature is one of three states:
>>
>> 1. Beta
>> 2. Generally Available
>> 3. Default (where appropriate)
>>
>> Yes! The work is just sorting out what each level means and then
>> codifying that in confluence. Then, we look at any features that are
>> under question, assign a level, and determine what it takes to go from
>> one state to another.
>>
>> The CEPs need to reflect this change. What makes a Beta, GA, or Default
>> for new feature X? Answering that makes it clear for implementers and end
>> users, which is an important feature of project maturity.
>>
>> Patrick
>>
>>
>>
>> On Dec 10, 2024 at 5:46:38 AM, Aleksey Yeshchenko <[email protected]> wrote:
>>
>> What we’ve done is we’ve overloaded the term ‘experimental’ to mean too many related but different ideas. We need additional, more specific terminology to disambiguate.
>>
>> 1. Labelling released features that were known to be unstable at release as ‘experimental’ retroactively shouldn’t happen and AFAIK only happened once, with MVs, and ‘experimental’ there was just a euphemism for ‘broken’. Our practices are mature enough now, I like to think, that a situation like this would not arise in the future - the bar for releasing a completed marketable feature is higher. So the label ‘experimental’ should not be applied retroactively to anything.
>>
>> 2. It’s possible that a released feature, once considered production-ready, might be discovered to be deeply flawed after release. We need to temporarily mark such a feature as ‘broken’ or ‘flawed’. Not experimental, and not even ‘unstable’. Make sure we emit a warning on its use everywhere, and, if possible, make it opt-in in the next major, at the very least, to prevent new uses of it. Announce on dev, add a note in NEWS.txt, etc. If the flaws are later addressed, remove the label. Removing the feature itself might not be possible, but should be considered, with heavy advance telegraphing to the community.
>>
>> 3. There is probably room for genuine use of ‘experimental’ as a feature label. For opt-in features that we commit with an understanding that they might not make it at all. Unstable API is implied here, but a feature can also have an unstable API without being experimental - so ‘experimental’ doesn’t equal ‘api-unstable’. These should not be relied on by any production code; they would be heavily gated by unambiguous configuration flags, disabled by default, and allowed to be removed or changed in any version, including a minor one.
>>
>> 4. New features without known flaws, intended to be production-ready and marketable eventually, that we may want to gain some real-world confidence with before we are happy to market or make default. UCS, for example, which seems to be in heavy use in Astra and doesn’t have any known open issues (AFAIK). It’s not experimental, it’s not unstable, it’s not ‘alpha’ or ‘beta’, it just hasn't been widely enough used to have gained a lot of confidence. It’s just new. I’m not sure what label even applies here. It’s just a regular feature that happens to be new, doesn’t need a label, just needs to see some widespread use before we can make it a default. No other limitation on its use.
>>
>> 5. Early-integrated, not-yet fully-completed features that are NOT experimental in nature. Isolated, gated behind deep configuration flags. Have a CEP behind them, we trust that they will be eventually completed, but for pragmatic reasons it just made sense to commit them at an earlier stage. ‘Preview’, ‘alpha’, ‘beta’ are labels that could apply here depending on current feature readiness status. API-instability is implied. Once finished they just become a regular new feature, no flag needed, no heavy config gating needed.
>>
>> I might be missing some scenarios here.
>>
>>
>>
