Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

Ekaterina Dimitrova Fri, 16 Feb 2024 07:33:15 -0800

Thanks for opening an epic @Jacek.

It seems the dtest_offheap job is replaced by dtest_latest, which means we
will have the same number of jobs after the current ticket, so I am not
worried about Jenkins.


In CircleCI, however, the dtest_offheap job was not a mandatory pre-commit
run, whereas as far as I can see this ticket suggests that dtest_latest be
mandatory in the pre-commit workflow.
I would like to suggest we commit the current proposal. Only, I think the
config should be marked as experimental somewhere.

As a short-term solution to the increased consumption of the pre-commit test
runs, I would like to suggest we accept running only the J11 pre-commit
workflow (which also covers tests run with J17) until we surface the other
discussion and apply other test configuration changes/optimizations.

On Fri, 16 Feb 2024 at 9:08, Paulo Motta <pa...@apache.org> wrote:

> Thanks for clarifying Branimir! I'm +1 on proceeding as proposed and I
> think this change will make it easier to gain confidence to update
> configurations.
>
> Interesting discussion and suggestions on this thread - I think we can
> follow-up on improving test/CI workflow in a different thread/proposal to
> avoid blocking this.
>
> On Thu, Feb 15, 2024 at 9:59 AM Branimir Lambov <
> branimir.lam...@datastax.com> wrote:
>
>> Paulo:
>>
>>> 1) Will cassandra.yaml remain the default test config? Is the plan
>>> moving forward to require green CI for both configurations on pre-commit,
>>> or pre-release?
>>
>> The plan is to ensure both configurations are green pre-commit. This
>> should not increase the CI cost as this replaces extra configurations we
>> were running before (e.g. test-tries).
>>
>>> 2) What will this mean for the release artifact, is the idea to continue
>>> shipping with the current cassandra.yaml or eventually switch to the
>>> optimized configuration (ie. 6.X) while making the legacy default
>>> configuration available via an optional flag?
>>
>> The release simply includes an additional yaml file, which contains a
>> one-liner on how to use it.
>>
>> Jeff:
>>
>>> 1) If there’s an “old compatible default” and “latest recommended
>>> settings”, when does the value in “old compatible default” get updated?
>>> Never?
>>
>> This does not change anything about these decisions. The question is very
>> serious without this patch as well: Does V6 have to support pain-free
>> upgrade from V5 working in V4 compatible mode? If so, can we ever deprecate
>> or drop anything? If not, are we not breaking upgradeability promises?
>>
>>> 2) If there are test failures with the new values, it seems REALLY
>>> IMPORTANT to make sure those test failures are discovered + fixed IN THE
>>> FUTURE TOO. If pushing new yaml into a different file makes us less likely
>>> to catch the failures in the future, it seems like we’re hurting ourselves.
>>> Branimir mentions this, but how do we ensure that we don’t let this pattern
>>> disguise future bugs?
>>
>> The main objective of this patch is to ensure that the second yaml is
>> tested too, pre-commit. We were not doing this for all features we tell
>> users are supported.
>>
>> Paulo:
>>
>>> - if cassandra_latest.yaml becomes the new default configuration for
>>> 6.0, then precommit only needs to be run against that version - prerelease
>>> needs to be run against all cassandra.yaml variants.
>>
>> Assuming we keep the pace of development, there will be new "latest"
>> features in 6.0 (e.g. Accord could be one). The idea is more to move some
>> of the settings from latest to default when they are deemed mature enough.
>>
>> Josh:
>>
>>> I propose to significantly reduce that stuff. Let's distinguish the
>>> packages of tests that need to be run with CDC enabled / disabled, with
>>> commitlog compression enabled / disabled, tests that verify sstable formats
>>> (mostly io and index I guess), and leave other parameters set as with the
>>> latest configuration - this is the easiest way I think.
>>> For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about
>>> other stuff. To me running no-vnodes makes no sense because no-vnodes is
>>> just a special case of vnodes=1. On the other hand offheap/onheap buffers
>>> could be tested in unit tests. In short, I'd run dtests only with the
>>> default and latest configuration.
>>
>> Some of these changes are already done in this ticket.
>>
>> Regards,
>> Branimir
>>
>>
>>
>> On Thu, Feb 15, 2024 at 3:08 PM Paulo Motta <pa...@apache.org> wrote:
>>
>>> > It's also been questioned about why we don't just enable settings we
>>> recommend.  These are settings we recommend for new clusters.  *Our
>>> existing cassandra.yaml needs to be tailored for existing clusters being
>>> upgraded, where we are very conservative about changing defaults.*
>>>
>>> I think this unnecessarily penalizes new users with subpar defaults and
>>> existing users who wish to use optimized/recommended defaults and need to
>>> maintain additional logic to support that. This change offers an
>>> opportunity to revisit this.
>>>
>>> Is not updating the default cassandra.yaml with new recommended
>>> configuration just to protect existing clusters from accidentally
>>> overriding cassandra.yaml with a new version during major upgrades? If so,
>>> perhaps we could add a new explicit flag “enable_major_upgrade: false” to
>>> “cassandra.yaml” that fails startup if an upgrade is detected and force
>>> operators to review the configuration before a major upgrade?
>>>
>>> Related to Jeff’s question, I think we need a way to consolidate “latest
>>> recommended settings” into “old compatible default” when cutting a new
>>> major version, otherwise the files will diverge perpetually.
>>>
>>> I think cassandra_latest.yaml offers a way to “buffer” proposals for
>>> default configuration changes which are consolidated into “cassandra.yaml”
>>> in the subsequent major release, eventually converging configurations and
>>> reducing the maintenance burden.
>>>
>>> On Thu, 15 Feb 2024 at 04:24 Mick Semb Wever <m...@apache.org> wrote:
>>>
>>>>
>>>>
>>>>> Mick and Ekaterina (and everyone really) - any thoughts on what test
>>>>> coverage, if any, we should commit to for this new configuration?
>>>>> Acknowledging that we already have *a lot* of CI that we run.
>>>>>
>>>>
>>>>
>>>>
>>>> Branimir in this patch has already done some basic cleanup of test
>>>> variations, so this is not a duplication of the pipeline.  It's a
>>>> significant improvement.
>>>>
>>>> I'm ok with cassandra_latest being committed and added to the pipeline,
>>>> *if* the authors genuinely believe there's significant time and effort
>>>> saved in doing so.
>>>>
>>>> How many broken tests are we talking about ?
>>>> Are they consistently broken or flaky ?
>>>> Are they ticketed up and 5.0-rc blockers ?
>>>>
>>>> Having to deal with flakies and broken tests is an unfortunate reality
>>>> of having a pipeline of 170k tests.
>>>>
>>>> Despite real frustrations I don't believe the broken windows analogy
>>>> is appropriate here – it's more of a leave the campground cleaner…   That
>>>> being said, knowingly introducing a few broken tests is not that either,
>>>> but still having to deal with a handful of consistently breaking tests
>>>> for a short period of time is not the same cognitive burden as flakies.
>>>> There are currently other broken tests in 5.0: VectorUpdateDeleteTest,
>>>> upgrade_through_versions_test; are these compounding the frustrations ?
>>>>
>>>> It's also been questioned about why we don't just enable settings we
>>>> recommend.  These are settings we recommend for new clusters.  Our existing
>>>> cassandra.yaml needs to be tailored for existing clusters being upgraded,
>>>> where we are very conservative about changing defaults.
>>>>
>>>>
>>
>> --
>> Branimir Lambov
>> e. branimir.lam...@datastax.com
>> w. www.datastax.com
>>
>>
