Thanks for clarifying, Branimir! I'm +1 on proceeding as proposed, and I think this change will make it easier to gain confidence in updating configurations.
Interesting discussion and suggestions on this thread - I think we can follow-up on improving test/CI workflow in a different thread/proposal to avoid blocking this.

On Thu, Feb 15, 2024 at 9:59 AM Branimir Lambov <branimir.lam...@datastax.com> wrote:

> Paulo:
>
>> 1) Will cassandra.yaml remain the default test config? Is the plan moving
>> forward to require green CI for both configurations on pre-commit, or
>> pre-release?
>
> The plan is to ensure both configurations are green pre-commit. This
> should not increase the CI cost as this replaces extra configurations we
> were running before (e.g. test-tries).
>
>> 2) What will this mean for the release artifact, is the idea to continue
>> shipping with the current cassandra.yaml or eventually switch to the
>> optimized configuration (ie. 6.X) while making the legacy default
>> configuration available via an optional flag?
>
> The release simply includes an additional yaml file, which contains a
> one-liner how to use it.
>
> Jeff:
>
>> 1) If there’s an “old compatible default” and “latest recommended
>> settings”, when does the value in “old compatible default” get updated?
>> Never?
>
> This does not change anything about these decisions. The question is very
> serious without this patch as well: Does V6 have to support pain-free
> upgrade from V5 working in V4 compatible mode? If so, can we ever deprecate
> or drop anything? If not, are we not breaking upgradeability promises?
>
>> 2) If there are test failures with the new values, it seems REALLY
>> IMPORTANT to make sure those test failures are discovered + fixed IN THE
>> FUTURE TOO. If pushing new yaml into a different file makes us less likely
>> to catch the failures in the future, it seems like we’re hurting ourselves.
>> Branimir mentions this, but how do we ensure that we don’t let this pattern
>> disguise future bugs?
>
> The main objective of this patch is to ensure that the second yaml is
> tested too, pre-commit. We were not doing this for all features we tell
> users are supported.
>
> Paulo:
>
>> - if cassandra_latest.yaml becomes the new default configuration for 6.0,
>> then precommit only needs to be run against that version - prerelease needs
>> to be run against all cassandra.yaml variants.
>
> Assuming we keep the pace of development, there will be new "latest"
> features in 6.0 (e.g. Accord could be one). The idea is more to move some
> of the settings from latest to default when they are deemed mature enough.
>
> Josh:
>
>> I propose to significantly reduce that stuff. Let's distinguish the
>> packages of tests that need to be run with CDC enabled / disabled, with
>> commitlog compression enabled / disabled, tests that verify sstable formats
>> (mostly io and index I guess), and leave other parameters set as with the
>> latest configuration - this is the easiest way I think.
>> For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about
>> other stuff. To me running no-vnodes makes no sense because no-vnodes is
>> just a special case of vnodes=1. On the other hand offheap/onheap buffers
>> could be tested in unit tests. In short, I'd run dtests only with the
>> default and latest configuration.
>
> Some of these changes are already done in this ticket.
>
> Regards,
> Branimir
>
> On Thu, Feb 15, 2024 at 3:08 PM Paulo Motta <pa...@apache.org> wrote:
>
>> > It's also been questioned about why we don't just enable settings we
>> > recommend. These are settings we recommend for new clusters. *Our
>> > existing cassandra.yaml needs to be tailored for existing clusters being
>> > upgraded, where we are very conservative about changing defaults.*
>>
>> I think this unnecessarily penalizes new users with subpar defaults and
>> existing users who wish to use optimized/recommended defaults and need to
>> maintain additional logic to support that. This change offers an
>> opportunity to revisit this.
>>
>> Is not updating the default cassandra.yaml with new recommended
>> configuration just to protect existing clusters from accidentally
>> overriding cassandra.yaml with a new version during major upgrades? If so,
>> perhaps we could add a new explicit flag “enable_major_upgrade: false” to
>> “cassandra.yaml” that fails startup if an upgrade is detected and force
>> operators to review the configuration before a major upgrade?
>>
>> Related to Jeff’s question, I think we need a way to consolidate “latest
>> recommended settings” into “old compatible default” when cutting a new
>> major version, otherwise the files will diverge perpetually.
>>
>> I think cassandra_latest.yaml offers a way to “buffer” proposals for
>> default configuration changes which are consolidated into “cassandra.yaml”
>> in the subsequent major release, eventually converging configurations and
>> reducing the maintenance burden.
>>
>> On Thu, 15 Feb 2024 at 04:24 Mick Semb Wever <m...@apache.org> wrote:
>>
>>>> Mick and Ekaterina (and everyone really) - any thoughts on what test
>>>> coverage, if any, we should commit to for this new configuration?
>>>> Acknowledging that we already have *a lot* of CI that we run.
>>>
>>> Branimir in this patch has already done some basic cleanup of test
>>> variations, so this is not a duplication of the pipeline. It's a
>>> significant improvement.
>>>
>>> I'm ok with cassandra_latest being committed and added to the pipeline,
>>> *if* the authors genuinely believe there's significant time and effort
>>> saved in doing so.
>>>
>>> How many broken tests are we talking about ?
>>> Are they consistently broken or flaky ?
>>> Are they ticketed up and 5.0-rc blockers ?
>>>
>>> Having to deal with flakies and broken tests is an unfortunate reality
>>> to having a pipeline of 170k tests.
>>>
>>> Despite real frustrations I don't believe the broken windows analogy is
>>> appropriate here – it's more of a leave the campground cleaner… That
>>> being said, knowingly introducing a few broken tests is not that either,
>>> but still having to deal with a handful of consistently breaking tests
>>> for a short period of time is not the same cognitive burden as flakies.
>>> There are currently other broken tests in 5.0: VectorUpdateDeleteTest,
>>> upgrade_through_versions_test; are these compounding to the frustrations ?
>>>
>>> It's also been questioned about why we don't just enable settings we
>>> recommend. These are settings we recommend for new clusters. Our existing
>>> cassandra.yaml needs to be tailored for existing clusters being upgraded,
>>> where we are very conservative about changing defaults.
>
> --
> Branimir Lambov
> e. branimir.lam...@datastax.com
> w. www.datastax.com