Thanks for opening an epic @Jacek. It seems the dtest_offheap job is replaced by dtest_latest, which means we will have the same number of jobs after the current ticket, so I am not worried about Jenkins.
In CircleCI, however, the dtest_offheap job was not a mandatory pre-commit run, whereas as far as I can see this ticket proposes making dtest_latest mandatory in the pre-commit workflow.

I would like to suggest we commit the current proposal. I only think the config should be marked as experimental somewhere.

As a short-term solution to the increased resource consumption of pre-commit test runs, I would like to suggest we accept running only the J11 pre-commit workflow (which also covers tests run with J17) until we surface the other discussion and apply other test configuration changes/optimizations.

On Fri, 16 Feb 2024 at 9:08, Paulo Motta <pa...@apache.org> wrote:

> Thanks for clarifying Branimir! I'm +1 on proceeding as proposed and I
> think this change will make it easier to gain confidence to update
> configurations.
>
> Interesting discussion and suggestions on this thread - I think we can
> follow-up on improving test/CI workflow in a different thread/proposal to
> avoid blocking this.
>
> On Thu, Feb 15, 2024 at 9:59 AM Branimir Lambov <branimir.lam...@datastax.com> wrote:
>
>> Paulo:
>>
>>> 1) Will cassandra.yaml remain the default test config? Is the plan
>>> moving forward to require green CI for both configurations on pre-commit,
>>> or pre-release?
>>
>> The plan is to ensure both configurations are green pre-commit. This
>> should not increase the CI cost as this replaces extra configurations we
>> were running before (e.g. test-tries).
>>
>> 2) What will this mean for the release artifact, is the idea to continue
>>> shipping with the current cassandra.yaml or eventually switch to the
>>> optimized configuration (ie. 6.X) while making the legacy default
>>> configuration available via an optional flag?
>>
>> The release simply includes an additional yaml file, which contains a
>> one-liner how to use it.
>>
>> Jeff:
>>
>>> 1) If there’s an “old compatible default” and “latest recommended
>>> settings”, when does the value in “old compatible default” get updated?
>>> Never?
>>
>> This does not change anything about these decisions. The question is very
>> serious without this patch as well: Does V6 have to support pain-free
>> upgrade from V5 working in V4 compatible mode? If so, can we ever deprecate
>> or drop anything? If not, are we not breaking upgradeability promises?
>>
>> 2) If there are test failures with the new values, it seems REALLY
>>> IMPORTANT to make sure those test failures are discovered + fixed IN THE
>>> FUTURE TOO. If pushing new yaml into a different file makes us less likely
>>> to catch the failures in the future, it seems like we’re hurting ourselves.
>>> Branimir mentions this, but how do we ensure that we don’t let this pattern
>>> disguise future bugs?
>>
>> The main objective of this patch is to ensure that the second yaml is
>> tested too, pre-commit. We were not doing this for all features we tell
>> users are supported.
>>
>> Paulo:
>>
>>> - if cassandra_latest.yaml becomes the new default configuration for
>>> 6.0, then precommit only needs to be run against that version - prerelease
>>> needs to be run against all cassandra.yaml variants.
>>
>> Assuming we keep the pace of development, there will be new "latest"
>> features in 6.0 (e.g. Accord could be one). The idea is more to move some
>> of the settings from latest to default when they are deemed mature enough.
>>
>> Josh:
>>
>>> I propose to significantly reduce that stuff. Let's distinguish the
>>> packages of tests that need to be run with CDC enabled / disabled, with
>>> commitlog compression enabled / disabled, tests that verify sstable formats
>>> (mostly io and index I guess), and leave other parameters set as with the
>>> latest configuration - this is the easiest way I think.
>>> For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about
>>> other stuff.
>>> To me running no-vnodes makes no sense because no-vnodes is
>>> just a special case of vnodes=1. On the other hand offheap/onheap buffers
>>> could be tested in unit tests. In short, I'd run dtests only with the
>>> default and latest configuration.
>>
>> Some of these changes are already done in this ticket.
>>
>> Regards,
>> Branimir
>>
>> On Thu, Feb 15, 2024 at 3:08 PM Paulo Motta <pa...@apache.org> wrote:
>>
>>> > It's also been questioned about why we don't just enable settings we
>>> recommend. These are settings we recommend for new clusters. *Our
>>> existing cassandra.yaml needs to be tailored for existing clusters being
>>> upgraded, where we are very conservative about changing defaults.*
>>>
>>> I think this unnecessarily penalizes new users with subpar defaults and
>>> existing users who wish to use optimized/recommended defaults and need to
>>> maintain additional logic to support that. This change offers an
>>> opportunity to revisit this.
>>>
>>> Is not updating the default cassandra.yaml with new recommended
>>> configuration just to protect existing clusters from accidentally
>>> overriding cassandra.yaml with a new version during major upgrades? If so,
>>> perhaps we could add a new explicit flag “enable_major_upgrade: false” to
>>> “cassandra.yaml” that fails startup if an upgrade is detected and force
>>> operators to review the configuration before a major upgrade?
>>>
>>> Related to Jeff’s question, I think we need a way to consolidate “latest
>>> recommended settings” into “old compatible default” when cutting a new
>>> major version, otherwise the files will diverge perpetually.
>>>
>>> I think cassandra_latest.yaml offers a way to “buffer” proposals for
>>> default configuration changes which are consolidated into “cassandra.yaml”
>>> in the subsequent major release, eventually converging configurations and
>>> reducing the maintenance burden.
>>>
>>> On Thu, 15 Feb 2024 at 04:24 Mick Semb Wever <m...@apache.org> wrote:
>>>
>>>>> Mick and Ekaterina (and everyone really) - any thoughts on what test
>>>>> coverage, if any, we should commit to for this new configuration?
>>>>> Acknowledging that we already have *a lot* of CI that we run.
>>>>
>>>> Branimir in this patch has already done some basic cleanup of test
>>>> variations, so this is not a duplication of the pipeline. It's a
>>>> significant improvement.
>>>>
>>>> I'm ok with cassandra_latest being committed and added to the pipeline,
>>>> *if* the authors genuinely believe there's significant time and effort
>>>> saved in doing so.
>>>>
>>>> How many broken tests are we talking about ?
>>>> Are they consistently broken or flaky ?
>>>> Are they ticketed up and 5.0-rc blockers ?
>>>>
>>>> Having to deal with flakies and broken tests is an unfortunate reality
>>>> to having a pipeline of 170k tests.
>>>>
>>>> Despite real frustrations I don't believe the broken windows analogy
>>>> is appropriate here – it's more of a leave the campground cleaner… That
>>>> being said, knowingly introducing a few broken tests is not that either,
>>>> but still having to deal with a handful of consistently breaking tests
>>>> for a short period of time is not the same cognitive burden as flakies.
>>>> There are currently other broken tests in 5.0: VectorUpdateDeleteTest,
>>>> upgrade_through_versions_test; are these compounding to the frustrations ?
>>>>
>>>> It's also been questioned about why we don't just enable settings we
>>>> recommend. These are settings we recommend for new clusters. Our existing
>>>> cassandra.yaml needs to be tailored for existing clusters being upgraded,
>>>> where we are very conservative about changing defaults.
>>
>> --
>> Branimir Lambov
>> e. branimir.lam...@datastax.com
>> w. www.datastax.com