I think what Jordan is exploring, and what I agree with, is that we need clear next steps to reduce the roughly 75% increase in dtest runtime. For sponsored contributors using CircleCI to run the entire suites, throwing more money at the problem through parallelization isn't a long-term solution.

On Sat, Aug 22, 2020 at 8:40 AM Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> I know the dtests take a long time and this will make them longer. As a
> counter point most people run individual dtests locally and the full set on
> dedicated test infrastructure. For the dedicated test infrastructure Mick
> also improved the wall clock runtime when parallelizing the dtests on
> https://issues.apache.org/jira/browse/CASSANDRA-16006.
>
> Even with the longer dtest full runtime, I firmly believe that for the
> sake of new users and how hard it is to change num_tokens once data is
> written, this change to the default of num_tokens is long overdue. Another
> hidden benefit of this change is that the dtests will now run bootstraps
> the way operators should run them in practice with the new defaults. That
> will make the more common default case much more tested and hopefully catch
> regressions in that execution path faster.
>
> So while it is not a trivial change in full dtest runtime, the benefits to
> the community and project are also not trivial. I’m really grateful to all
> who have put in effort to make this a reality and know that new users in
> 4.0 will benefit from these improved defaults.
>
> In other words my non binding vote is to merge this and look to improve
> execution time separately with that effort not being as urgent for the
> reasons stated above.
>
> Jeremy
>
> > On Aug 20, 2020, at 2:49 AM, Mick Semb Wever <m...@apache.org> wrote:
> >
> > It was agreed¹ that 4.0 should have the new configuration defaults of
> > num_tokens: 16
> > allocate_tokens_for_local_replication_factor: 3
> >
> > 13701's patches: against cassandra, cassandra-builds, cassandra-dtest, ccm;
> > are reviewed, tested, and ready to commit. But the ccm and dtest patches
> > required ccm having to now start nodes sequentially, and adding some longer
> > timeout values in the dtests.
> >
> > The consequence of this is CI runs now take longer. ci-cassandra.a.o's
> > dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from
> > ~22 to ~43 minutes. The general opinion (on slack²) is to commit, and work
> > on improving ccm and dtest startup times in a subsequent ticket.
> >
> > 13701 was intended to be committed before the first beta release because of
> > its user-facing changes. But these numbers are significant enough it makes
> > sense to touch base with dev@
> >
> > Does anyone (strongly) object to the "commit + follow up ticket" approach?
> >
> > regards,
> > Mick
> >
> >
> > ¹ – https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
> > ² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and
> > https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300&cid=CK23JSY2K