> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
+1 to this! I drastically lower our parallelism, as only the python-dtest upgrade tests need many resources.

What I do for JVM unit/jvm-dtest is the following:

    import math
    import os

    def java_parallelism(src_dir, kind, num_file_in_worker, include=lambda a, b: True):
        # Count the *Test.java files under <src_dir>/test/<kind> that pass the filter.
        d = os.path.join(src_dir, 'test', kind)
        num_files = 0
        for root, dirs, files in os.walk(d):
            for f in files:
                if f.endswith('Test.java') and include(os.path.join(root, f), f):
                    num_files += 1
        # Number of workers needed at num_file_in_worker files per worker.
        return math.floor(num_files / num_file_in_worker)

    def fix_parallelism(args, contents):
        jobs = contents['jobs']

        unit_parallelism = java_parallelism(args.src, 'unit', 20)
        jvm_dtest_parallelism = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
        jvm_dtest_upgrade_parallelism = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)

TL;DR: I find all the test files we are going to run and, based on a pre-defined variable that says the "ideal" number of files per worker, I calculate how many workers we need. So unit tests are num_files / 20 ~= 35 workers. Can I be "smarter" by knowing which files have a higher cost? Sure, but the "perfect" and the "average" are so similar that it wasn't worth it.

> 2. Rename jobs on circle to be more indicative of their function

Have an example? I am not against it, I just don't know the problem you are referring to.

> 3. Unify j8 and j11 workflow pairs into single

Fine by me, but we need to keep in mind that j17 is coming. Also, most developmental CI builds don't really need to run across every JDK, so we need some way to disable different JDKs. When I am testing out a patch I tend to run the following (my script): "circleci-enable.py --no-jdk11"; this removes the JDK11 builds. I know I am going to run them pre-merge, so I know it's safe for me.

> 5. Flag on generate.sh to allow auto-run on push

I really hate that we don't do this by default. I still, to this day, strongly feel you should opt out of CI rather than opt in. I have seen several commits get merged because the authors didn't see an error in circle, and they didn't see one because circle didn't do any work. Yes, I am fully aware that I am beating a dead horse. TL;DR: +1

> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
> temporary circleci config" as the commit message

+0 from me. I have seen people not realize that you have to commit after typing "higher" (a wrapper around my personal circleci-enable.py script that applies my defaults to the build), but that's not an issue I have, so I don't mind if people want the tool to integrate with git.

With all that said, I do feel there is more, and it is something I feel Ekaterina is probably dealing with in her JDK17 work:

1) The resource_class we use is not chosen because it's needed. In the HIGHER file we default to xlarge, but only the python upgrade tests need that. Reported in CASSANDRA-17600.

2) Our current patching allows MID/HIGHER to drift, as changes need new patches, else patching may do the wrong thing. Reported in CASSANDRA-17600.

3) CI is a combinatorial problem: we need to run all jobs for all JDKs, vnode on/off, cdc on/off, compression on/off, etc. But this is currently controlled and fleshed out by humans who want to add new jobs. We should move away from maintaining .circleci/config-2_1.yml and instead auto-generate it. A simple example of this problem is jdk11 support: we run a subset of tests on jdk11 and say it's supported. Will jdk17 have the same issue? Will it be even fewer tests? Why does the burden lie on everyone to "do the right thing" when all they want is a simple job?

4) Why do we require people to install the "circleci" command to contribute? If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work just fine. We don't need to call "circleci config process" every time we touch the circle config.
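To make point 3) a bit more concrete, here is a minimal sketch of generating the job matrix instead of writing each job by hand. Everything named in it is hypothetical: the JDK list, the toggle names, the job-name format, and the generate_jobs function are made up for illustration and do not reflect how .circleci/config-2_1.yml is laid out today.

```python
import itertools

# Hypothetical dimensions; the real JDKs and toggles would come from the project.
JDKS = ["j8", "j11", "j17"]
TOGGLES = {
    "vnode": [True, False],
    "cdc": [True, False],
    "compression": [True, False],
}


def generate_jobs(suite, jdks=JDKS, toggles=TOGGLES, disabled_jdks=()):
    """Return one job name per JDK x toggle combination for a test suite."""
    jobs = []
    names = list(toggles)
    for jdk in jdks:
        # Lets a developer skip a JDK for a development run.
        if jdk in disabled_jdks:
            continue
        for values in itertools.product(*(toggles[n] for n in names)):
            enabled = [n for n, on in zip(names, values) if on]
            suffix = "_".join(enabled) if enabled else "plain"
            jobs.append(f"{jdk}_{suite}_{suffix}")
    return jobs


if __name__ == "__main__":
    for job in generate_jobs("unit_tests", disabled_jdks=("j11",)):
        print(job)
```

With a generator like this, adding a JDK is a one-line change to the input list rather than a hand-copied set of jobs, and skipping a JDK during development is an argument rather than a patch.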
Also, it seems that whenever someone new to the circle config (but not to Cassandra) touches it, they always mutate LOW/MID/HIGH and not .circleci/config-2_1.yml, so I keep going back to fix .circleci/config-2_1.yml.

> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan
> <stefan.mikloso...@netapp.com> wrote:
>
> 1) would be nice to have. The first thing I do is that I change the
> parallelism to 20. None of the committed config.yaml's are appropriate for our
> company CircleCI so I have to tweak this manually. I think we can not run
> more than 25/30 containers in parallel, something like that. HIGHRES has 100
> and MIDRES has some jobs having parallelism equal to 50 or so, so that is not
> good either. I would be happy with a simple way to modify the default config.yaml
> on parallelism. I use "sed" to change parallelism: 4 to parallelism: 20 and
> leave parallelism: 1 where it does not make sense to increase it. However I
> noticed that there is not "4" set everywhere, some jobs have it set to "1" so
> I have to take extra care of these cases (I consider that to be a bug, I
> think there are two or three, I do not remember). Once set, I have that
> config in "git stash" so I just apply it every time I need it.
>
> 5) would be nice too.
> 7) is nice but not crucial, it takes no time to commit that.
>
> ________________________________________
> From: Josh McKenzie <jmcken...@apache.org>
> Sent: Wednesday, October 19, 2022 21:50
> To: dev
> Subject: [DISCUSS] Potential circleci config and workflow changes
>
> While working w/Andres on CASSANDRA-17939 a variety of things came up
> regarding our circleci config and opportunities to improve it. Figured I'd
> hit the list up here to see what people's thoughts are since many of us
> intersect with these systems daily and having your workflow disrupted without
> having a chance to provide input is bad.
>
> The ideas:
> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
> 2. Rename jobs on circle to be more indicative of their function
> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
> 4. Update documentation w/guidance on using circle, .circleci/generate.sh
> examples, etc
> 4a. How to commit:
> https://cassandra.apache.org/_/development/how_to_commit.html
> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
> 5. Flag on generate.sh to allow auto-run on push
> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all
> suites, default to -m, deprecate -h?) <- may not be a code-change issue and
> instead be a documentation issue
> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
> temporary circleci config" as the commit message
>
> Curious to see what folks think.
>
> ~Josh
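For reference, the sed-based tweak Stefan describes in the quoted mail (raise "parallelism: 4" to 20, leave "parallelism: 1" alone) could be sketched in Python roughly as below. The function name and the sample YAML fragment are made up for illustration, and the regex assumes each parallelism setting sits on its own line.

```python
import re


def bump_parallelism(config_text, old=4, new=20):
    """Replace 'parallelism: <old>' with 'parallelism: <new>', line by line.

    Lines with any other value (for example 'parallelism: 1' or
    'parallelism: 40') are left untouched.
    """
    pattern = re.compile(
        rf"^([ \t]*parallelism:[ \t]*){old}[ \t]*$", re.MULTILINE
    )
    return pattern.sub(rf"\g<1>{new}", config_text)


# Hypothetical config fragment, only for demonstration.
SAMPLE = (
    "jobs:\n"
    "  a:\n"
    "    parallelism: 4\n"
    "  b:\n"
    "    parallelism: 1\n"
    "  c:\n"
    "    parallelism: 40\n"
)

if __name__ == "__main__":
    print(bump_parallelism(SAMPLE))
```

Anchoring the match to the whole line is what keeps values such as 40 from being rewritten, which a looser substitution on "parallelism: 4" would not guarantee.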