> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)

+1 to this!  I drastically lower our parallelism as only python-dtest upgrade 
tests need many resources…

What I do for JVM unit/jvm-dtest is the following:

import math
import os

def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda full, name: True):
    # Count the *Test.java files under <src_dir>/test/<kind> that pass the filter.
    d = os.path.join(src_dir, 'test', kind)
    num_files = 0
    for root, dirs, files in os.walk(d):
        for f in files:
            if f.endswith('Test.java') and include(os.path.join(root, f), f):
                num_files += 1
    # One worker per num_file_in_worker files, rounded down.
    return math.floor(num_files / num_file_in_worker)

def fix_parallelism(args, contents):
    jobs = contents['jobs']

    unit_parallelism                = java_parallelism(args.src, 'unit', 20)
    jvm_dtest_parallelism           = java_parallelism(args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
    jvm_dtest_upgrade_parallelism   = java_parallelism(args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)

TL;DR - I find all test files we are going to run, and based on a pre-defined 
variable that gives the “ideal” number of files per worker, I calculate how 
many workers we need.  So unit tests are num_files / 20 ~= 35 workers.  Can I be 
“smarter” by knowing which files have higher cost?  Sure… but the “perfect” and 
the “average” are so similar that it wasn’t worth it...
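
A minimal sketch of how the computed worker counts might be written back into the parsed config. The job names and the file counts (700 and 3) are hypothetical and only for illustration, and the clamp to at least one worker is an addition that a plain floor would not give you:

```python
import math

def workers_needed(num_files, files_per_worker):
    """Workers for num_files at files_per_worker each, never fewer than one."""
    return max(1, math.floor(num_files / files_per_worker))

def apply_parallelism(jobs, parallelism_by_job):
    """Set 'parallelism' on each named job that exists in the parsed config."""
    for name, parallelism in parallelism_by_job.items():
        if name in jobs:
            jobs[name]['parallelism'] = parallelism
    return jobs

# Hypothetical job names and file counts, for illustration only.
jobs = {
    'unit_tests': {'parallelism': 100},
    'jvm_dtests': {'parallelism': 100},
}
apply_parallelism(jobs, {
    'unit_tests': workers_needed(700, 20),  # 700 / 20 = 35 workers
    'jvm_dtests': workers_needed(3, 4),     # floor gives 0, clamped to 1
})
```

Without the clamp, a suite with fewer files than the per-worker target would end up with a parallelism of 0.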

> 2. Rename jobs on circle to be more indicative of their function


Have an example?  I am not against it; I just don’t know the problem you are 
referring to.

> 3. Unify j8 and j11 workflow pairs into single


Fine by me, but we need to keep in mind j17 is coming.  Also, most 
developmental CI builds don’t really need to run across every JDK, so we need 
some way to disable different JDKs…

When I am testing out a patch I tend to run the following (my script): 
“circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I know I 
am going to run them pre-merge so I know it’s safe for me.
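
A rough sketch of what removing one JDK's jobs from a parsed config could look like. This is not the actual circleci-enable.py; the function names, the 'j11_' job-name prefix, and the sample config are all assumptions for illustration:

```python
def job_name(entry):
    # A workflow job entry is either a bare name or a one-key mapping {name: options}.
    return entry if isinstance(entry, str) else next(iter(entry))

def drop_jdk(config, prefix):
    """Return a copy of config without jobs whose names start with prefix.

    Does not rewrite 'requires' lists of the jobs that remain.
    """
    jobs = {n: j for n, j in config['jobs'].items() if not n.startswith(prefix)}
    workflows = {}
    for wf_name, wf in config['workflows'].items():
        if not isinstance(wf, dict):  # e.g. the 'version: 2' key
            workflows[wf_name] = wf
            continue
        kept = [e for e in wf['jobs'] if not job_name(e).startswith(prefix)]
        if kept:  # drop workflows left with no jobs
            workflows[wf_name] = {**wf, 'jobs': kept}
    return {**config, 'jobs': jobs, 'workflows': workflows}

# Hypothetical config, for illustration only.
config = {
    'jobs': {'j8_unit_tests': {}, 'j11_unit_tests': {}},
    'workflows': {
        'version': 2,
        'pre-commit': {
            'jobs': [
                'j8_unit_tests',
                {'j11_unit_tests': {'requires': ['j8_unit_tests']}},
            ],
        },
    },
}
trimmed = drop_jdk(config, 'j11_')
```

A generic prefix filter like this would cover j17 as well, as long as job names keep encoding the JDK.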

> 5. Flag on generate.sh to allow auto-run on push


I really hate that we don’t do this by default… I still to this day strongly 
feel you should opt-out of CI rather than opt-in… I have seen several commits 
get merged because nobody saw an error in circle… because circle didn’t do any 
work…. Yes, I am fully aware that I am beating a dead horse… 

TL;DR +1

> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] 
> temporary circleci config" as the commit message


+0 from me… I have seen people not realize you have to commit after typing 
“higher” (wrapper around my personal circleci-enable.py script to apply my 
defaults to the build) but not an issue I have… so I don’t mind if people want 
the tool to integrate with git…


With all that said, I do feel there is more, and something I feel Ekaterina is 
probably dealing with in her JDK17 work…

1) the resource_class used is not chosen because it’s needed… in the HIGHER file 
we default to xlarge but only python upgrade tests need that… reported in 
CASSANDRA-17600
2) our current patching allows MID/HIGHER to drift, as changes need new patches 
or else patching may do the wrong thing… reported in CASSANDRA-17600
3) CI is a combinatorial problem: we need to run all jobs for all JDKs, vnode 
on/off, cdc on/off, compression on/off, etc…. But this is currently controlled 
and fleshed out by humans who want to add new jobs.  We should move away from 
maintaining .circleci/config-2_1.yml and instead auto-generate it.  A simple 
example of this problem is jdk11 support… we run a subset of tests on jdk11 and 
say it’s supported… will jdk17 have the same issue?  Will it be even fewer 
tests?  Why does the burden lie on everyone to “do the right thing” when all 
they want is a simple job?
4) why do we require people to install the “circleci” command to contribute?  If 
you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work 
just fine… we don’t need to call “circleci config process” every time we touch 
circle config…. Also, it seems that whenever someone new to circle config (but 
not cassandra) touches it they always mutate LOW/MID/HIGH and not 
.circleci/config-2_1.yml… so I keep going back to fix .circleci/config-2_1.yml….
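
To make the combinatorial point in 3) concrete, here is a small sketch of generating one job per combination instead of hand-writing each. The axes, their values, and the job-naming scheme are assumptions for illustration, not the layout of the real config:

```python
import itertools

# Hypothetical axes; the real set of JDKs and toggles is an assumption.
AXES = {
    'jdk': ['j8', 'j11', 'j17'],
    'vnode': [True, False],
    'cdc': [True, False],
    'compression': [True, False],
}

def generate_jobs(suite, axes):
    """Yield one (job_name, settings) pair per combination of axis values."""
    keys = list(axes)
    for values in itertools.product(*(axes[k] for k in keys)):
        settings = dict(zip(keys, values))
        # Only the boolean toggles that are switched on appear in the name.
        flags = [k for k in keys if settings[k] is True]
        name = '_'.join([settings['jdk'], suite] + flags)
        yield name, settings

jobs = dict(generate_jobs('unit_tests', AXES))
```

With these axes a single suite expands to 3 * 2 * 2 * 2 = 24 jobs, and adding a JDK becomes a one-line change to the axis list rather than a new set of hand-copied jobs.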


> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan 
> <stefan.mikloso...@netapp.com> wrote:
> 
> 1) would be nice to have. The first thing I do is that I change the 
> parallelism to 20. None of committed config.yaml's are appropriate for our 
> company CircleCI so I have to tweak this manually. I think we cannot run 
> more than 25/30 containers in parallel, something like that. HIGHRES has 100 
> and MIDRES has some jobs with parallelism equal to 50 or so, so that is not 
> good either. I would be happy with simple way to modify default config.yaml 
> on parallelism. I use "sed" to change parallelism: 4 to parallelism: 20 and 
> leave parallelism: 1 where it does not make sense to increase it. However I 
> noticed that there is not "4" set everywhere, some jobs have it set to "1" so 
> I have to take extra care of these cases (I consider that to be a bug, I 
> think there are two or three, I do not remember). Once set, I have that 
> config in "git stash" so I just apply it every time I need it.
> 
> 5) would be nice too.
> 7) is nice but not crucial, it takes no time to commit that.
> 
> ________________________________________
> From: Josh McKenzie <jmcken...@apache.org>
> Sent: Wednesday, October 19, 2022 21:50
> To: dev
> Subject: [DISCUSS] Potential circleci config and workflow changes
> 
> While working w/Andres on CASSANDRA-17939 a variety of things came up 
> regarding our circleci config and opportunities to improve it. Figured I'd 
> hit the list up here to see what people's thoughts are since many of us 
> intersect with these systems daily and having your workflow disrupted without 
> having a chance to provide input is bad.
> 
> The ideas:
> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
> 2. Rename jobs on circle to be more indicative of their function
> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: 
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
> 4. Update documentation w/guidance on using circle, .circleci/generate.sh 
> examples, etc
> 4a. How to commit: 
> https://cassandra.apache.org/_/development/how_to_commit.html
> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
> 5. Flag on generate.sh to allow auto-run on push
> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all 
> suites, default to -m, deprecate -h?) <- may not be a code-change issue and 
> instead be a documentation issue
> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] 
> temporary circleci config" as the commit message
> 
> Curious to see what folks think.
> 
> ~Josh
> 
