Re: [DISCUSS] Potential circleci config and workflow changes

Brandon Williams Thu, 20 Oct 2022 03:53:54 -0700

They passed with -m for me recently.

Kind Regards,
Brandon


On Thu, Oct 20, 2022 at 12:03 AM Berenguer Blasi
<berenguerbl...@gmail.com> wrote:
>
> Can python upgrade tests be run without -h? Last time I tried, iirc, they 
> failed on -m
>
> On 20/10/22 4:11, Ekaterina Dimitrova wrote:
>
> Thank you Josh. Glad to see that our CI is getting more attention, as no 
> Cassandra feature will be there if we don't do proper testing, right? It is 
> as important as all the suites and tools we have. With that being said I am glad 
> to see Derek is volunteering to spend more time on this as I believe this is 
> always the main issue - ideas and willingness for improvements are there but 
> people are swamped with other things and we lack manpower for something so 
> important.
> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
> Question for David, do you tune only parallelism and use only xlarge? If yes, 
> we need to talk :D
> Reading what Stefan shared as experience/feedback, I think we can revise the 
> current config and move to a more reasonable config that can work for most 
> people but there will always be someone who needs something a bit different. 
> With that said, maybe we can add to our scripts/menu an option to change 
> parallelism and/or resources from the command line through parameters, for 
> those who want further customization? I see this as a separate, additional 
> ticket, probably. In that case we could probably skip the circleci config 
> process step for that part of the menu (but not for adding new jobs and 
> meaningful permanent updates).
> 2. Rename jobs on circle to be more indicative of their function
> +0 I am probably super used to the current names but Derek brought it to my 
> attention that there are names which are confusing for someone new to the 
> cassandra world. With that said I would say we can do this in a separate 
> ticket, mass update.
> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: 
> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
> I am against unifying per JDK workflows but I am all in for unifying the 
> pre-commit/separate workflows and getting back to 2 workflows as suggested by 
> Andres. If we think of how that will look in the UI I think it will be super 
> hard to follow. (the case of having unified both jdks in one workflow)
> 4. Update documentation w/guidance on using circle, .circleci/generate.sh 
> examples, etc
> 4a. How to commit: 
> https://cassandra.apache.org/_/development/how_to_commit.html
> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
> I will open a ticket and post the guide I was working on. But it also doesn't 
> make sense to fully update it now if we are going to significantly change the 
> workflow soon. Until then I believe Andres has updated the circleci readme 
> and provided good usage examples.
> 5. Flag on generate.sh to allow auto-run on push
> Auto-run on push? Can you elaborate? Like to start your whole workflow 
> directly without using the UI? There is an approval step in the config file, 
> we can probably add some flags to change pre-commit workflows to start build 
> without approval when we use those mentioned flags. But having everything 
> start on push by default is overkill in my opinion. People will forget 
> about it and push builds for nothing on WIP branches. Talking from 
> experience :D
> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all 
> suites, default to -m, deprecate -h?) <- may not be a code-change issue and 
> instead be a documentation issue
> If we agree that, apart from the free tier config file, we want one more 
> reasonable config which doesn't bump resources to the max without need but 
> provides balanced use of resources - absolutely. -h was kept as there was an 
> understanding that there are people in the community actively using it.
> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] 
> temporary circleci config" as the commit message
> +0
> I also wanted to address a few of the points David made.
> "Ekaterina is probably dealing with her JDK17 work" - if you mean to 
> ensure we have all jobs for all jdks properly, yes. That was my plan, until 
> Derek kindly suggested working on adding the missing jobs in CircleCI now, 
> so my work on that will be a bit less for certain things. This is an 
> effort related to the recent changes in our release document. Ticket 
> CASSANDRA-17950 :-) I am helping with mentoring/reviews. Everyone is welcome 
> to join the party.
> "1) resource_class used is not because it's needed… in HIGHER file we default 
> to xlarge but only python upgrade tests need that… reported in 
> CASSANDRA-17600" - one of the reasons we had the MIDRES in the first place, 
> as I mentioned in my other email the other day. [1]
>
> "our current patching allows MID/HIGHER to drift as changes need new patches 
> else patching may do the wrong thing… reported in CASSANDRA-17600" - I'd say 
> the patching is annoying sometimes, indeed but with/without the patching any 
> changes to config mean we need to check it by reading through diff and 
> pushing a run to CI before commit. With that said I am all in for automation 
> but this will not change the fact we need to push test runs and verify the 
> changes did not hurt us in a way. Same as testing patches on all branches, 
> running all needed tests and confirming no regressions. Nothing new or 
> changing here IMHO
>
> "CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode 
> on/off, cdc on/off, compression on/off, etc…. But this is currently controlled 
> and fleshed out by humans who want to add new jobs.. we should move away from 
> maintaining .circleci/config-2_1.yml and instead auto-generate it. Simple 
> example of this problem is jdk11 support… we run a subset of tests on jdk11 
> and say it's supported… will jdk17 have the same issue? Will it be even less 
> tests? Why does the burden lie on everyone to “do the right thing” when all 
> they want is a simple job?"
> Controlled and fleshed out by humans it will always be, but I agree we need to 
> automate the steps to make it easier for people to add most of the 
> combinations and not to skip any because it is too much work. We will always 
> need a human to decide which jdks, cdc, vnodes, etc. With that said I shared 
> your ticket/patch with Derek as he had similar thoughts, we need to get back 
> to that one at some point. (CASSANDRA-17600) Thanks for working on that!
>
> "why do we require people to install “circleci” command to contribute? If you 
> rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work 
> just fine… we don’t need to call “circleci config process” every time we 
> touch circle config…. Also, seems that w/e someone new to circle config (but 
> not cassandra) touch it they always mutate LOW/MID/HIGH and not 
> .circleci/config-2_1.yml… so I keep going back to fix 
> .circleci/config-2_1.yml…."
> I'd say config-2_1.yml is mainly for those who will make permanent changes to 
> config (like adding/removing jobs). config-2_1.yml is actually created as per 
> the CircleCI automation rules - 1st we add and reuse executors, parameters 
> and commands but I think we can reduce further things if we add even more 
> parameters probably. I have to look more into the current file. I am sure 
> there is room for further improvement. 2nd circleci cli tool can verify the 
> config file for errors and helps with debugging before we push to CircleCI. 
> There is circleci config validate. If we make changes manually we are on our 
> own to verify the long yml and also deal with duplication in config.yml. My 
> concern is that things that need to be almost identical might start to 
> diverge more easily. Though in point 1 I suggested cases where we can 
> probably add menu options that will not require using the circleci cli 
> tool. There might be more cases though.
> Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines. I'd 
> say lots of duplication there
>
> [1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl
>
> On Wed, 19 Oct 2022 at 17:20, David Capwell <dcapw...@apple.com> wrote:
>>
>> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
>>
>>
>> +1 to this!  I drastically lower our parallelism as only python-dtest 
>> upgrade tests need many resources…
>>
>> What I do for JVM unit/jvm-dtest is the following
>>
>> import math
>> import os
>>
>> def java_parallelism(src_dir, kind, num_file_in_worker, include=lambda a, b: True):
>>     # Count the *Test.java files under test/<kind> that pass the include filter.
>>     d = os.path.join(src_dir, 'test', kind)
>>     num_files = 0
>>     for root, dirs, files in os.walk(d):
>>         for f in files:
>>             if f.endswith('Test.java') and include(os.path.join(root, f), f):
>>                 num_files += 1
>>     # Workers needed at the target number of files per worker.
>>     return math.floor(num_files / num_file_in_worker)
>>
>> def fix_parallelism(args, contents):
>>     jobs = contents['jobs']
>>
>>     # Target files per worker: 20 (unit), 4 (jvm-dtest), 2 (jvm-dtest upgrade).
>>     unit_parallelism = java_parallelism(args.src, 'unit', 20)
>>     jvm_dtest_parallelism = java_parallelism(
>>         args.src, 'distributed', 4, lambda full, name: 'upgrade' not in full)
>>     jvm_dtest_upgrade_parallelism = java_parallelism(
>>         args.src, 'distributed', 2, lambda full, name: 'upgrade' in full)
>>
>> TL;DR - I find all test files we are going to run, and based off a 
>> pre-defined variable that says “ideal” number of files per worker, I then 
>> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35 
>> workers.  Can I be “smarter” by knowing which files have higher cost?  Sure… 
>> but the “perfect” and the “average” are too similar that it wasn’t worth 
>> it...
>>
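A minimal sketch of the files-per-worker arithmetic described above, not David's actual script. The helper name `workers_needed`, the `max(1, ...)` guard, and the file count of 700 are assumptions made for illustration; 700 is chosen only because 700 / 20 matches the "~= 35 workers" figure in the message.

```python
import math

def workers_needed(num_files, files_per_worker):
    """Return the parallelism for a job given a target number of test files per worker."""
    if files_per_worker <= 0:
        raise ValueError("files_per_worker must be positive")
    # Floor division mirrors the quoted snippet; max(1, ...) is an added
    # assumption so that a very small suite never ends up with zero workers.
    return max(1, math.floor(num_files / files_per_worker))

# Hypothetical file counts, not measured from the Cassandra tree.
print(workers_needed(700, 20))
print(workers_needed(3, 20))
```

With plain floor division a suite smaller than the per-worker target would get a parallelism of 0, which is why the sketch clamps the result to at least one worker.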
>> 2. Rename jobs on circle to be more indicative of their function
>>
>>
>> Have an example?  I am not against, I just don’t know the problem you are 
>> referring to.
>>
>> 3. Unify j8 and j11 workflow pairs into single
>>
>>
>> Fine by me, but we need to keep in mind j17 is coming.  Also, most 
>> developmental CI builds don’t really need to run across every JDK, so we need 
>> some way to disable different JDKs…
>>
>> When I am testing out a patch I tend to run the following (my script): 
>> "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I know 
>> I am going to run them pre-merge so I know it’s safe for me.
>>
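One way such a JDK filter could look, as a rough sketch only. This is not the circleci-enable.py script mentioned above, which is not shown in the thread. It assumes the config has already been parsed into a dict (for example by a YAML library) and that JDK 11 job names start with `j11_`; both the function name `drop_jdk_jobs` and that prefix convention are assumptions.

```python
def drop_jdk_jobs(config, prefix="j11_"):
    """Return a copy of a parsed CircleCI config without jobs whose name starts with `prefix`."""
    jobs = {name: body for name, body in config.get("jobs", {}).items()
            if not name.startswith(prefix)}

    workflows = {}
    for wf_name, wf in config.get("workflows", {}).items():
        if not isinstance(wf, dict):
            # Non-workflow entries such as a `version` key are copied through.
            workflows[wf_name] = wf
            continue
        kept = []
        for entry in wf.get("jobs", []):
            # A workflow job entry is either a bare name or a {name: options} mapping.
            name = entry if isinstance(entry, str) else next(iter(entry))
            if not name.startswith(prefix):
                kept.append(entry)
        if kept:
            # Workflows left with no jobs are dropped entirely.
            workflows[wf_name] = {**wf, "jobs": kept}

    return {**config, "jobs": jobs, "workflows": workflows}
```

The sketch does not rewrite `requires` lists, so a kept job that depends on a removed one would still need manual attention.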
>> 5. Flag on generate.sh to allow auto-run on push
>>
>>
>> I really hate that we don’t do this by default… I still to this day strongly 
>> feel you should opt-out of CI rather than opt-in… seen several commits get 
>> merged as they didn’t see an error in circle… because circle didn’t do any 
>> work…. Yes, I am fully aware that I am beating a dead horse…
>>
>> TL;DR +1
>>
>> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] 
>> temporary circleci config" as the commit message
>>
>>
>> +0 from me… I have seen people not realize you have to commit after typing 
>> “higher” (wrapper around my personal circleci-enable.py script to apply my 
>> defaults to the build) but not an issue I have… so I don’t mind if people 
>> want the tool to integrate with git…
>>
>>
>> With all that said, I do feel there is more, and something I feel Ekaterina 
>> is probably dealing with her JDK17 work…
>>
>> 1) resource_class used is not because it’s needed… in HIGHER file we default 
>> to xlarge but only python upgrade tests need that… reported in 
>> CASSANDRA-17600
>> 2) our current patching allows MID/HIGHER to drift as changes need new 
>> patches else patching may do the wrong thing… reported in CASSANDRA-17600
>> 3) CI is a combinatorial problem, we need to run all jobs for all JDKs, 
>> vnode on/off, cdc on/off, compression on/off, etc…. But this is currently 
>> controlled and fleshed out by humans who want to add new jobs..  we should 
>> move away from maintaining .circleci/config-2_1.yml and instead 
>> auto-generate it.  Simple example of this problem is jdk11 support… we run a 
>> subset of tests on jdk11 and say it’s supported… will jdk17 have the same 
>> issue?  Will it be even less tests?  Why does the burden lie on everyone to 
>> “do the right thing” when all they want is a simple job?
>> 4) why do we require people to install “circleci” command to contribute?  If 
>> you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will 
>> work just fine… we don’t need to call “circleci config process” every time 
>> we touch circle config…. Also, seems that w/e someone new to circle config 
>> (but not cassandra) touch it they always mutate LOW/MID/HIGH and not 
>> .circleci/config-2_1.yml… so I keep going back to fix 
>> .circleci/config-2_1.yml….
>>
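The combinatorial point in 3) can be illustrated with a small sketch that enumerates jobs instead of writing them by hand. The dimension values, the `job_matrix` helper, and the job naming scheme are all hypothetical; they are not the names used in .circleci/config-2_1.yml.

```python
from itertools import product

# Hypothetical dimensions, based on the options named in the message above.
DIMENSIONS = {
    "jdk": ["j8", "j11"],
    "vnode": ["vnode", "novnode"],
    "cdc": ["cdc", "nocdc"],
    "compression": ["compression", "nocompression"],
}

def job_matrix(suite, dimensions):
    """Return one job name per combination of dimension values for a test suite."""
    keys = list(dimensions)
    names = []
    for combo in product(*(dimensions[k] for k in keys)):
        # Illustrative naming: <jdk>_<suite>_<remaining dimension values>.
        names.append("_".join([combo[0], suite, *combo[1:]]))
    return names

jobs = job_matrix("unit_tests", DIMENSIONS)
print(len(jobs))  # 2 * 2 * 2 * 2 combinations
print(jobs[0])
```

Under this scheme, adding a third JDK to the `jdk` list grows the matrix from 16 to 24 jobs without anyone hand-writing the new entries, though a human still decides which dimensions belong in the table.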
>>
>> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan 
>> <stefan.mikloso...@netapp.com> wrote:
>>
>> 1) would be nice to have. The first thing I do is that I change the 
>> parallelism to 20. None of committed config.yaml's are appropriate for our 
>> company CircleCI so I have to tweak this manually. I think we can not run 
>> more than 25/30 containers in parallel, something like that. HIGHRES has 100 
>> and MIDRES has some jobs with parallelism equal to 50 or so, so that is not 
>> good either. I would be happy with a simple way to modify the default config.yaml 
>> on parallelism. I use "sed" to change parallelism: 4 to parallelism: 20 and 
>> leave parallelism: 1 where it does not make sense to increase it. However I 
>> noticed that there is not "4" set everywhere, some jobs have it set to "1" 
>> so I have to take extra care of these cases (I consider that to be a bug, I 
>> think there are two or three, I do not remember). Once set, I have that 
>> config in "git stash" so I just apply it every time I need it.
>>
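The "sed" tweak described above could also be written as a short Python helper; this is a sketch under assumptions, not an existing script in the repository. The function name `set_parallelism` is hypothetical, and it treats the config as plain text, rewriting every `parallelism: N` line with N greater than 1 and leaving `parallelism: 1` untouched.

```python
import re

# Matches a whole line of the form "<indent>parallelism: <number>".
_PARALLELISM = re.compile(r"^([ \t]*parallelism:[ \t]*)(\d+)[ \t]*$", re.MULTILINE)

def set_parallelism(config_text, new_value):
    """Rewrite every `parallelism: N` with N > 1 to `new_value`; keep N == 1 as is."""
    def replace(match):
        current = int(match.group(2))
        if current == 1:
            return match.group(0)
        return f"{match.group(1)}{new_value}"

    return _PARALLELISM.sub(replace, config_text)

sample = (
    "jobs:\n"
    "  a:\n"
    "    parallelism: 4\n"
    "  b:\n"
    "    parallelism: 1\n"
    "  c:\n"
    "    parallelism: 100\n"
)
print(set_parallelism(sample, 20))
```

Jobs that are set to 1 but should really scale, the case called a bug above, would still need to be handled by hand, since the sketch cannot tell them apart from jobs where 1 is intended.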
>> 5) would be nice too.
>> 7) is nice but not crucial, it takes no time to commit that.
>>
>> ________________________________________
>> From: Josh McKenzie <jmcken...@apache.org>
>> Sent: Wednesday, October 19, 2022 21:50
>> To: dev
>> Subject: [DISCUSS] Potential circleci config and workflow changes
>>
>>
>> While working w/Andres on CASSANDRA-17939 a variety of things came up 
>> regarding our circleci config and opportunities to improve it. Figured I'd 
>> hit the list up here to see what people's thoughts are since many of us 
>> intersect with these systems daily and having your workflow disrupted 
>> without having a chance to provide input is bad.
>>
>> The ideas:
>> 1. Tune parallelism levels per job (David and Ekaterina have insight on this)
>> 2. Rename jobs on circle to be more indicative of their function
>> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: 
>> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
>> 4. Update documentation w/guidance on using circle, .circleci/generate.sh 
>> examples, etc
>> 4a. How to commit: 
>> https://cassandra.apache.org/_/development/how_to_commit.html
>> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
>> 5. Flag on generate.sh to allow auto-run on push
>> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all 
>> suites, default to -m, deprecate -h?) <- may not be a code-change issue and 
>> instead be a documentation issue
>> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] 
>> temporary circleci config" as the commit message
>>
>> Curious to see what folks think.
>>
>> ~Josh
>>
>>
