They passed with -m for me recently. Kind Regards, Brandon
On Thu, Oct 20, 2022 at 12:03 AM Berenguer Blasi <berenguerbl...@gmail.com> wrote: > > Can python upgrade tests be ran without -h? Last time I tried iirc they fail > on -m > > On 20/10/22 4:11, Ekaterina Dimitrova wrote: > > Thank you Josh. Glad to see that our CI is getting more attention. As no > Cassandra feature will be there if we don't do proper testing, right? > Important as all the suites and tools we have. With that being said I am glad > to see Derek is volunteering to spend more time on this as I believe this is > always the main issue - ideas and willingness for improvements are there but > people are swamped with other things and we lack manpower for something so > important. > 1. Tune parallelism levels per job (David and Ekaterina have insight on this) > Question for David, do you tune only parallelism and use only xlarge? If yes, > we need to talk :D > Reading what Stefan shared as experience/feedback, I think we can revise the > current config and move to a more reasonable config that can work for most > people but there will always be someone who needs something a bit different. > With that said maybe we can add to our scripts/menu an option to change from > command line through parameters parallelism and/or resources? For those who > want further customization? I see this as a separate additional ticket > probably. In that case we might probably skip the use of circleci config > process for that part of the menu. (but not for adding new jobs and > meaningful permanent updates) > 2. Rename jobs on circle to be more indicative of their function > +0 I am probably super used to the current names but Derek brought it to my > attention that there are names which are confusing for someone new to the > cassandra world. With that said I would say we can do this in a separate > ticket, mass update. > 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: > https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595) > I am against unifying per JDK workflows but I am all in for unifying the > pre-commit/separate workflows and getting back to 2 workflows as suggested by > Andres. If we think of how that will look in the UI I think it will be super > hard to follow. (the case of having unified both jdks in one workflow) > 4. Update documentation w/guidance on using circle, .circleci/generate.sh > examples, etc 4a. How to commit: > https://cassandra.apache.org/_/development/how_to_commit.html 4b. Testing: > https://cassandra.apache.org/_/development/testing.html > I will open a ticket and post the guide I was working on. But it also doesn't > make sense to fully update it now if we are going to significantly change the > workflow soon. Until then I believe Andres has updated the circleci readme > and provided good usage examples. > 5. Flag on generate.sh to allow auto-run on push > Auto-run on push? Can you elaborate? Like to start your whole workflow > directly without using the UI? There is an approval step in the config file, > we can probably add some flags to change pre-commit workflows to start build > without approval when we use those mentioned flags. But having by default > everything to start on push is an overkill in my opinion. People will be > forgetting it and pushing builds for nothing on WIP branches. Talking from > experience :D > 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all > suites, default to -m, deprecate -h?) <- may not be a code-change issue and > instead be a documentation issue > If we agree except the free tier config file we want one more reasonable > config which doesn't bump resources to the max without a need but provides > balanced use of resources - absolutely. -h was kept as there was > understanding there are people in the community actively using it. > 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] > temporary circleci config" as the commit message > +0 > I also wanted to address a few of the points David made. > "Ekaterina is probably dealing with with her JDK17 work" - if you mean to > ensure we have all jobs for all jdks properly, yes. That was my plan. Until > Derek was so good at suggesting to work on adding missing jobs in CircleCI > now so my work on that will be a bit less for certain things. This is an > effort related to the recent changes in our release document. Ticket > CASSANDRA-17950 :-) I am helping with mentoring/reviews. Everyone is welcome > to join the party. > "1) resource_class used is not because its needed… in HIGHER file we default > to xlarge but only python upgrade tests need that… reported in > CASSANDRA-17600" - one of the reasons. we had the MIDRES in the first place > as I mentioned in my other email the other day. [1] > > "our current patching allows MID/HIGHER to drift as changes need new patches > else patching may do the wrong thing… reported in CASSANDRA-17600" - I'd say > the patching is annoying sometimes, indeed but with/without the patching any > changes to config mean we need to check it by reading through diff and > pushing a run to CI before commit. With that said I am all in for automation > but this will not change the fact we need to push test runs and verify the > changes did not hurt us in a way. Same as testing patches on all branches, > running all needed tests and confirming no regressions. Nothing new or > changing here IMHO > > "CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode > on/of, cdc on/off, compression on/of, etc…. But this is currently controlled > and fleshed out by humans who want to add new jobs.. we should move away from > maintaining .circleci/config-2_1.yml and instead auto-generate it. Simple > example of this problem is jdk11 support… we run a subset of tests on jdk11 > and say its supported… will jdk17 have the same issue? Will it be even less > tests? Why does the burden lie on everyone to “do the right thing” when all > they want is a simple job?" > Controlled and fleshed by humans it will always be but I agree we need to > automate the steps to make it easier for people to add most of the > combinations and not to skip any because it is too much work. We will always > need a human to decide which jdks, cdc, vnodes, etc. With that said I shared > your ticket/patch with Derek as he had similar thoughts, we need to get back > to that one at some point. (CASSANDRA-17600) Thanks for working on that! > > "why do we require people to install “circleci” command to contribute? If you > rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work > just fine… we don’t need to call “circleci config process” every time we > touch circle config…. Also, seems that w/e someone new to circle config (but > not cassandra) touch it they always mutate LOW/MID/HIGH and not > .circleci/config-2_1.yml… so I keep going back to fix > .circleci/config-2_1.yml…." > I'd say config-2_1.yml is mainly for those who will make permanent changes to > config (like adding/removing jobs). config-2_1.yml is actually created as per > the CircleCI automation rules - 1st we add and reuse executors, parameters > and commands but I think we can reduce further things if we add even more > parameters probably. I have to look more into the current file. I am sure > there is room for further improvement. 2nd circleci cli tool can verify the > config file for errors and helps with debugging before we push to CircleCI. > There is circleci config validate. If we make changes manually we are on our > own to verify the long yml and also deal with duplication in config.yml. My > concern is that things that need to be almost identical might start to > diverge easier. Though I made my suggestion in point 1 for what cases > probably we can add menu options that potentially will not require using > circleci cli tool. There might be more cases though. > Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines. I'd > say lots of duplication there > > [1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl > > On Wed, 19 Oct 2022 at 17:20, David Capwell <dcapw...@apple.com> wrote: >> >> 1. Tune parallelism levels per job (David and Ekaterina have insight on this) >> >> >> +1 to this! I drastically lower our parallelism as only python-dtest >> upgrade tests need many resources… >> >> What I do for JVM unit/jvm-dtest is the following >> >> def java_parallelism(src_dir, kind, num_file_in_worker, include = lambda a, >> b: True): >> d = os.path.join(src_dir, 'test', kind) >> num_files = 0 >> for root, dirs, files in os.walk(d): >> for f in files: >> if f.endswith('Test.java') and include(os.path.join(root, f), f): >> num_files += 1 >> return math.floor(num_files / num_file_in_worker) >> >> def fix_parallelism(args, contents): >> jobs = contents['jobs'] >> >> unit_parallelism = java_parallelism(args.src, 'unit', 20) >> jvm_dtest_parallelism = java_parallelism(args.src, >> 'distributed', 4, lambda full, name: 'upgrade' not in full) >> jvm_dtest_upgrade_parallelism = java_parallelism(args.src, >> 'distributed', 2, lambda full, name: 'upgrade' in full) >> >> TL;DR - I find all test files we are going to run, and based off a >> pre-defined variable that says “idea” number of files per worker, I then >> calculate how many workers we need. So unit tests are num_files / 20 ~= 35 >> workers. Can I be “smarter” by knowing which files have higher cost? Sure… >> but the “perfect” and the “average” are too similar that it wasn’t worth >> it... >> >> 2. Rename jobs on circle to be more indicative of their function >> >> >> Have an example? I am not against, I just don’t know the problem you are >> referring to. >> >> 3. Unify j8 and j11 workflow pairs into single >> >> >> Fine by me, but we need to keep in mind j17 is coming. Also, most >> developmental CI builds don’t really need to run cross every JDK, so we need >> some way to disable different JDKs… >> >> When I am testing out a patch I tend to run the following (my script): >> "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds. I know >> I am going to run them pre-merge so I know its safe for me. >> >> 5. Flag on generate.sh to allow auto-run on push >> >> >> I really hate that we don’t do this by default… I still to this day strongly >> feel you should opt-out of CI rather than opt-in… seen several commits get >> merged as they didn’t see a error in circle… because circle didn’t do any >> work…. Yes, I am fully aware that I am beating a dead horse… >> >> TL;DR +1 >> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] >> temporary circleci config" as the commit message >> >> >> +0 from me… I have seen people not realize you have to commit after typing >> “higher” (wrapper around my personal circleci-enable.py script to apply my >> defaults to the build) but not an issue I have… so I don’t mind if people >> want the tool to integrate with git… >> >> >> With all that said, I do feel there is more, and something I feel Ekaterina >> is probably dealing with with her JDK17 work… >> >> 1) resource_class used is not because its needed… in HIGHER file we default >> to xlarge but only python upgrade tests need that… reported in >> CASSANDRA-17600 >> 2) our current patching allows MID/HIGHER to drift as changes need new >> patches else patching may do the wrong thing… reported in CASSANDRA-17600 >> 3) CI is a combinatorial problem, we need to run all jobs for all JDKs, >> vnode on/of, cdc on/off, compression on/of, etc…. But this is currently >> controlled and fleshed out by humans who want to add new jobs.. we should >> move away from maintaining .circleci/config-2_1.yml and instead >> auto-generate it. Simple example of this problem is jdk11 support… we run a >> subset of tests on jdk11 and say its supported… will jdk17 have the same >> issue? Will it be even less tests? Why does the burden lie on everyone to >> “do the right thing” when all they want is a simple job? >> 4) why do we require people to install “circleci” command to contribute? If >> you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will >> work just fine… we don’t need to call “circleci config process” every time >> we touch circle config…. Also, seems that w/e someone new to circle config >> (but not cassandra) touch it they always mutate LOW/MID/HIGH and not >> .circleci/config-2_1.yml… so I keep going back to fix >> .circleci/config-2_1.yml…. >> >> >> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan >> <stefan.mikloso...@netapp.com> wrote: >> >> 1) would be nice to have. The first thing I do is that I change the >> parallelism to 20. None of committed config.yaml's are appropriate for our >> company CircleCI so I have to tweak this manually. I think we can not run >> more that 25/30 containers in parallel, something like that. HIGHRES has 100 >> and MIDRES has some jobs having parallelism equal to 50 or so so that is not >> good either. I would be happy with simple way to modify default config.yaml >> on parallelism. I use "sed" to change parallelism: 4 to parallelism: 20 and >> leave parallelism: 1 where it does not make sense to increase it. However I >> noticed that there is not "4" set everywhere, some jobs have it set to "1" >> so I have to take extra care of these cases (I consider that to be a bug, I >> think there are two or three, I do not remember). Once set, I have that >> config in "git stash" so I just apply it every time I need it. >> >> 5) would be nice too. >> 7) is nice but not crucial, it takes no time to commit that. >> >> ________________________________________ >> From: Josh McKenzie <jmcken...@apache.org> >> Sent: Wednesday, October 19, 2022 21:50 >> To: dev >> Subject: [DISCUSS] Potential circleci config and workflow changes >> >> NetApp Security WARNING: This is an external email. Do not click links or >> open attachments unless you recognize the sender and know the content is >> safe. >> >> >> >> While working w/Andres on CASSANDRA-17939 a variety of things came up >> regarding our circleci config and opportunities to improve it. Figured I'd >> hit the list up here to see what people's thoughts are since many of us >> intersect with these systems daily and having your workflow disrupted >> without having a chance to provide input is bad. >> >> The ideas: >> 1. Tune parallelism levels per job (David and Ekaterina have insight on this) >> 2. Rename jobs on circle to be more indicative of their function >> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: >> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595) >> 4. Update documentation w/guidance on using circle, .circleci/generate.sh >> examples, etc >> 4a. How to commit: >> https://cassandra.apache.org/_/development/how_to_commit.html >> 4b. Testing: https://cassandra.apache.org/_/development/testing.html >> 5. Flag on generate.sh to allow auto-run on push >> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all >> suites, default to -m, deprecate -h?) <- may not be a code-change issue and >> instead be a documentation issue >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] >> temporary circleci config" as the commit message >> >> Curious to see what folks think. >> >> ~Josh >> >>