+1 to all of these, especially improving CircleCI generation and
ergonomics. I still have a bunch of reservations around CircleCI in
general, but in the short term we can make it less painful (to a point).

Cheers,

Derek

On Thu, Oct 20, 2022 at 6:38 AM Ekaterina Dimitrova <e.dimitr...@gmail.com>
wrote:

> Yes, they do. This is the only test suite that gets max resources with -m.
> Probably you had some other issue Berenguer as I can confirm I was running
> them successfully these days
>
> On Thu, 20 Oct 2022 at 6:54, Brandon Williams <dri...@gmail.com> wrote:
>
>> They passed with -m for me recently.
>>
>> Kind Regards,
>> Brandon
>>
>> On Thu, Oct 20, 2022 at 12:03 AM Berenguer Blasi
>> <berenguerbl...@gmail.com> wrote:
>> >
>> > Can python upgrade tests be ran without -h? Last time I tried iirc they
>> fail on -m
>> >
>> > On 20/10/22 4:11, Ekaterina Dimitrova wrote:
>> >
>> > Thank you Josh. Glad to see that our CI is getting more attention. As
>> no Cassandra feature will be there if we don't do proper testing, right?
>> Important as all the suites and tools we have. With that being said I am
>> glad to see Derek is volunteering to spend more time on this as I believe
>> this is always the main issue - ideas and willingness for improvements are
>> there but people are swamped with other things and we lack manpower for
>> something so important.
>> > 1. Tune parallelism levels per job (David and Ekaterina have insight on
>> this)
>> > Question for David, do you tune only parallelism and use only xlarge?
>> If yes, we need to talk :D
>> > Reading what Stefan shared as experience/feedback, I think we can
>> revise the current config and move to a more reasonable config that can
>> work for most people but there will always be someone who needs something a
>> bit different. With that said maybe we can add to our scripts/menu an
>> option to change from command line through parameters parallelism and/or
>> resources? For those who want further customization? I see this as a
>> separate additional ticket probably. In that case we might probably skip
>> the use of circleci config process for that part of the menu. (but not for
>> adding new jobs and meaningful permanent updates)
>> > 2. Rename jobs on circle to be more indicative of their function
>> > +0 I am probably super used to the current names but Derek brought it
>> to my attention that there are names which are confusing for someone new to
>> the cassandra world. With that said I would say we can do this in a
>> separate ticket, mass update.
>> > 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
>> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
>> )
>> > I am against unifying per JDK workflows but I am all in for unifying
>> the pre-commit/separate workflows and getting back to 2 workflows as
>> suggested by Andres. If we think of how that will look in the UI I think it
>> will be super hard to follow. (the case of having unified both jdks in one
>> workflow)
>> > 4. Update documentation w/guidance on using circle,
>> .circleci/generate.sh examples, etc 4a. How to commit:
>> https://cassandra.apache.org/_/development/how_to_commit.html 4b.
>> Testing: https://cassandra.apache.org/_/development/testing.html
>> > I will open a ticket and post the guide I was working on. But it also
>> doesn't make sense to fully update it now if we are going to significantly
>> change the workflow soon. Until then I believe Andres has updated the
>> circleci readme and provided good usage examples.
>> > 5. Flag on generate.sh to allow auto-run on push
>> > Auto-run on push? Can you elaborate? Like to start your whole workflow
>> directly without using the UI? There is an approval step in the config
>> file, we can probably add some flags to change pre-commit workflows to
>> start build without approval when we use those mentioned flags. But having
>> by default everything to start on push is an overkill in my opinion. People
>> will be forgetting it and pushing builds for nothing on WIP branches.
>> Talking from experience :D
>> > 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for
>> all suites, default to -m, deprecate -h?) <- may not be a code-change issue
>> and instead be a documentation issue
>> > If we agree except the free tier config file we want one more
>> reasonable config which doesn't bump resources to the max without a need
>> but provides balanced use of resources - absolutely. -h was kept as there
>> was understanding there are people in the community actively using it.
>> > 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
>> temporary circleci config" as the commit message
>> > +0
>> > I also wanted to address a few of the points David made.
>> > "Ekaterina is probably dealing with with her JDK17 work" - if you mean
>> to ensure we have all jobs for all jdks properly, yes. That was my plan.
>> Until Derek was so good at suggesting to work on adding missing jobs in
>> CircleCI now so my work on that will be a bit less for certain things. This
>> is an effort related to the recent changes in our release document. Ticket
>> CASSANDRA-17950 :-) I am helping with mentoring/reviews. Everyone is
>> welcome to join the party.
>> > "1) resource_class used is not because its needed… in HIGHER file we
>> default to xlarge but only python upgrade tests need that… reported in
>> CASSANDRA-17600" - one of the reasons. we had the MIDRES in the first place
>> as I mentioned in my other email the other day. [1]
>> >
>> > "our current patching allows MID/HIGHER to drift as changes need new
>> patches else patching may do the wrong thing… reported in CASSANDRA-17600"
>> - I'd say the patching is annoying sometimes, indeed but with/without the
>> patching any changes to config mean we need to check it by reading through
>> diff and pushing a run to CI before commit. With that said I am all in for
>> automation but this will not change the fact we need to push test runs and
>> verify the changes did not hurt us in a way. Same as testing patches on all
>> branches, running all needed tests and confirming no regressions. Nothing
>> new or changing here IMHO
>> >
>> > "CI is a combinatorial problem, we need to run all jobs for all JDKs,
>> vnode on/of, cdc on/off, compression on/of, etc…. But this is currently
>> controlled and fleshed out by humans who want to add new jobs.. we should
>> move away from maintaining .circleci/config-2_1.yml and instead
>> auto-generate it. Simple example of this problem is jdk11 support… we run a
>> subset of tests on jdk11 and say its supported… will jdk17 have the same
>> issue? Will it be even less tests? Why does the burden lie on everyone to
>> “do the right thing” when all they want is a simple job?"
>> >  Controlled and fleshed by humans it will always be but I agree we need
>> to automate the steps to make it easier for people to add most of the
>> combinations and not to skip any because it is too much work. We will
>> always need a human to decide which jdks, cdc, vnodes, etc. With that said
>> I shared your ticket/patch with Derek as he had similar thoughts, we need
>> to get back to that one at some point. (CASSANDRA-17600) Thanks for working
>> on that!
>> >
>> > "why do we require people to install “circleci” command to contribute?
>> If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will
>> work just fine… we don’t need to call “circleci config process” every time
>> we touch circle config…. Also, seems that w/e someone new to circle config
>> (but not cassandra) touch it they always mutate LOW/MID/HIGH and not
>> .circleci/config-2_1.yml… so I keep going back to fix
>> .circleci/config-2_1.yml…."
>> > I'd say config-2_1.yml is mainly for those who will make permanent
>> changes to config (like adding/removing jobs). config-2_1.yml is actually
>> created as per the CircleCI automation rules - 1st we add and reuse
>> executors, parameters and commands but I think we can reduce further things
>> if we add even more parameters probably. I have to look more into the
>> current file. I am sure there is room for further improvement. 2nd circleci
>> cli tool can verify the config file for errors and helps with debugging
>> before we push to CircleCI. There is circleci config validate. If we make
>> changes manually we are on our own to verify the long yml and also deal
>> with duplication in config.yml. My concern is that things that need to be
>> almost identical might start to diverge easier. Though I made my suggestion
>> in point 1 for what cases probably we can add menu options that potentially
>> will not require using circleci cli tool. There might be more cases though.
>> > Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines.
>> I'd say lots of duplication there
>> >
>> > [1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl
>> >
>> > On Wed, 19 Oct 2022 at 17:20, David Capwell <dcapw...@apple.com> wrote:
>> >>
>> >> 1. Tune parallelism levels per job (David and Ekaterina have insight
>> on this)
>> >>
>> >>
>> >> +1 to this!  I drastically lower our parallelism as only python-dtest
>> upgrade tests need many resources…
>> >>
>> >> What I do for JVM unit/jvm-dtest is the following
>> >>
>> >> def java_parallelism(src_dir, kind, num_file_in_worker, include =
>> lambda a, b: True):
>> >>     d = os.path.join(src_dir, 'test', kind)
>> >>     num_files = 0
>> >>     for root, dirs, files in os.walk(d):
>> >>         for f in files:
>> >>             if f.endswith('Test.java') and include(os.path.join(root,
>> f), f):
>> >>                 num_files += 1
>> >>     return math.floor(num_files / num_file_in_worker)
>> >>
>> >> def fix_parallelism(args, contents):
>> >>     jobs = contents['jobs']
>> >>
>> >>     unit_parallelism                = java_parallelism(args.src,
>> 'unit', 20)
>> >>     jvm_dtest_parallelism           = java_parallelism(args.src,
>> 'distributed', 4, lambda full, name: 'upgrade' not in full)
>> >>     jvm_dtest_upgrade_parallelism   = java_parallelism(args.src,
>> 'distributed', 2, lambda full, name: 'upgrade' in full)
>> >>
>> >> TL;DR - I find all test files we are going to run, and based off a
>> pre-defined variable that says “idea” number of files per worker, I then
>> calculate how many workers we need.  So unit tests are num_files / 20 ~= 35
>> workers.  Can I be “smarter” by knowing which files have higher cost?
>> Sure… but the “perfect” and the “average” are too similar that it wasn’t
>> worth it...
>> >>
>> >> 2. Rename jobs on circle to be more indicative of their function
>> >>
>> >>
>> >> Have an example?  I am not against, I just don’t know the problem you
>> are referring to.
>> >>
>> >> 3. Unify j8 and j11 workflow pairs into single
>> >>
>> >>
>> >> Fine by me, but we need to keep in mind j17 is coming.  Also, most
>> developmental CI builds don’t really need to run cross every JDK, so we
>> need some way to disable different JDKs…
>> >>
>> >> When I am testing out a patch I tend to run the following (my script):
>> "circleci-enable.py --no-jdk11”; this will remove the JDK11 builds.  I know
>> I am going to run them pre-merge so I know its safe for me.
>> >>
>> >> 5. Flag on generate.sh to allow auto-run on push
>> >>
>> >>
>> >> I really hate that we don’t do this by default… I still to this day
>> strongly feel you should opt-out of CI rather than opt-in… seen several
>> commits get merged as they didn’t see a error in circle… because circle
>> didn’t do any work…. Yes, I am fully aware that I am beating a dead horse…
>> >>
>> >> TL;DR +1
>> >>
>> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
>> temporary circleci config" as the commit message
>> >>
>> >>
>> >> +0 from me… I have seen people not realize you have to commit after
>> typing “higher” (wrapper around my personal circleci-enable.py script to
>> apply my defaults to the build) but not an issue I have… so I don’t mind if
>> people want the tool to integrate with git…
>> >>
>> >>
>> >> With all that said, I do feel there is more, and something I feel
>> Ekaterina is probably dealing with with her JDK17 work…
>> >>
>> >> 1) resource_class used is not because its needed… in HIGHER file we
>> default to xlarge but only python upgrade tests need that… reported in
>> CASSANDRA-17600
>> >> 2) our current patching allows MID/HIGHER to drift as changes need new
>> patches else patching may do the wrong thing… reported in CASSANDRA-17600
>> >> 3) CI is a combinatorial problem, we need to run all jobs for all
>> JDKs, vnode on/of, cdc on/off, compression on/of, etc…. But this is
>> currently controlled and fleshed out by humans who want to add new jobs..
>> we should move away from maintaining .circleci/config-2_1.yml and instead
>> auto-generate it.  Simple example of this problem is jdk11 support… we run
>> a subset of tests on jdk11 and say its supported… will jdk17 have the same
>> issue?  Will it be even less tests?  Why does the burden lie on everyone to
>> “do the right thing” when all they want is a simple job?
>> >> 4) why do we require people to install “circleci” command to
>> contribute?  If you rename .circleci/config-2_1.yml to .circleci/config.yml
>> then CI will work just fine… we don’t need to call “circleci config
>> process” every time we touch circle config…. Also, seems that w/e someone
>> new to circle config (but not cassandra) touch it they always mutate
>> LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix
>> .circleci/config-2_1.yml….
>> >>
>> >>
>> >> On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan <
>> stefan.mikloso...@netapp.com> wrote:
>> >>
>> >> 1) would be nice to have. The first thing I do is that I change the
>> parallelism to 20. None of committed config.yaml's are appropriate for our
>> company CircleCI so I have to tweak this manually. I think we can not run
>> more that 25/30 containers in parallel, something like that. HIGHRES has
>> 100 and MIDRES has some jobs having parallelism equal to 50 or so so that
>> is not good either. I would be happy with simple way to modify default
>> config.yaml on parallelism. I use "sed" to change parallelism: 4 to
>> parallelism: 20 and leave parallelism: 1 where it does not make sense to
>> increase it. However I noticed that there is not "4" set everywhere, some
>> jobs have it set to "1" so I have to take extra care of these cases (I
>> consider that to be a bug, I think there are two or three, I do not
>> remember). Once set, I have that config in "git stash" so I just apply it
>> every time I need it.
>> >>
>> >> 5) would be nice too.
>> >> 7) is nice but not crucial, it takes no time to commit that.
>> >>
>> >> ________________________________________
>> >> From: Josh McKenzie <jmcken...@apache.org>
>> >> Sent: Wednesday, October 19, 2022 21:50
>> >> To: dev
>> >> Subject: [DISCUSS] Potential circleci config and workflow changes
>> >>
>> >> NetApp Security WARNING: This is an external email. Do not click links
>> or open attachments unless you recognize the sender and know the content is
>> safe.
>> >>
>> >>
>> >>
>> >> While working w/Andres on CASSANDRA-17939 a variety of things came up
>> regarding our circleci config and opportunities to improve it. Figured I'd
>> hit the list up here to see what people's thoughts are since many of us
>> intersect with these systems daily and having your workflow disrupted
>> without having a chance to provide input is bad.
>> >>
>> >> The ideas:
>> >> 1. Tune parallelism levels per job (David and Ekaterina have insight
>> on this)
>> >> 2. Rename jobs on circle to be more indicative of their function
>> >> 3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
>> https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595
>> )
>> >> 4. Update documentation w/guidance on using circle,
>> .circleci/generate.sh examples, etc
>> >> 4a. How to commit:
>> https://cassandra.apache.org/_/development/how_to_commit.html
>> >> 4b. Testing: https://cassandra.apache.org/_/development/testing.html
>> >> 5. Flag on generate.sh to allow auto-run on push
>> >> 6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for
>> all suites, default to -m, deprecate -h?) <- may not be a code-change issue
>> and instead be a documentation issue
>> >> 7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE]
>> temporary circleci config" as the commit message
>> >>
>> >> Curious to see what folks think.
>> >>
>> >> ~Josh
>> >>
>> >>
>>
>

-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+

Reply via email to