A proposal for refactoring the CircleCI config

Derek Chen-Becker Fri, 28 Oct 2022 14:26:34 -0700

While I've been working on bringing CircleCI to parity with Jenkins,
I've made some notes about ways that the whole config generation
process could be improved. Here are my thoughts. I'm not sure if this
is worthy of a CEP since it's infra and not a feature.


Cheers,

Derek

Problem Statement
═════════════════

  The CircleCI configuration subdivides the steps in the test plan into
  jobs that can execute independently. This set of jobs is intended to
  be run by developers under the free/OSS plan. It is also intended for
  paid CircleCI plans, where developers or organizations that are
  willing to spend money can get faster test results by committing more
  resources. The resource profiles are LOWRES, MIDRES, and HIGHRES.

  The allocation of resources is currently driven by a shell script,
  `generate.sh', that applies textual modifications (via patch and sed)
  to files before processing in order to change various configuration
  parameters. This approach works, but it does not fully use the
  parameterization features of CircleCI’s configuration processor to
  reduce complexity when adding new tests or changing the system. As a
  result, it places additional burden on developers who modify the CI
  configuration.

  This proposal details an initial goal for reducing CircleCI
  configuration complexity and provides a high-level overview of
  subsequent goals to be investigated.


Goal 1: Eliminate patch files
═════════════════════════════

  Patch files are targeted as the first goal for this proposal because
  removing them yields a significant reduction in configuration
  complexity for relatively modest effort. Patch files themselves are
  brittle. The patch tool can tolerate some drift between the original
  target file and its current state, but it cannot always apply changes
  unambiguously. Whenever the CircleCI configuration changes, the patch
  files must also be edited or regenerated to account for shifted line
  numbers and any newly added sections. This is extra work that provides
  no benefit.

  The patch files currently apply changes to three types of
  configuration:

  • Heap size parameters (only for the HIGHRES config)
  • Job resource class
  • Executor parallelism

  CircleCI handles this use case via parameterization of the
  configuration. Interestingly, our CircleCI configuration already takes
  advantage of parameterization in the definition of the executor:

  ┌────
  │ java8-executor:
  │   parameters:
  │     exec_resource_class:
  │       type: string
  └────
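  For context, here is a fuller sketch of such a parameterized executor
  and a job that selects a resource class through it. This is an
  illustration, not the actual Cassandra definition. The `default'
  value, the docker image name, and the job name `j8_example_job' are
  placeholders.

  ┌────
  │ executors:
  │   java8-executor:
  │     parameters:
  │       exec_resource_class:
  │         type: string
  │         default: medium              # placeholder default
  │     docker:
  │       - image: example/java8-build-image:latest   # placeholder image
  │     # The executor parameter feeds the resource class
  │     resource_class: << parameters.exec_resource_class >>
  │
  │ jobs:
  │   j8_example_job:                    # hypothetical job
  │     executor:
  │       name: java8-executor
  │       exec_resource_class: xlarge    # overrides the default above
  │     steps:
  │       - checkout
  └────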

  CircleCI also allows parameters to be defined at the top level of the
  pipeline. These are then accessible anywhere in the pipeline
  definition (e.g. jobs, executors, and steps). Their values can be
  overridden by passing a YAML file to the `circleci config process'
  command via its `--pipeline-parameters' option.

  As an example of what a change would entail, consider that the patch
  files change the parallelism of all repeated dtest executors
  uniformly. We could introduce a single pipeline parameter for this
  value:

  ┌────
  │ parameters:
  │   repeated_dtest_parallelism:
  │     type: integer
  │     default: 4
  └────

  And then update the configuration of the executors to use the
  parameter:

  ┌────
  │ j8_repeated_utest_executor: &j8_repeated_utest_executor
  │   executor:
  │     name: java8-executor
  │   parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
  │
  │ j8_repeated_dtest_executor: &j8_repeated_dtest_executor
  │   executor:
  │     name: java8-executor
  │   parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
  │
  │ j8_repeated_upgrade_dtest_executor: &j8_repeated_upgrade_dtest_executor
  │   executor:
  │     name: java8-executor
  │   parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
  │
  │ j8_repeated_jvm_upgrade_dtest_executor: &j8_repeated_jvm_upgrade_dtest_executor
  │   executor:
  │     name: java8-executor
  │   parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
  └────

  Then we create a MIDRES-specific override file, `MIDRES-params.yml',
  containing the new value:

  ┌────
  │ repeated_dtest_parallelism: 25
  └────

  and a HIGHRES-specific override file, `HIGHRES-params.yml':

  ┌────
  │ repeated_dtest_parallelism: 100
  └────

  And then execute the config processor with the proper override:

  ┌────
  │ circleci config process --pipeline-parameters MIDRES-params.yml config-2_1.yml
  │ # or
  │ circleci config process --pipeline-parameters HIGHRES-params.yml config-2_1.yml
  └────
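  The parallelism example covers only one of the three settings the
  patch files change. Resource class and heap size could follow the same
  pattern. The sketch below is illustrative only. The parameter names
  `repeated_dtest_resource_class' and `dtest_max_heap_size', the job
  `j8_dtests', and all values are hypothetical. So is the assumption
  that the heap size reaches the tests through a `CCM_MAX_HEAP_SIZE'
  environment variable.

  ┌────
  │ parameters:
  │   repeated_dtest_parallelism:
  │     type: integer
  │     default: 4
  │   repeated_dtest_resource_class:     # hypothetical name
  │     type: string
  │     default: medium                  # placeholder value
  │   dtest_max_heap_size:               # hypothetical name
  │     type: string
  │     default: 1024M                   # placeholder value
  │
  │ jobs:
  │   j8_dtests:                         # hypothetical job
  │     executor:
  │       name: java8-executor
  │       # Pipeline parameter passed through to the executor parameter
  │       exec_resource_class: << pipeline.parameters.repeated_dtest_resource_class >>
  │     parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
  │     environment:
  │       # Assumption: the dtests read their heap size from this variable
  │       CCM_MAX_HEAP_SIZE: << pipeline.parameters.dtest_max_heap_size >>
  │     steps:
  │       - checkout
  └────

  Because heap size is only patched for HIGHRES, only the HIGHRES
  override file would set it. MIDRES would keep the default:

  ┌────
  │ repeated_dtest_parallelism: 100
  │ repeated_dtest_resource_class: xlarge   # placeholder value
  │ dtest_max_heap_size: 2048M              # placeholder value
  └────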

  If a new job is added that does not use any parameters, only the
  `config-2_1.yml' file is modified. If new parameters are introduced,
  the override files also need to be updated. One way to catch values
  missed in the MIDRES and HIGHRES profiles would be to forbid default
  values entirely and provide a LOWRES-params.yml file containing the
  LOWRES values, although this is not recommended.

  One caveat is that CircleCI CLI versions prior to 0.1.22322 have a bug
  that causes the override file to be ignored. If we do not want to
  require a minimum circleci CLI version, one workaround would be to
  move the parameter definitions into their own level-specific file and
  concatenate that file with `config-2_1.yml' before processing.
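  A sketch of the concatenation workaround follows. It assumes a
  hypothetical per-level file, such as `HIGHRES-param-defs.yml', that
  holds the complete `parameters:' block with that level's values as
  the defaults. Because a YAML mapping cannot repeat a key,
  `config-2_1.yml' would then have to omit its own top-level
  `parameters:' block.

  ┌────
  │ # HIGHRES-param-defs.yml (hypothetical file name)
  │ parameters:
  │   repeated_dtest_parallelism:
  │     type: integer
  │     default: 100
  └────

  The processor would then run on the combined file. For example,
  `cat HIGHRES-param-defs.yml config-2_1.yml > /tmp/config-HIGHRES.yml'
  followed by `circleci config process /tmp/config-HIGHRES.yml'. The
  temporary path is arbitrary, and the definitions file must end with a
  newline so that the two files join cleanly.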

  Either approach (YAML override file or concatenation) is an
  improvement over patch files. With either one, the level-specific
  files no longer need to change every time the main configuration
  changes.

  The changes needed to implement this are relatively small and easy to
  test. Switching to pipeline parameters (along with the requisite
  changes to `generate.sh') should not change the generated config.yml
  files at all.


Other benefits for future investigation
═══════════════════════════════════════

  Beyond simple replacement of the patch files, there are some
  additional capabilities in CircleCI configuration that may benefit the
  project long-term. One of the most interesting is matrix jobs, which
  allow a single job definition to be parameterized over multiple
  values. In our case, this could allow us to parameterize jobs over
  different versions of the JDK. This would reduce the overall size of
  the configuration, because the current approach duplicates jobs for
  each Java version. It would also make it simpler to add and test new
  Java versions, since only the matrix would need a small change instead
  of a whole new set of version-specific jobs. We would still need
  version-specific executors. However, if we move settings such as
  parallelism and resource class out of the executors and into the job
  definitions, we can parameterize in the workflow and reduce the number
  of executors to one per Java version. For example:

  ┌────
  │ jobs:
  │   j8_unit_tests:
  │     parameters:
  │       executor:
  │         type: executor
  │     executor: << parameters.executor >>
  │     parallelism: << pipeline.parameters.unit_test_parallelism >>
  │     resource_class: << pipeline.parameters.unit_test_resclass >>
  │     steps:
  │       - attach_workspace:
  │           at: /home/cassandra
  │       - create_junit_containers
  │       - log_environment
  │       - run_parallel_junit_tests
  │
  │ workflows:
  │   pre-commit_tests:
  │     jobs:
  │       - j8_unit_tests:
  │           requires:
  │             - build
  │           matrix:
  │             parameters:
  │               executor: [java8-executor, java11-executor, java17-executor]
  └────
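  The example above relies on two pipeline parameters,
  `unit_test_parallelism' and `unit_test_resclass', that are not
  declared elsewhere in this proposal. The sketch below declares them,
  with placeholder default values. It also adds a hypothetical
  downstream job, `summarize_results', that waits for the whole matrix.
  CircleCI expands a matrix into one job per parameter value. By
  default, each expanded job is named by appending the value to the job
  name, e.g. `j8_unit_tests-java11-executor'. The optional `alias' key
  (here the hypothetical `unit_tests') gives the whole set a single name
  that `requires' can use.

  ┌────
  │ parameters:
  │   unit_test_parallelism:
  │     type: integer
  │     default: 4             # placeholder value
  │   unit_test_resclass:
  │     type: string
  │     default: medium        # placeholder value
  │
  │ workflows:
  │   pre-commit_tests:
  │     jobs:
  │       - j8_unit_tests:
  │           requires:
  │             - build
  │           matrix:
  │             alias: unit_tests    # hypothetical alias for the whole matrix
  │             parameters:
  │               executor: [java8-executor, java11-executor, java17-executor]
  │       - summarize_results:       # hypothetical downstream job
  │           requires:
  │             - unit_tests         # waits for every JDK variant
  └────

  Without an explicit alias, the matrix can also be required by the
  original job name.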

-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+
