Dear Pulsar community members,

The work on "Changes to GitHub Actions based Pulsar CI" has moved forward
based on your feedback. Here is a status update.

The draft PIP proposal document is here:
https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit#heading=h.f53rkcu20sry
There's a *detailed status update in the document about a prototype for the
refactored Pulsar CI GitHub Actions based workflow*.

Thanks for all the suggestions and feedback so far. Pulsar contributors
have made many improvements to overcome the technical obstacles.
Special thanks go to Matteo for reducing the sizes of the docker images.
Many small improvements have also been made to the Pulsar maven build to
enable the new refactored GitHub Actions workflow. Thank you for all the PR
reviews and feedback.

The main goal of the "Changes to GitHub Actions based Pulsar CI" work has
been to *reduce the resource consumption of the Pulsar CI build and to
speed up Pulsar development by improving developer productivity*, since
less time is wasted waiting for Pulsar CI build feedback. The prototype
demonstrates these improvements.

As you can see from the Jan 28 email quoted below, *the resource consumption
was 19 hrs 36 minutes* for a single pull request, observed when the work
began.
Now, with the prototype of the refactored Pulsar CI build, the resource
consumption is *7 hrs 9 minutes.*
*This is a reduction of about 64% in resource consumption.* The whole
pipeline completes in 75-100 minutes.
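
The reduction can be checked from the two totals quoted in this thread
(19:36:50 before, 7:08:57 with the prototype). Here is a minimal Python
sketch of that arithmetic; the function and variable names are illustrative
only and are not part of any Pulsar tooling:

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Totals taken from the two emails in this thread.
before = hms_to_seconds("19:36:50")  # runner time per PR when the work began
after = hms_to_seconds("7:08:57")    # runner time per PR with the prototype

# Relative reduction in runner time, as a percentage.
reduction_percent = (1 - after / before) * 100

print(f"{before} s -> {after} s, reduction {reduction_percent:.1f}%")
```

Run as-is, this reports a reduction of a little under 64%.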

Here's a breakdown of the duration (resource consumption) of each build job
in the refactored workflow:
Workflow   Job                                         seconds  h:mm:ss
Pulsar CI  Changed files check                               4  0:00:04
Pulsar CI  Go 1.11 Functions                               155  0:02:35
Pulsar CI  Go 1.12 Functions                               166  0:02:46
Pulsar CI  Go 1.13 Functions                               113  0:01:53
Pulsar CI  Go 1.14 Functions                                96  0:01:36
Pulsar CI  Build on MacOS                                 1017  0:16:57
Pulsar CI  Build and License check                         346  0:05:46
Pulsar CI  Build Pulsar CPP and Python clients             683  0:11:23
Pulsar CI  Build Pulsar java-test-image docker image       405  0:06:45
Pulsar CI  CI - Unit - Other                              1580  0:26:20
Pulsar CI  CI - Unit - Brokers - Broker Group 1            968  0:16:08
Pulsar CI  CI - Unit - Brokers - Broker Group 2           2223  0:37:03
Pulsar CI  CI - Unit - Brokers - Client Api               1652  0:27:32
Pulsar CI  CI - Unit - Brokers - Client Impl               916  0:15:16
Pulsar CI  CI - Unit - Brokers - Other                     522  0:08:42
Pulsar CI  CI - Unit - Proxy                               331  0:05:31
Pulsar CI  Build Pulsar docker image                      2343  0:39:03
Pulsar CI  CI - Integration - Shade                        414  0:06:54
Pulsar CI  CI - Integration - Backwards Compatibility      849  0:14:09
Pulsar CI  CI - Integration - Cli                         1490  0:24:50
Pulsar CI  CI - Integration - Messaging                    857  0:14:17
Pulsar CI  CI - Integration - Schema                       468  0:07:48
Pulsar CI  CI - Integration - Standalone                   286  0:04:46
Pulsar CI  CI - Integration - Transaction                  362  0:06:02
Pulsar CI  CI - System - Function State                    699  0:11:39
Pulsar CI  CI - System - Tiered FileSystem                 779  0:12:59
Pulsar CI  CI - System - Tiered JCloud                     529  0:08:49
Pulsar CI  CI - System - Pulsar Connectors - Thread       1795  0:29:55
Pulsar CI  CI - System - Pulsar Connectors - Process      2312  0:38:32
Pulsar CI  CI - System - Sql                              1377  0:22:57
*Total resource consumption*                                    7:08:57
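
For anyone who wants to re-check the total, here is a minimal Python sketch
that sums the per-job seconds from the table above and formats the result as
h:mm:ss. The list is copied from the table in row order; the names
`job_seconds` and `seconds_to_hms` are illustrative only:

```python
# Per-job durations in seconds, in the same order as the rows of the table.
job_seconds = [
    4, 155, 166, 113, 96, 1017, 346, 683, 405, 1580,
    968, 2223, 1652, 916, 522, 331, 2343, 414, 849, 1490,
    857, 468, 286, 362, 699, 779, 529, 1795, 2312, 1377,
]


def seconds_to_hms(total_seconds: int) -> str:
    """Format a number of seconds as h:mm:ss."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total = sum(job_seconds)
print(total, seconds_to_hms(total))
```

The sum of the 30 rows matches the total reported in the table.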


GitHub Actions doesn't support restarting a single job (
https://github.community/t/ability-to-rerun-just-a-single-job-in-a-workflow/17234
).
However, this is not a showstopper since there are ways to address the
issues that cause flakiness.
There is a separate PIP for changing how flaky tests are handled. You can
find the link to it in the header of the "Changes to GitHub Actions based
Pulsar CI" document.

*Some requests for the Pulsar community:*

1) *Please take a look at the updated PIP document*:
https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit#heading=h.f53rkcu20sry
. *It also contains more details of the prototype that has been
successfully completed.*

2) *Please share your feedback and suggest a way forward.*

*Thank you for your help!*

BR, Lari

On Thu, Jan 28, 2021 at 7:13 PM Lari Hotari <lari.hot...@sagire.fi> wrote:

> Dear Pulsar community members,
>
> Currently, the Pulsar GitHub Actions workflows are consuming the majority
> of the shared pool of resources allocated for github.com/apache projects.
> Other Apache projects have been impacted and there is a demand to improve
> the Pulsar CI
> <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396> asap.
>
> In GitHub Actions Runners, the unit of resources is the time that a Runner
> is occupied. I observed the workflow runs for handling a single Pull
> Request (in my personal fork) and these were the running durations:
> Workflow name Duration
> CI - Build - MacOS 0:17:23
> CI - Go Functions style check 0:02:38
> CI - Unit - Brokers - Other 0:15:40
> CI - Unit - Brokers - Client Impl 0:16:28
> CI - Misc 0:16:51
> CI - Unit - Proxy 0:14:23
> CI - Go Functions Tests 0:22:08
> CI - CPP, Python Tests 0:23:30
> CI - Unit 0:42:11
> CI - Integration - Sql 1:00:13
> CI - Integration - Tiered JCloud 1:00:18
> CI - Integration - Tiered FileSystem 1:00:13
> CI - Integration - Function State 1:00:12
> CI - Integration - Cli 1:10:22
> CI - Integration - Transaction 1:16:34
> CI - Integration - Process 1:11:23
> CI - Shade - Test 1:15:45
> CI - Unit - Brokers - Client Api 0:26:13
> CI - Unit - Brokers - Broker Group 2 0:35:05
> CI - Integration - Standalone 0:45:29
> CI - Integration - Messaging 1:00:23
> CI - Integration - Thread 1:00:19
> CI - Integration - Backwards Compatibility 1:00:19
> CI - Integration - Schema 1:00:19
> CI - Unit - Brokers - Broker Group 1 2:02:31
> TOTAL 19:36:50
>
> *In this case, the total resource consumption of GitHub Actions Runners is
> 19 hours 36 minutes 50 seconds for a single pull request to apache/pulsar.*
>
> Since GitHub Actions Runner resource pool utilization is very high, the
> build queue grows and takes a long time to process.
>
> I have been looking for ways to improve the Pulsar CI for the last 3
> months. During this period I worked on a few experiments. The learnings
> from the past experiments are documented at a high level in the following
> draft PIP document.
>
> *The draft PIP "Changes to GitHub Actions based Pulsar CI" document is a
> Google doc:*
>
> https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
>
> *Please participate* so that we get the plan adjusted based on the
> feedback asap. If there's already a similar effort ongoing, I hope we can
> join efforts.
>
> *Let's fix Pulsar CI!*
>
> BR, Lari
>
