Thank you for the comments, Penghui.

Exactly as you said, we should make the tests stable.
The proposals in the other draft PIP "Changes to flaky test handling" deal
with that.
It's currently a draft and needs more eyes. Would you be able to take a
closer look at that too?

BR, Lari

On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <codelipeng...@gmail.com> wrote:

> Currently, especially for the integration tests, a lot of time is spent
> building Pulsar distributions and Docker images.
> I think that before we merge the tests, we should make them stable;
> otherwise, rerunning the tests will become more expensive.
>
> Thanks,
> Penghui
> On Jan 29, 2021, 11:55 AM +0800, Yong Zhang <zhangyong1025...@gmail.com>,
> wrote:
> > I am not sure that merging all the workflows into one workflow is a good
> > idea. As far as I know, GitHub Actions doesn't allow rerunning a single
> > job in a workflow. That means if there is any failure in the workflow, we
> > need to rerun all steps/stages. In the worst case, different tests fail
> > when rerunning it, and this would take more time to pass the CI.
> >
> > ---
> > Yong
> >
> > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <lari.hot...@sagire.fi> wrote:
> >
> > > Dear Pulsar community members,
> > >
> > > Currently, the Pulsar GitHub Actions workflows are consuming the majority
> > > of the shared pool of resources allocated for github.com/apache projects.
> > > Other Apache projects have been impacted and there is a demand to improve
> > > the Pulsar CI
> > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396> asap.
> > >
> > > For GitHub Actions Runners, the unit of resource consumption is the time
> > > that a Runner is occupied. I observed the workflow runs for handling a
> > > single Pull Request (in my personal fork), and these were the running
> > > durations:
> > > Duration  Workflow name
> > > --------  -------------
> > > 0:17:23   CI - Build - MacOS
> > > 0:02:38   CI - Go Functions style check
> > > 0:15:40   CI - Unit - Brokers - Other
> > > 0:16:28   CI - Unit - Brokers - Client Impl
> > > 0:16:51   CI - Misc
> > > 0:14:23   CI - Unit - Proxy
> > > 0:22:08   CI - Go Functions Tests
> > > 0:23:30   CI - CPP, Python Tests
> > > 0:42:11   CI - Unit
> > > 1:00:13   CI - Integration - Sql
> > > 1:00:18   CI - Integration - Tiered JCloud
> > > 1:00:13   CI - Integration - Tiered FileSystem
> > > 1:00:12   CI - Integration - Function State
> > > 1:10:22   CI - Integration - Cli
> > > 1:16:34   CI - Integration - Transaction
> > > 1:11:23   CI - Integration - Process
> > > 1:15:45   CI - Shade - Test
> > > 0:26:13   CI - Unit - Brokers - Client Api
> > > 0:35:05   CI - Unit - Brokers - Broker Group 2
> > > 0:45:29   CI - Integration - Standalone
> > > 1:00:23   CI - Integration - Messaging
> > > 1:00:19   CI - Integration - Thread
> > > 1:00:19   CI - Integration - Backwards Compatibility
> > > 1:00:19   CI - Integration - Schema
> > > 2:02:31   CI - Unit - Brokers - Broker Group 1
> > > --------
> > > 19:36:50  TOTAL
> > >
> > > *In this case, the total resource consumption of GitHub Actions Runners
> > > is 19 hours 36 minutes 50 seconds for a single pull request to
> > > apache/pulsar.*
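The TOTAL row is simply the sum of the per-workflow runner times. As a quick sanity check (a small sketch, not part of any Pulsar CI tooling; the list and helper names are illustrative only), the durations from the table can be re-added in Python:

```python
# Sanity check for the TOTAL row: re-add the per-workflow runner times
# (H:MM:SS) copied from the duration table for a single pull request.
DURATIONS = [
    ("CI - Build - MacOS", "0:17:23"),
    ("CI - Go Functions style check", "0:02:38"),
    ("CI - Unit - Brokers - Other", "0:15:40"),
    ("CI - Unit - Brokers - Client Impl", "0:16:28"),
    ("CI - Misc", "0:16:51"),
    ("CI - Unit - Proxy", "0:14:23"),
    ("CI - Go Functions Tests", "0:22:08"),
    ("CI - CPP, Python Tests", "0:23:30"),
    ("CI - Unit", "0:42:11"),
    ("CI - Integration - Sql", "1:00:13"),
    ("CI - Integration - Tiered JCloud", "1:00:18"),
    ("CI - Integration - Tiered FileSystem", "1:00:13"),
    ("CI - Integration - Function State", "1:00:12"),
    ("CI - Integration - Cli", "1:10:22"),
    ("CI - Integration - Transaction", "1:16:34"),
    ("CI - Integration - Process", "1:11:23"),
    ("CI - Shade - Test", "1:15:45"),
    ("CI - Unit - Brokers - Client Api", "0:26:13"),
    ("CI - Unit - Brokers - Broker Group 2", "0:35:05"),
    ("CI - Integration - Standalone", "0:45:29"),
    ("CI - Integration - Messaging", "1:00:23"),
    ("CI - Integration - Thread", "1:00:19"),
    ("CI - Integration - Backwards Compatibility", "1:00:19"),
    ("CI - Integration - Schema", "1:00:19"),
    ("CI - Unit - Brokers - Broker Group 1", "2:02:31"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Format a number of seconds as 'H:MM:SS'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Sum every workflow's runner time for the pull request.
total_seconds = sum(to_seconds(duration) for _, duration in DURATIONS)
print(to_hms(total_seconds))  # prints 19:36:50
```

The figures are kept in a list rather than a dict so the table order is preserved; the same two helpers could be reused on timings collected from another pull request.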
> > >
> > > Since GitHub Actions Runner resource pool utilization is very high, the
> > > build queue grows and builds take a long time to be processed.
> > >
> > > I have been looking for ways to improve the Pulsar CI for the last 3
> > > months and have run a few experiments during this period. The learnings
> > > from these experiments are documented at a high level in the following
> > > draft PIP document.
> > >
> > > *The draft PIP "Changes to GitHub Actions based Pulsar CI" document is a
> > > Google doc:*
> > >
> > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > >
> > > *Please participate* so that we get the plan adjusted based on the
> > > feedback asap. If there's already a similar effort ongoing, I hope we can
> > > join efforts.
> > >
> > > *Let's fix Pulsar CI!*
> > >
> > > BR, Lari
> > >
>