Hi Sijie,

Let's keep this work going, since resolving the problems with Pulsar CI is
urgent.

I took a quick look at the Azure Pipelines solution in Flink. By Googling
I found https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines .
In the repository I found
https://github.com/apache/flink/blob/master/azure-pipelines.yml which
references
https://github.com/apache/flink/blob/master/tools/azure-pipelines/jobs-template.yml

It uses the build matrix feature to parallelize the execution:
https://github.com/apache/flink/blob/dd0ee24e55dab4ae76201103c76495bc4fa0f73b/tools/azure-pipelines/jobs-template.yml#L88-L107
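
To make the trade-off concrete, here is a rough sketch of what a build matrix does in general terms: one job definition is expanded into several parallel jobs, each running a subset of the test groups. The group names, durations and the fixed per-job setup time below are hypothetical placeholders, not measurements from Flink or Pulsar, and the greedy longest-processing-time (LPT) assignment is just one simple way to balance the jobs. It illustrates that splitting shortens the wall-clock time, while every extra job repeats the setup step, so total runner time (the resource we are short of) grows.

```python
"""Sketch: wall-clock time vs. total runner time for a build matrix.

All group names and durations are hypothetical placeholders (minutes).
"""
import heapq
from typing import Dict, List, Tuple

# Hypothetical test groups and their test-only durations in minutes.
TEST_GROUPS: Dict[str, int] = {
    "broker-group-1": 60,
    "broker-group-2": 35,
    "client-api": 26,
    "proxy": 14,
    "integration-messaging": 45,
    "integration-schema": 30,
}

# Hypothetical fixed cost each matrix job pays before running tests
# (checkout, build, Docker images).
SETUP_MINUTES = 15


def assign_to_jobs(groups: Dict[str, int], n_jobs: int) -> List[List[str]]:
    """Greedy LPT: give the longest remaining group to the least-loaded job."""
    # Min-heap of (accumulated test minutes, job index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(n_jobs)]
    jobs: List[List[str]] = [[] for _ in range(n_jobs)]
    for name, minutes in sorted(groups.items(), key=lambda kv: -kv[1]):
        load, idx = heapq.heappop(heap)
        jobs[idx].append(name)
        heapq.heappush(heap, (load + minutes, idx))
    return jobs


def matrix_cost(groups: Dict[str, int], n_jobs: int, setup: int) -> Tuple[int, int]:
    """Return (wall_clock_minutes, total_runner_minutes) for n_jobs parallel jobs."""
    jobs = assign_to_jobs(groups, n_jobs)
    per_job = [setup + sum(groups[g] for g in job) for job in jobs]
    return max(per_job), sum(per_job)


if __name__ == "__main__":
    for n in (1, 2, 3, 6):
        wall, runner = matrix_cost(TEST_GROUPS, n, SETUP_MINUTES)
        print(f"{n} job(s): wall-clock {wall} min, runner time {runner} min")
```

With these made-up numbers, going from one job to six cuts the wall-clock time from 225 to 75 minutes, while total runner time rises from 225 to 300 minutes.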

What would be the key benefit for Pulsar CI of using Azure Pipelines over
GitHub Actions?

-Lari

On Fri, Jan 29, 2021 at 6:03 PM Sijie Guo <guosi...@gmail.com> wrote:

> Lari,
>
> Yes, we can keep this proposal open for discussion. That's for sure.
>
> I just don't have any good solution at this moment with a multiple-workflow
> approach using GitHub Actions.
>
> An alternative is to look into Azure Pipelines, which the Flink community is
> using.
> We are still learning about it and will post our thoughts here once we
> have a better idea.
>
> Thanks,
> Sijie
>
> On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <l...@hotari.net> wrote:
>
> > Thanks for the feedback, Sijie.
> >
> > > If this proposal is blocked by the other proposal, we should focus on
> > getting the changes for the other proposal before talking about merging
> > them.
> >
> > Yes, the current proposal depends on the draft PIP for "Changes to flaky
> > test handling". I'll follow up on fixing the flaky test in a new email
> > thread.
> >
> > I hope we can get the discussions going on both draft PIPs and find
> > consensus together as a community.
> > During the discussions, more solution options will come up, and each
> > solution has trade-offs.
> > It would be useful to document the options when the community doesn't
> > immediately agree on a single choice.
> > I was thinking that these options could be documented in the same draft
> > PIP documents.
> >
> > I can give multiple authors editing access to the Google Docs so that we
> > can all keep editing a single shared document for each draft PIP.
> > If anyone would like to add more solution options to the documents,
> > please let me know and I'll give you editing access.
> >
> > Sijie, would you like to document the option of keeping the CI as
> > multiple smaller workflows?
> > My understanding is that the resource consumption problems that have come
> > up with Pulsar CI would have to be resolved in that alternative as well.
> >
> > I believe that everyone is open to any set of solution alternatives that
> > solves the problems that we have with Pulsar CI.
> > We all know that fixing Pulsar CI is urgent. We can do it together.
> >
> > BR, Lari
> >
> >
> > On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <guosi...@gmail.com> wrote:
> >
> > > Lari,
> > >
> > > Thank you for bringing this proposal up! This is a great initiative.
> > >
> > > However, I agree with Yong. We have spent tons of effort splitting one
> > > large workflow into multiple smaller workflows.
> > >
> > > If this proposal is blocked by the other proposal, we should focus on
> > > getting the changes for the other proposal before talking about merging
> > > them.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <l...@hotari.net> wrote:
> > >
> > > > Thank you for the comments, Penghui.
> > > >
> > > > Exactly as you said, we should make the tests stable.
> > > > The proposals in the other draft PIP "Changes to flaky test handling"
> > > > deal with that.
> > > > It's currently a draft and needs more eyes. Would you be able to take a
> > > > closer look at that too?
> > > >
> > > > BR, Lari
> > > >
> > > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <codelipeng...@gmail.com>
> > > > wrote:
> > > >
> > > > > Currently, especially for the integration tests, a lot of time is
> > > > > spent building Pulsar distributions and Docker images.
> > > > > I think that before merging the tests we should make the tests stable;
> > > > > otherwise, rerunning the tests will become more expensive.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang <zhangyong1025...@gmail.com>, wrote:
> > > > > > I am not sure that merging all the workflows into one workflow is a
> > > > > > good idea. As far as I know, GitHub Actions doesn't allow rerunning a
> > > > > > single job in a workflow. That means that if there is any failure in
> > > > > > the workflow, we need to rerun all steps/stages. In the worst case,
> > > > > > different tests fail when rerunning it, and this would take even more
> > > > > > time to pass the CI.
> > > > > >
> > > > > > ---
> > > > > > Yong
> > > > > >
> > > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <lari.hot...@sagire.fi> wrote:
> > > > > >
> > > > > > > Dear Pulsar community members,
> > > > > > >
> > > > > > > Currently, the Pulsar GitHub Actions workflows are consuming the
> > > > > > > majority of the shared pool of resources allocated for
> > > > > > > github.com/apache projects. Other Apache projects have been
> > > > > > > impacted, and there is a demand to improve the Pulsar CI asap, see
> > > > > > > https://github.com/apache/pulsar/pull/9159#issuecomment-766915396 .
> > > > > > >
> > > > > > > In GitHub Actions, the unit of resource consumption is the time that
> > > > > > > a runner is occupied. I observed the workflow runs for handling a
> > > > > > > single pull request (in my personal fork), and these were the running
> > > > > > > durations:
> > > > > > > Workflow name                               Duration (h:mm:ss)
> > > > > > > --------------------------------------------------------------
> > > > > > > CI - Build - MacOS                          0:17:23
> > > > > > > CI - Go Functions style check               0:02:38
> > > > > > > CI - Unit - Brokers - Other                 0:15:40
> > > > > > > CI - Unit - Brokers - Client Impl           0:16:28
> > > > > > > CI - Misc                                   0:16:51
> > > > > > > CI - Unit - Proxy                           0:14:23
> > > > > > > CI - Go Functions Tests                     0:22:08
> > > > > > > CI - CPP, Python Tests                      0:23:30
> > > > > > > CI - Unit                                   0:42:11
> > > > > > > CI - Integration - Sql                      1:00:13
> > > > > > > CI - Integration - Tiered JCloud            1:00:18
> > > > > > > CI - Integration - Tiered FileSystem        1:00:13
> > > > > > > CI - Integration - Function State           1:00:12
> > > > > > > CI - Integration - Cli                      1:10:22
> > > > > > > CI - Integration - Transaction              1:16:34
> > > > > > > CI - Integration - Process                  1:11:23
> > > > > > > CI - Shade - Test                           1:15:45
> > > > > > > CI - Unit - Brokers - Client Api            0:26:13
> > > > > > > CI - Unit - Brokers - Broker Group 2        0:35:05
> > > > > > > CI - Integration - Standalone               0:45:29
> > > > > > > CI - Integration - Messaging                1:00:23
> > > > > > > CI - Integration - Thread                   1:00:19
> > > > > > > CI - Integration - Backwards Compatibility  1:00:19
> > > > > > > CI - Integration - Schema                   1:00:19
> > > > > > > CI - Unit - Brokers - Broker Group 1        2:02:31
> > > > > > > --------------------------------------------------------------
> > > > > > > TOTAL                                       19:36:50
> > > > > > >
> > > > > > > *In this case, the total resource consumption of GitHub Actions
> > > > > > > runners is 19 hours 36 minutes 50 seconds for a single pull request
> > > > > > > to apache/pulsar.*
> > > > > > >
> > > > > > > Since the utilization of the GitHub Actions runner resource pool is
> > > > > > > very high, the build queue grows and takes a long time to process.
> > > > > > >
> > > > > > > I have been looking for ways to improve the Pulsar CI for the last 3
> > > > > > > months. During this period I worked on a few experiments. The
> > > > > > > learnings from these experiments are documented at a high level in
> > > > > > > the following draft PIP document.
> > > > > > >
> > > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI" document
> > > > > > > is a Google doc:*
> > > > > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > >
> > > > > > > *Please participate* so that we can adjust the plan based on the
> > > > > > > feedback asap. If there's already a similar effort ongoing, I hope we
> > > > > > > can join efforts.
> > > > > > >
> > > > > > > *Let's fix Pulsar CI!*
> > > > > > >
> > > > > > > BR, Lari
> > > > > > >
> > > > >
> > > >
> > >
> >
>
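
Yong's concern in the quoted thread about rerunning a whole workflow, and Penghui's point that unstable tests make reruns more expensive, can be illustrated with a simple expected-cost model. This is a sketch under strong simplifying assumptions, not a measurement: each job is assumed to fail independently with the same flakiness probability p per attempt, and every job in an attempt is assumed to run to completion. Under those assumptions, rerunning everything until all n jobs pass in the same attempt costs on average sum(d) / (1 - p)^n runner minutes, while rerunning only the failed units (separate workflows in today's setup) costs sum(d) / (1 - p).

```python
"""Expected runner time when flaky failures force reruns (simplified model).

Assumptions (hypothetical, for illustration only):
- every job fails independently with the same probability p per attempt;
- every job in an attempt runs to completion, so it costs its full duration.
"""
from typing import Sequence


def _check_probability(p: float) -> None:
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")


def expected_cost_rerun_all(durations: Sequence[float], p: float) -> float:
    """Rerun every job until one attempt has all jobs passing.

    Attempts are geometric with success probability (1 - p) ** n,
    so the expected number of attempts is 1 / (1 - p) ** n.
    """
    _check_probability(p)
    success = (1.0 - p) ** len(durations)
    return sum(durations) / success


def expected_cost_rerun_failed(durations: Sequence[float], p: float) -> float:
    """Rerun only failed jobs; each job needs 1 / (1 - p) attempts on average."""
    _check_probability(p)
    return sum(durations) / (1.0 - p)


if __name__ == "__main__":
    # Hypothetical example: 25 jobs of 45 minutes each.
    jobs = [45.0] * 25
    for p in (0.0, 0.02, 0.05):
        all_cost = expected_cost_rerun_all(jobs, p)
        failed_cost = expected_cost_rerun_failed(jobs, p)
        print(f"p={p:.2f}: rerun all {all_cost:.0f} min, rerun failed {failed_cost:.0f} min")
```

The ratio between the two costs is 1 / (1 - p)^(n - 1), so under these assumptions the penalty for rerunning everything grows exponentially with the number of jobs. This is consistent with Penghui's point that the tests should be stabilized before the workflows are merged.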

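For anyone who wants to recheck the figures in the durations table quoted from Lari's original message, a small script can add up the per-workflow durations. It is only an arithmetic check of numbers already in the thread: the durations are copied from the table, and the helper names are illustrative. Summing them confirms the stated total of 19:36:50 across 25 workflows.

```python
"""Recompute the total GitHub Actions runner time from the durations table."""
from datetime import timedelta

# Per-workflow durations (h:mm:ss), copied in table order from the thread.
DURATIONS = [
    "0:17:23", "0:02:38", "0:15:40", "0:16:28", "0:16:51",
    "0:14:23", "0:22:08", "0:23:30", "0:42:11", "1:00:13",
    "1:00:18", "1:00:13", "1:00:12", "1:10:22", "1:16:34",
    "1:11:23", "1:15:45", "0:26:13", "0:35:05", "0:45:29",
    "1:00:23", "1:00:19", "1:00:19", "1:00:19", "2:02:31",
]


def parse_hms(text: str) -> timedelta:
    """Parse an 'h:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format a timedelta as 'h:mm:ss', allowing more than 24 hours."""
    total_seconds = int(delta.total_seconds())
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total = sum((parse_hms(d) for d in DURATIONS), timedelta())
print(len(DURATIONS), "workflows, total", format_hms(total))  # 25 workflows, total 19:36:50
```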