I recommend we move the connectors out of the Pulsar repo to reduce the
load on the main CI pipeline. The new repo seems ready:
https://github.com/apache/pulsar-connectors

-Ali

On Fri, Jan 29, 2021 at 9:22 AM Sijie Guo <guosi...@gmail.com> wrote:

> Currently, GitHub Actions resources are shared across the one large `apache`
> organization. Besides flaky tests, that is the main problem for GA-based CI.
>
> If we use Azure Pipelines, we can have a dedicated project for Pulsar,
> so we will have more resources to run on.
> That would solve the problem that this proposal tries to solve. The approach
> has been used by Flink. We have started some experiments and will share
> some of the results here next week.
>
> Thanks,
> Sijie
>
> On Fri, Jan 29, 2021 at 8:34 AM Lari Hotari <l...@hotari.net> wrote:
>
> > Hi Sijie,
> >
> > Let's keep this work going, since resolving the problems with Pulsar CI is
> > urgent.
> >
> > I took a quick glance at the Azure Pipelines solution in Flink. By Googling
> > I found
> > https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines
> > In the repository I found
> > https://github.com/apache/flink/blob/master/azure-pipelines.yml which
> > references
> > https://github.com/apache/flink/blob/master/tools/azure-pipelines/jobs-template.yml
> >
> > It uses the build matrix feature to parallelize the execution:
> >
> >
> > https://github.com/apache/flink/blob/dd0ee24e55dab4ae76201103c76495bc4fa0f73b/tools/azure-pipelines/jobs-template.yml#L88-L107
> >
> > What would be the key benefit for Pulsar CI of using Azure Pipelines over
> > GitHub Actions?
> >
> > -Lari
> >
> > On Fri, Jan 29, 2021 at 6:03 PM Sijie Guo <guosi...@gmail.com> wrote:
> >
> > > Lari,
> > >
> > > Yes, we can keep this proposal open for discussion. That's for sure.
> > >
> > > I just don't have a good solution at the moment for a multiple-workflow
> > > approach using GitHub Actions.
> > >
> > > An alternative is to look into Azure Pipelines, which the Flink community
> > > is using.
> > > We are still learning about it and will post our thoughts here once we
> > > have a better idea.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <l...@hotari.net> wrote:
> > >
> > > > Thanks for the feedback, Sijie.
> > > >
> > > > > If this proposal is blocked by the other proposal, we should focus on
> > > > > getting the changes for the other proposal before talking about merging
> > > > > them.
> > > >
> > > > Yes, the current proposal depends on the draft PIP for "Changes to flaky
> > > > test handling". I'll follow up on fixing the flaky tests in a new email
> > > > thread.
> > > >
> > > > I hope we can get the discussions going on both draft PIPs and find
> > > > consensus together as a community.
> > > > During the discussions, more solution options will come up. Each solution
> > > > has trade-offs.
> > > > It would be useful to document the options when the community doesn't
> > > > immediately agree on a single choice.
> > > > I was thinking that these options could be documented in the same draft
> > > > PIP documents.
> > > >
> > > > I can give multiple authors editing access to the Google Docs so that we
> > > > can keep editing a single document for both draft PIPs.
> > > > Anyone who wants to add more solution options to the documents, please
> > > > let me know so that I can add editing access.
> > > >
> > > > Sijie, would you like to document the option of keeping the workflow as
> > > > multiple smaller workflows?
> > > > As I understand it, the problems that have come up with the Pulsar CI
> > > > regarding resource consumption would have to be resolved in that
> > > > alternative as well.
> > > >
> > > > I believe that everyone is open to any set of solution alternatives that
> > > > solves the problems that we have with Pulsar CI.
> > > > We all know that it's urgent to fix Pulsar CI asap. We can do it together.
> > > >
> > > > BR, Lari
> > > >
> > > >
> > > > On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <guosi...@gmail.com> wrote:
> > > >
> > > > > Lari,
> > > > >
> > > > > Thank you for bringing this proposal up! This is a great initiative.
> > > > >
> > > > > However, I agree with Yong. We have spent tons of effort splitting one
> > > > > large workflow into multiple smaller workflows.
> > > > >
> > > > > If this proposal is blocked by the other proposal, we should focus on
> > > > > getting the changes for the other proposal before talking about merging
> > > > > them.
> > > > >
> > > > > Thanks,
> > > > > Sijie
> > > > >
> > > > > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <l...@hotari.net> wrote:
> > > > >
> > > > > > Thank you for the comments, Penghui.
> > > > > >
> > > > > > Exactly as you said, we should make the tests stable.
> > > > > > The proposals in the other draft PIP "Changes to flaky test handling"
> > > > > > deal with that.
> > > > > > It's currently a draft and needs more eyes. Would you be able to take a
> > > > > > closer look at that too?
> > > > > >
> > > > > > BR, Lari
> > > > > >
> > > > > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <codelipeng...@gmail.com> wrote:
> > > > > >
> > > > > > > Currently, especially for the integration tests, a lot of time is
> > > > > > > spent building Pulsar distributions and Docker images.
> > > > > > > I think we should make the tests stable before merging the test
> > > > > > > workflows; otherwise, rerunning the tests will become more expensive.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Penghui
> > > > > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang <zhangyong1025...@gmail.com>, wrote:
> > > > > > > > I am not sure that merging all the workflows into one workflow is a
> > > > > > > > good idea. As far as I know, GitHub Actions doesn't allow rerunning a
> > > > > > > > single job in a workflow. That means that if there is any failure in
> > > > > > > > the workflow, we need to rerun all steps/stages. In the worst case, a
> > > > > > > > different test fails on each rerun, and passing the CI takes even
> > > > > > > > more time.
> > > > > > > >
> > > > > > > > ---
> > > > > > > > Yong
> > > > > > > >
> > > > > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <lari.hot...@sagire.fi> wrote:
> > > > > > > >
> > > > > > > > > Dear Pulsar community members,
> > > > > > > > >
> > > > > > > > > Currently, the Pulsar GitHub Actions workflows are consuming the
> > > > > > > > > majority of the shared pool of resources allocated for
> > > > > > > > > github.com/apache projects. Other Apache projects have been impacted
> > > > > > > > > and there is a demand to improve the Pulsar CI
> > > > > > > > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396>
> > > > > > > > > asap.
> > > > > > > > >
> > > > > > > > > In GitHub Actions Runners, the unit of resources is the time that a
> > > > > > > > > Runner is occupied. I observed the workflow runs for handling a
> > > > > > > > > single Pull Request (in my personal fork) and these were the running
> > > > > > > > > durations:
> > > > > > > > > Workflow name                               Duration
> > > > > > > > > CI - Build - MacOS                          0:17:23
> > > > > > > > > CI - Go Functions style check               0:02:38
> > > > > > > > > CI - Unit - Brokers - Other                 0:15:40
> > > > > > > > > CI - Unit - Brokers - Client Impl           0:16:28
> > > > > > > > > CI - Misc                                   0:16:51
> > > > > > > > > CI - Unit - Proxy                           0:14:23
> > > > > > > > > CI - Go Functions Tests                     0:22:08
> > > > > > > > > CI - CPP, Python Tests                      0:23:30
> > > > > > > > > CI - Unit                                   0:42:11
> > > > > > > > > CI - Integration - Sql                      1:00:13
> > > > > > > > > CI - Integration - Tiered JCloud            1:00:18
> > > > > > > > > CI - Integration - Tiered FileSystem        1:00:13
> > > > > > > > > CI - Integration - Function State           1:00:12
> > > > > > > > > CI - Integration - Cli                      1:10:22
> > > > > > > > > CI - Integration - Transaction              1:16:34
> > > > > > > > > CI - Integration - Process                  1:11:23
> > > > > > > > > CI - Shade - Test                           1:15:45
> > > > > > > > > CI - Unit - Brokers - Client Api            0:26:13
> > > > > > > > > CI - Unit - Brokers - Broker Group 2        0:35:05
> > > > > > > > > CI - Integration - Standalone               0:45:29
> > > > > > > > > CI - Integration - Messaging                1:00:23
> > > > > > > > > CI - Integration - Thread                   1:00:19
> > > > > > > > > CI - Integration - Backwards Compatibility  1:00:19
> > > > > > > > > CI - Integration - Schema                   1:00:19
> > > > > > > > > CI - Unit - Brokers - Broker Group 1        2:02:31
> > > > > > > > > TOTAL                                       19:36:50
> > > > > > > > >
> > > > > > > > > *In this case, the total resource consumption of GitHub Actions
> > > > > > > > > Runners is 19 hours 36 minutes 50 seconds for a single pull request
> > > > > > > > > to apache/pulsar.*
> > > > > > > > >
> > > > > > > > > Since GitHub Actions Runner resource pool utilization is very high,
> > > > > > > > > the build queue grows and takes a long time to process.
> > > > > > > > >
> > > > > > > > > I have been looking for ways to improve the Pulsar CI for the last 3
> > > > > > > > > months. During this period I worked on a few experiments. The
> > > > > > > > > learnings from the past experiments are documented at a high level
> > > > > > > > > in the following draft PIP document.
> > > > > > > > >
> > > > > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI" document
> > > > > > > > > is a Google doc:*
> > > > > > > > >
> > > > > > > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > > > >
> > > > > > > > > *Please participate* so that we get the plan adjusted based on the
> > > > > > > > > feedback asap. If there's already a similar effort ongoing, I hope
> > > > > > > > > we can join efforts.
> > > > > > > > >
> > > > > > > > > *Let's fix Pulsar CI!*
> > > > > > > > >
> > > > > > > > > BR, Lari
> > > > > > > > >


-- 
-Ali