Yes. I have been working on that and hope to get there soon.

- Sijie

On Mon, Feb 1, 2021 at 12:40 PM Ali Ahmed <ahmal...@gmail.com> wrote:

> I recommend we move the connectors out of the Pulsar repo to reduce the
> load on the main CI pipeline. The new repo seems ready:
> https://github.com/apache/pulsar-connectors
>
> -Ali
>
> On Fri, Jan 29, 2021 at 9:22 AM Sijie Guo <guosi...@gmail.com> wrote:
>
> > Currently, GitHub Actions resources are shared across the one large
> > `apache` organization. That is the main problem for GA-based CI, besides
> > flaky tests.
> >
> > If we use Azure Pipelines, we can have a dedicated project for Pulsar,
> > so we will have more resources to run on.
> > That would solve the problem that this proposal tries to solve. The
> > approach has been used by Flink. We have started some experiments and
> > will share some of the results here next week.
> >
> > Thanks,
> > Sijie
> >
> > On Fri, Jan 29, 2021 at 8:34 AM Lari Hotari <l...@hotari.net> wrote:
> >
> > > Hi Sijie,
> > >
> > > Let's keep this work going since resolving the problems with Pulsar CI
> > > is urgent.
> > >
> > > I took a quick glance at the Azure Pipelines solution in Flink. By
> > > Googling, I found
> > > https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines .
> > > In the repository I found
> > > https://github.com/apache/flink/blob/master/azure-pipelines.yml which
> > > references
> > > https://github.com/apache/flink/blob/master/tools/azure-pipelines/jobs-template.yml
> > >
> > > It uses the build matrix feature to parallelize the execution:
> > > https://github.com/apache/flink/blob/dd0ee24e55dab4ae76201103c76495bc4fa0f73b/tools/azure-pipelines/jobs-template.yml#L88-L107
> > >
> > > What would be the key benefit for Pulsar CI of using Azure Pipelines
> > > over GitHub Actions?
> > >
> > > -Lari
> > >
> > > On Fri, Jan 29, 2021 at 6:03 PM Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > > > Lari,
> > > >
> > > > Yes, we can keep this proposal open for discussion. That's for sure.
> > > >
> > > > I just don't have a good solution at the moment for a
> > > > multiple-workflow approach using GitHub Actions.
> > > >
> > > > An alternative is to look into Azure Pipelines, which the Flink
> > > > community is using.
> > > > We are still learning about it and will post our thoughts here once
> > > > we have a better idea.
> > > >
> > > > Thanks,
> > > > Sijie
> > > >
> > > > On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <l...@hotari.net> wrote:
> > > >
> > > > > Thanks for the feedback, Sijie.
> > > > >
> > > > > > If this proposal is blocked by the other proposal, we should
> > > > > > focus on getting the changes for the other proposal before
> > > > > > talking about merging them.
> > > > >
> > > > > Yes, the current proposal depends on the draft PIP for "Changes to
> > > > > flaky test handling". I'll follow up on fixing the flaky tests in a
> > > > > new email thread.
> > > > >
> > > > > I hope we can get the discussions going on both draft PIPs and find
> > > > > consensus together as a community.
> > > > > During the discussions, more solution options will come up. Each
> > > > > solution has trade-offs.
> > > > > It would be useful to document the options when the community
> > > > > doesn't immediately agree on a single choice.
> > > > > I was thinking that these options could be documented in the same
> > > > > draft PIP documents.
> > > > >
> > > > > I can give multiple authors editing access to the Google Docs so
> > > > > that we can keep on editing a single document for both draft PIPs.
> > > > > If anyone wants to add more solution options to the documents,
> > > > > please let me know and I'll grant editing access.
> > > > >
> > > > > Sijie, would you like to document the option of keeping the
> > > > > workflow as multiple smaller workflows?
> > > > > As I understand it, the problems that have come up with the Pulsar
> > > > > CI regarding resource consumption would have to be resolved in that
> > > > > alternative as well.
> > > > >
> > > > > I believe that everyone is open to any set of solution alternatives
> > > > > which solves the problems that we have with Pulsar CI.
> > > > > We all know that it's urgent to fix Pulsar CI asap. We can do it
> > > > > together.
> > > > >
> > > > > BR, Lari
> > > > >
> > > > >
> > > > > On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <guosi...@gmail.com> wrote:
> > > > >
> > > > > > Lari,
> > > > > >
> > > > > > Thank you for bringing this proposal up! This is a great
> > > > > > initiative.
> > > > > >
> > > > > > However, I agree with Yong. We have spent tons of effort
> > > > > > splitting one large workflow into multiple smaller workflows.
> > > > > >
> > > > > > If this proposal is blocked by the other proposal, we should
> > > > > > focus on getting the changes for the other proposal before
> > > > > > talking about merging them.
> > > > > >
> > > > > > Thanks,
> > > > > > Sijie
> > > > > >
> > > > > > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <l...@hotari.net> wrote:
> > > > > >
> > > > > > > Thank you for the comments, Penghui.
> > > > > > >
> > > > > > > Exactly as you said, we should make the tests stable.
> > > > > > > The proposals in the other draft PIP, "Changes to flaky test
> > > > > > > handling", deal with that.
> > > > > > > It's currently a draft and needs more eyes. Would you be able
> > > > > > > to take a closer look at that too?
> > > > > > >
> > > > > > > BR, Lari
> > > > > > >
> > > > > > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li
> > > > > > > <codelipeng...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Currently, especially for the integration tests, a lot of
> > > > > > > > time is spent building Pulsar distributions and Docker
> > > > > > > > images.
> > > > > > > > I think we should make the tests stable before merging the
> > > > > > > > workflows; otherwise, rerunning the tests will become more
> > > > > > > > expensive.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Penghui
> > > > > > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang
> > > > > > > > <zhangyong1025...@gmail.com>, wrote:
> > > > > > > > > I am not sure that merging all the workflows into one
> > > > > > > > > workflow is a good idea. As far as I know, GitHub Actions
> > > > > > > > > doesn't allow rerunning a single job in a workflow. That
> > > > > > > > > means that if there is any failure in the workflow, we
> > > > > > > > > need to rerun all steps/stages. In the worst case, a
> > > > > > > > > different test fails on each rerun, and it takes even more
> > > > > > > > > time to pass the CI.
> > > > > > > > >
> > > > > > > > > ---
> > > > > > > > > Yong
> > > > > > > > >
> > > > > > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari
> > > > > > > > > <lari.hot...@sagire.fi> wrote:
> > > > > > > > >
> > > > > > > > > > Dear Pulsar community members,
> > > > > > > > > >
> > > > > > > > > > Currently, the Pulsar GitHub Actions workflows are consuming
> > > > > > > > > > the majority of the shared pool of resources allocated for
> > > > > > > > > > github.com/apache projects. Other Apache projects have been
> > > > > > > > > > impacted and there is a demand to improve the Pulsar CI
> > > > > > > > > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396>
> > > > > > > > > > asap.
> > > > > > > > > >
> > > > > > > > > > In GitHub Actions Runners, the unit of resources is the time
> > > > > > > > > > that a Runner is occupied. I observed the workflow runs for
> > > > > > > > > > handling a single Pull Request (in my personal fork) and
> > > > > > > > > > these were the running durations:
> > > > > > > > > > Workflow name                               Duration
> > > > > > > > > > CI - Build - MacOS                           0:17:23
> > > > > > > > > > CI - Go Functions style check                0:02:38
> > > > > > > > > > CI - Unit - Brokers - Other                  0:15:40
> > > > > > > > > > CI - Unit - Brokers - Client Impl            0:16:28
> > > > > > > > > > CI - Misc                                    0:16:51
> > > > > > > > > > CI - Unit - Proxy                            0:14:23
> > > > > > > > > > CI - Go Functions Tests                      0:22:08
> > > > > > > > > > CI - CPP, Python Tests                       0:23:30
> > > > > > > > > > CI - Unit                                    0:42:11
> > > > > > > > > > CI - Integration - Sql                       1:00:13
> > > > > > > > > > CI - Integration - Tiered JCloud             1:00:18
> > > > > > > > > > CI - Integration - Tiered FileSystem         1:00:13
> > > > > > > > > > CI - Integration - Function State            1:00:12
> > > > > > > > > > CI - Integration - Cli                       1:10:22
> > > > > > > > > > CI - Integration - Transaction               1:16:34
> > > > > > > > > > CI - Integration - Process                   1:11:23
> > > > > > > > > > CI - Shade - Test                            1:15:45
> > > > > > > > > > CI - Unit - Brokers - Client Api             0:26:13
> > > > > > > > > > CI - Unit - Brokers - Broker Group 2         0:35:05
> > > > > > > > > > CI - Integration - Standalone                0:45:29
> > > > > > > > > > CI - Integration - Messaging                 1:00:23
> > > > > > > > > > CI - Integration - Thread                    1:00:19
> > > > > > > > > > CI - Integration - Backwards Compatibility   1:00:19
> > > > > > > > > > CI - Integration - Schema                    1:00:19
> > > > > > > > > > CI - Unit - Brokers - Broker Group 1         2:02:31
> > > > > > > > > > TOTAL                                       19:36:50
> > > > > > > > > >
> > > > > > > > > > *In this case, the total resource consumption of GitHub
> > > > > > > > > > Actions Runners is 19 hours 36 minutes 50 seconds for a
> > > > > > > > > > single pull request to apache/pulsar.*
> > > > > > > > > >
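The total quoted above can be re-derived from the per-workflow durations. The following is only a minimal sketch for checking that arithmetic: the values are copied from the table in this thread, while the names `DURATIONS`, `parse_duration` and `format_duration` are hypothetical helpers introduced for illustration and are not part of any Pulsar tooling.

```python
from datetime import timedelta

# Durations (H:MM:SS) copied from the table in this thread, in table order.
DURATIONS = [
    "0:17:23", "0:02:38", "0:15:40", "0:16:28", "0:16:51",
    "0:14:23", "0:22:08", "0:23:30", "0:42:11", "1:00:13",
    "1:00:18", "1:00:13", "1:00:12", "1:10:22", "1:16:34",
    "1:11:23", "1:15:45", "0:26:13", "0:35:05", "0:45:29",
    "1:00:23", "1:00:19", "1:00:19", "1:00:19", "2:02:31",
]


def parse_duration(text: str) -> timedelta:
    """Parse an H:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_duration(delta: timedelta) -> str:
    """Format a timedelta as H:MM:SS without rolling hours over into days."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Sum the occupied runner time across all workflows of the pull request.
total = sum((parse_duration(d) for d in DURATIONS), timedelta())
print(format_duration(total))
```

The same helper could be pointed at durations collected for other pull requests to compare runner time before and after a CI change.
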
> > > > > > > > > > Since GitHub Actions Runner resource pool utilization is very
> > > > > > > > > > high, the build queue grows and takes a long time to process.
> > > > > > > > > >
> > > > > > > > > > I have been looking for ways to improve the Pulsar CI for the
> > > > > > > > > > last 3 months. During this period I worked on a few
> > > > > > > > > > experiments. The learnings from the past experiments are
> > > > > > > > > > documented at a high level in the following draft PIP
> > > > > > > > > > document.
> > > > > > > > > >
> > > > > > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI"
> > > > > > > > > > document is a Google doc:*
> > > > > > > > > >
> > > > > > > > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > > > > >
> > > > > > > > > > *Please participate* so that we get the plan adjusted based
> > > > > > > > > > on the feedback asap. If there's already a similar effort
> > > > > > > > > > ongoing, I hope we can join efforts.
> > > > > > > > > >
> > > > > > > > > > *Let's fix Pulsar CI!*
> > > > > > > > > >
> > > > > > > > > > BR, Lari
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> -Ali
>
