Lari,

Yes, we can keep this proposal open for discussion. That's for sure.

I just don't have a good solution at the moment for a multiple-workflow
approach using GitHub Actions.

An alternative is to look into Azure Pipelines, which the Flink community
is using. We are still learning about it and will post our thoughts here
once we have a better idea.

Thanks,
Sijie

On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <l...@hotari.net> wrote:

> Thanks for the feedback, Sijie.
>
> > If this proposal is blocked by the other proposal, we should focus on
> > getting the changes for the other proposal before talking about merging
> > them.
>
> Yes, the current proposal depends on the draft PIP "Changes to flaky test
> handling". I'll follow up on fixing the flaky tests in a new email thread.
>
> I hope we can get the discussions going on both draft PIPs and reach
> consensus together as a community.
> More solution options will come up during the discussions, and each
> solution has trade-offs.
> When the community doesn't immediately agree on a single choice, it would
> be useful to document the options.
> I was thinking that these options could be documented in the same draft
> PIP documents.
>
> I can give multiple authors editing access to the Google Docs so that we
> can keep working on a single document for each of the two draft PIPs.
> If anyone would like to add more solution options to the documents, please
> let me know and I'll grant editing access.
>
> Sijie, would you like to document the option of keeping the CI as multiple
> smaller workflows?
> My understanding is that the resource consumption problems that have come
> up with Pulsar CI would have to be resolved in that alternative as well.
>
> I believe everyone is open to any set of solution alternatives that
> solves the problems we have with Pulsar CI.
> We all know it's urgent to fix Pulsar CI. We can do it together.
>
> BR, Lari
>
>
> On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <guosi...@gmail.com> wrote:
>
> > Lari,
> >
> > Thank you for bringing this proposal up! This is a great initiative.
> >
> > However, I agree with Yong. We have spent a lot of effort splitting one
> > large workflow into multiple smaller workflows.
> >
> > If this proposal is blocked by the other proposal, we should focus on
> > getting the changes for the other proposal before talking about merging
> > them.
> >
> > Thanks,
> > Sijie
> >
> > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <l...@hotari.net> wrote:
> >
> > > Thank you for the comments, Penghui.
> > >
> > > Exactly as you said, we should make the tests stable.
> > > The proposals in the other draft PIP, "Changes to flaky test handling",
> > > deal with that.
> > > It's currently a draft and needs more eyes. Would you be able to take a
> > > closer look at that too?
> > >
> > > BR, Lari
> > >
> > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <codelipeng...@gmail.com>
> > > wrote:
> > >
> > > > Currently, especially for the integration tests, a lot of time is
> > > > spent building Pulsar distributions and Docker images.
> > > > I think we should make the tests stable before merging the test
> > > > workflows; otherwise, rerunning the tests will become more expensive.
> > > >
> > > > Thanks,
> > > > Penghui
> > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang <zhangyong1025...@gmail.com>,
> > > > wrote:
> > > > > I am not sure that merging all the workflows into one workflow is a
> > > > > good idea. As far as I know, GitHub Actions doesn't allow rerunning a
> > > > > single job in a workflow. That means that if there is any failure in
> > > > > the workflow, we need to rerun all steps/stages. In the worst case, a
> > > > > different test fails when we rerun it, and it would take even more
> > > > > time to pass the CI.
> > > > >
> > > > > ---
> > > > > Yong
> > > > >
> > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <lari.hot...@sagire.fi>
> > > > > wrote:
> > > > >
> > > > > > Dear Pulsar community members,
> > > > > >
> > > > > > Currently, the Pulsar GitHub Actions workflows are consuming the
> > > > > > majority of the shared pool of resources allocated for
> > > > > > github.com/apache projects. Other Apache projects have been
> > > > > > impacted, and there is a demand to improve the Pulsar CI as soon as
> > > > > > possible
> > > > > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396>.
> > > > > >
> > > > > > In GitHub Actions, the unit of resource consumption is the time that
> > > > > > a Runner is occupied. I observed the workflow runs for a single pull
> > > > > > request (in my personal fork), and these were the running durations:
> > > > > > Workflow name                               Duration (h:mm:ss)
> > > > > > CI - Build - MacOS                          0:17:23
> > > > > > CI - Go Functions style check               0:02:38
> > > > > > CI - Unit - Brokers - Other                 0:15:40
> > > > > > CI - Unit - Brokers - Client Impl           0:16:28
> > > > > > CI - Misc                                   0:16:51
> > > > > > CI - Unit - Proxy                           0:14:23
> > > > > > CI - Go Functions Tests                     0:22:08
> > > > > > CI - CPP, Python Tests                      0:23:30
> > > > > > CI - Unit                                   0:42:11
> > > > > > CI - Integration - Sql                      1:00:13
> > > > > > CI - Integration - Tiered JCloud            1:00:18
> > > > > > CI - Integration - Tiered FileSystem        1:00:13
> > > > > > CI - Integration - Function State           1:00:12
> > > > > > CI - Integration - Cli                      1:10:22
> > > > > > CI - Integration - Transaction              1:16:34
> > > > > > CI - Integration - Process                  1:11:23
> > > > > > CI - Shade - Test                           1:15:45
> > > > > > CI - Unit - Brokers - Client Api            0:26:13
> > > > > > CI - Unit - Brokers - Broker Group 2        0:35:05
> > > > > > CI - Integration - Standalone               0:45:29
> > > > > > CI - Integration - Messaging                1:00:23
> > > > > > CI - Integration - Thread                   1:00:19
> > > > > > CI - Integration - Backwards Compatibility  1:00:19
> > > > > > CI - Integration - Schema                   1:00:19
> > > > > > CI - Unit - Brokers - Broker Group 1        2:02:31
> > > > > > TOTAL                                       19:36:50
> > > > > >
> > > > > > *In this case, the total resource consumption of GitHub Actions
> > > > > > Runners is 19 hours 36 minutes 50 seconds for a single pull request
> > > > > > to apache/pulsar.*
> > > > > >
> > > > > > Since the utilization of the GitHub Actions Runner resource pool is
> > > > > > very high, the build queue grows and takes a long time to process.
> > > > > >
> > > > > > I have been looking for ways to improve the Pulsar CI for the last 3
> > > > > > months. During this period, I ran a few experiments. The learnings
> > > > > > from these experiments are documented at a high level in the
> > > > > > following draft PIP document.
> > > > > >
> > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI" document
> > > > > > is a Google Doc:*
> > > > > >
> > > > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > >
> > > > > > *Please participate* so that we can adjust the plan based on the
> > > > > > feedback as soon as possible. If a similar effort is already
> > > > > > ongoing, I hope we can join efforts.
> > > > > >
> > > > > > *Let's fix Pulsar CI!*
> > > > > >
> > > > > > BR, Lari
> > > > > >
> > > >
> > >
> >
>
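
The TOTAL row (19:36:50) in Lari's original message is the sum of the per-workflow runner durations for one pull request. The minimal Python sketch below re-adds the 25 durations copied from that table as a sanity check; the helper names `to_seconds` and `to_hms` are hypothetical and not part of any Pulsar or GitHub tooling.

```python
# Per-workflow GitHub Actions runner durations (H:MM:SS) for a single pull
# request, copied from the table in the original message of this thread.
DURATIONS = [
    ("CI - Build - MacOS", "0:17:23"),
    ("CI - Go Functions style check", "0:02:38"),
    ("CI - Unit - Brokers - Other", "0:15:40"),
    ("CI - Unit - Brokers - Client Impl", "0:16:28"),
    ("CI - Misc", "0:16:51"),
    ("CI - Unit - Proxy", "0:14:23"),
    ("CI - Go Functions Tests", "0:22:08"),
    ("CI - CPP, Python Tests", "0:23:30"),
    ("CI - Unit", "0:42:11"),
    ("CI - Integration - Sql", "1:00:13"),
    ("CI - Integration - Tiered JCloud", "1:00:18"),
    ("CI - Integration - Tiered FileSystem", "1:00:13"),
    ("CI - Integration - Function State", "1:00:12"),
    ("CI - Integration - Cli", "1:10:22"),
    ("CI - Integration - Transaction", "1:16:34"),
    ("CI - Integration - Process", "1:11:23"),
    ("CI - Shade - Test", "1:15:45"),
    ("CI - Unit - Brokers - Client Api", "0:26:13"),
    ("CI - Unit - Brokers - Broker Group 2", "0:35:05"),
    ("CI - Integration - Standalone", "0:45:29"),
    ("CI - Integration - Messaging", "1:00:23"),
    ("CI - Integration - Thread", "1:00:19"),
    ("CI - Integration - Backwards Compatibility", "1:00:19"),
    ("CI - Integration - Schema", "1:00:19"),
    ("CI - Unit - Brokers - Broker Group 1", "2:02:31"),
]


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string into a number of seconds (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back into an H:MM:SS string (hypothetical helper)."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Sum the runner time of every workflow triggered by the pull request.
total_seconds = sum(to_seconds(duration) for _, duration in DURATIONS)
print(len(DURATIONS), "workflows,", to_hms(total_seconds))
```

Running it prints `25 workflows, 19:36:50` (70,610 runner-seconds), which agrees with the TOTAL row reported in the thread.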
