Hi Sijie,

Let's keep this work going, since resolving the problems with Pulsar CI is urgent.
I took a quick glance at the Azure Pipelines solution in Flink. By Googling I found
https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines .
In the repository I found
https://github.com/apache/flink/blob/master/azure-pipelines.yml
which references
https://github.com/apache/flink/blob/master/tools/azure-pipelines/jobs-template.yml .
It uses the build matrix feature to parallelize the execution:
https://github.com/apache/flink/blob/dd0ee24e55dab4ae76201103c76495bc4fa0f73b/tools/azure-pipelines/jobs-template.yml#L88-L107

What would be the key benefit for Pulsar CI of using Azure Pipelines over GitHub Actions?

-Lari

On Fri, Jan 29, 2021 at 6:03 PM Sijie Guo <guosi...@gmail.com> wrote:

> Lari,
>
> Yes, we can keep this proposal open for discussion. That's for sure.
>
> I just don't have any good solution at this moment with a multiple-workflow
> approach using GitHub Actions.
>
> An alternative is to look into Azure Pipelines, which the Flink community is
> using.
> We are still learning there. Will post thoughts here once we have a better
> idea.
>
> Thanks,
> Sijie
>
> On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <l...@hotari.net> wrote:
>
> > Thanks for the feedback, Sijie.
> >
> > > If this proposal is blocked by the other proposal, we should focus on
> > > getting the changes for the other proposal before talking about merging
> > > them.
> >
> > Yes, the current proposal depends on the draft PIP for "Changes to flaky
> > test handling". I'll follow up on fixing the flaky tests in a new email
> > thread.
> >
> > I hope we can get the discussions going on both draft PIPs and find
> > consensus together as a community.
> > During the discussions, more solution options will come up. Each solution
> > has trade-offs.
> > It would be useful to document the options when the community doesn't
> > immediately agree on a single choice.
> > I was thinking that these options could be documented in the same draft PIP
> > documents.
> >
> > I can give multiple authors editing access to the Google Docs so that we
> > can keep on editing a single document for both draft PIPs.
> > If anyone wants to add more solution options to the documents, please
> > let me know so that I can add editing access.
> >
> > Sijie, would you like to document the option around keeping the workflow as
> > multiple smaller workflows?
> > As I understand it, the resource consumption problems that have come up
> > with the Pulsar CI would have to be resolved in that alternative as well.
> >
> > I believe that everyone is open to any set of solution alternatives that
> > solves the problems that we have with Pulsar CI.
> > We all know that it's urgent to fix Pulsar CI asap. We can do it together.
> >
> > BR, Lari
> >
> >
> > On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <guosi...@gmail.com> wrote:
> >
> > > Lari,
> > >
> > > Thank you for bringing this proposal up! This is a great initiative.
> > >
> > > However, I agree with Yong. We have spent tons of effort splitting one
> > > large workflow into multiple smaller workflows.
> > >
> > > If this proposal is blocked by the other proposal, we should focus on
> > > getting the changes for the other proposal before talking about merging
> > > them.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <l...@hotari.net> wrote:
> > >
> > > > Thank you for the comments, Penghui.
> > > >
> > > > Exactly as you said, we should make the tests stable.
> > > > The proposals in the other draft PIP "Changes to flaky test handling"
> > > > deal with that.
> > > > It's currently a draft and needs more eyes. Would you be able to take a
> > > > closer look at that too?
> > > >
> > > > BR, Lari
> > > >
> > > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <codelipeng...@gmail.com>
> > > > wrote:
> > > >
> > > > > Currently, especially for the integration tests, a lot of time is spent
> > > > > building Pulsar distributions and Docker images.
> > > > > I think that before merging the tests we should make the tests stable;
> > > > > otherwise, rerunning the tests will become more expensive.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang <zhangyong1025...@gmail.com>,
> > > > > wrote:
> > > > > > I am not sure that merging all the workflows into one workflow is a
> > > > > > good idea. As far as I know, GitHub Actions doesn't allow rerunning a
> > > > > > single job in a workflow. That means that if there is any failure in
> > > > > > the workflow, we need to rerun all steps/stages. In the worst case, a
> > > > > > different test fails on each rerun, and it would take even more time
> > > > > > to pass the CI.
> > > > > >
> > > > > > ---
> > > > > > Yong
> > > > > >
> > > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <lari.hot...@sagire.fi>
> > > > > > wrote:
> > > > > >
> > > > > > > Dear Pulsar community members,
> > > > > > >
> > > > > > > Currently, the Pulsar GitHub Actions workflows are consuming the
> > > > > > > majority of the shared pool of resources allocated for
> > > > > > > github.com/apache projects. Other Apache projects have been impacted
> > > > > > > and there is a demand to improve the Pulsar CI
> > > > > > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396>
> > > > > > > asap.
> > > > > > >
> > > > > > > In GitHub Actions Runners, the unit of resources is the time that a
> > > > > > > Runner is occupied.
> > > > > > > I observed the workflow runs for handling a single pull request (in
> > > > > > > my personal fork) and these were the running durations:
> > > > > > >
> > > > > > > Workflow name                               Duration
> > > > > > > CI - Build - MacOS                           0:17:23
> > > > > > > CI - Go Functions style check                0:02:38
> > > > > > > CI - Unit - Brokers - Other                  0:15:40
> > > > > > > CI - Unit - Brokers - Client Impl            0:16:28
> > > > > > > CI - Misc                                    0:16:51
> > > > > > > CI - Unit - Proxy                            0:14:23
> > > > > > > CI - Go Functions Tests                      0:22:08
> > > > > > > CI - CPP, Python Tests                       0:23:30
> > > > > > > CI - Unit                                    0:42:11
> > > > > > > CI - Integration - Sql                       1:00:13
> > > > > > > CI - Integration - Tiered JCloud             1:00:18
> > > > > > > CI - Integration - Tiered FileSystem         1:00:13
> > > > > > > CI - Integration - Function State            1:00:12
> > > > > > > CI - Integration - Cli                       1:10:22
> > > > > > > CI - Integration - Transaction               1:16:34
> > > > > > > CI - Integration - Process                   1:11:23
> > > > > > > CI - Shade - Test                            1:15:45
> > > > > > > CI - Unit - Brokers - Client Api             0:26:13
> > > > > > > CI - Unit - Brokers - Broker Group 2         0:35:05
> > > > > > > CI - Integration - Standalone                0:45:29
> > > > > > > CI - Integration - Messaging                 1:00:23
> > > > > > > CI - Integration - Thread                    1:00:19
> > > > > > > CI - Integration - Backwards Compatibility   1:00:19
> > > > > > > CI - Integration - Schema                    1:00:19
> > > > > > > CI - Unit - Brokers - Broker Group 1         2:02:31
> > > > > > > TOTAL                                       19:36:50
> > > > > > >
> > > > > > > *In this case, the total resource consumption of GitHub Actions
> > > > > > > Runners is 19 hours 36 minutes 50 seconds for a single pull request
> > > > > > > to apache/pulsar.*
> > > > > > >
> > > > > > > Since utilization of the GitHub Actions Runner resource pool is very
> > > > > > > high, the build queue grows and takes a long time to process.
> > > > > > >
> > > > > > > I have been looking for ways to improve the Pulsar CI for the last 3
> > > > > > > months.
> > > > > > > During this period I worked on a few experiments. The learnings
> > > > > > > from the past experiments are documented at a high level in the
> > > > > > > following draft PIP document.
> > > > > > >
> > > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI" document
> > > > > > > is a Google doc:*
> > > > > > >
> > > > > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > >
> > > > > > > *Please participate* so that we get the plan adjusted based on the
> > > > > > > feedback asap. If there's already a similar effort ongoing, I hope we
> > > > > > > can join efforts.
> > > > > > >
> > > > > > > *Let's fix Pulsar CI!*
> > > > > > >
> > > > > > > BR, Lari
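
As a sanity check on the TOTAL figure in the original message quoted above, the per-workflow durations can be summed with a short script. This is only a sketch: the durations are copied from the thread, while the function names (`to_seconds`, `format_hms`, `total_runner_time`) are made up for illustration and are not part of any Pulsar tooling.

```python
# Durations (H:MM:SS) copied from the workflow list in the quoted message,
# in the order they appear there.
DURATIONS = [
    "0:17:23", "0:02:38", "0:15:40", "0:16:28", "0:16:51",
    "0:14:23", "0:22:08", "0:23:30", "0:42:11", "1:00:13",
    "1:00:18", "1:00:13", "1:00:12", "1:10:22", "1:16:34",
    "1:11:23", "1:15:45", "0:26:13", "0:35:05", "0:45:29",
    "1:00:23", "1:00:19", "1:00:19", "1:00:19", "2:02:31",
]


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds: int) -> str:
    """Format a number of seconds as H:MM:SS (hours are not capped at 24)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def total_runner_time(durations) -> str:
    """Sum a list of H:MM:SS durations and return the total as H:MM:SS."""
    return format_hms(sum(to_seconds(d) for d in durations))


if __name__ == "__main__":
    print(total_runner_time(DURATIONS))  # 19:36:50
```

The result agrees with the 19 hours 36 minutes 50 seconds stated in the message. Because Runner occupancy time is the unit of resource consumption, the sum across all workflows is the relevant figure, not the wall-clock time of the slowest workflow.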