Yes. I have been working on that and hope to get there soon.

- Sijie
On Mon, Feb 1, 2021 at 12:40 PM Ali Ahmed <ahmal...@gmail.com> wrote:

> I recommend we move the connectors away from the pulsar repo to reduce the
> load on the main CI pipeline. The new repo seems ready.
> https://github.com/apache/pulsar-connectors
>
> -Ali
>
> On Fri, Jan 29, 2021 at 9:22 AM Sijie Guo <guosi...@gmail.com> wrote:
>
> > Currently, GitHub Actions are shared across one large `apache`
> > organization. That is the main problem for GA-based CI besides flaky tests.
> >
> > If we use Azure Pipelines, we can have a dedicated project for Pulsar,
> > so we will have more resources to run on.
> > It will solve the problem that this proposal tries to solve. The approach
> > has been used by Flink. We have started some experiments and will share
> > some of the results here next week.
> >
> > Thanks,
> > Sijie
> >
> > On Fri, Jan 29, 2021 at 8:34 AM Lari Hotari <l...@hotari.net> wrote:
> >
> > > Hi Sijie,
> > >
> > > Let's keep this work going, since resolving the problems with Pulsar CI
> > > is urgent.
> > >
> > > I took a quick glance at the Azure Pipelines solution in Flink. By Googling
> > > I found
> > > https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines
> > > In the repository I found
> > > https://github.com/apache/flink/blob/master/azure-pipelines.yml
> > > which references
> > > https://github.com/apache/flink/blob/master/tools/azure-pipelines/jobs-template.yml
> > >
> > > It uses the build matrix feature to parallelize the execution:
> > > https://github.com/apache/flink/blob/dd0ee24e55dab4ae76201103c76495bc4fa0f73b/tools/azure-pipelines/jobs-template.yml#L88-L107
> > >
> > > What would be the key benefit for Pulsar CI of using Azure Pipelines over
> > > GitHub Actions?
> > >
> > > -Lari
> > >
> > > On Fri, Jan 29, 2021 at 6:03 PM Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > > > Lari,
> > > >
> > > > Yes, we can keep this proposal open for discussion. That's for sure.
> > > >
> > > > I just don't have any good solution at this moment with a
> > > > multiple-workflow approach using GitHub Actions.
> > > >
> > > > An alternative is to look into Azure Pipelines, which the Flink
> > > > community is using.
> > > > We are still learning there. Will post thoughts here once we have a
> > > > better idea.
> > > >
> > > > Thanks,
> > > > Sijie
> > > >
> > > > On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <l...@hotari.net> wrote:
> > > >
> > > > > Thanks for the feedback, Sijie.
> > > > >
> > > > > > If this proposal is blocked by the other proposal, we should focus on
> > > > > > getting the changes for the other proposal before talking about
> > > > > > merging them.
> > > > >
> > > > > Yes, the current proposal depends on the draft PIP for "Changes to flaky
> > > > > test handling". I'll follow up on fixing the flaky test in a new email
> > > > > thread.
> > > > >
> > > > > I hope we could get the discussions going on both draft PIPs and find
> > > > > consensus together as a community.
> > > > > During the discussions, more solution options will come up. Each solution
> > > > > has trade-offs.
> > > > > It would be useful to document the options when the community doesn't
> > > > > immediately agree on a single choice.
> > > > > I was thinking that these options could be documented in the same draft
> > > > > PIP documents.
> > > > >
> > > > > I can give multiple authors editing access to the Google Docs so that we
> > > > > can keep on editing a single document for both draft PIPs.
> > > > > Anyone who would like to add more solution options to the documents,
> > > > > please let me know so that I can add editing access.
> > > > >
> > > > > Sijie, would you like to document the option around keeping the workflow
> > > > > as multiple smaller workflows?
> > > > > I have understood that the problems that have come up with the Pulsar CI
> > > > > regarding resource consumption would have to be resolved in that
> > > > > alternative as well.
> > > > >
> > > > > I believe that everyone is open to any set of solution alternatives which
> > > > > solves the problems that we have with Pulsar CI.
> > > > > We all know that it's urgent to fix Pulsar CI asap. We can do it
> > > > > together.
> > > > >
> > > > > BR, Lari
> > > > >
> > > > > On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <guosi...@gmail.com> wrote:
> > > > >
> > > > > > Lari,
> > > > > >
> > > > > > Thank you for bringing this proposal up! This is a great initiative.
> > > > > >
> > > > > > However, I agree with Yong. We have spent tons of effort splitting one
> > > > > > large workflow into multiple smaller workflows.
> > > > > >
> > > > > > If this proposal is blocked by the other proposal, we should focus on
> > > > > > getting the changes for the other proposal before talking about
> > > > > > merging them.
> > > > > >
> > > > > > Thanks,
> > > > > > Sijie
> > > > > >
> > > > > > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <l...@hotari.net> wrote:
> > > > > >
> > > > > > > Thank you for the comments, Penghui.
> > > > > > >
> > > > > > > Exactly what you said, we should make the tests stable.
> > > > > > > The proposals in the other draft PIP "Changes to flaky test handling"
> > > > > > > deal with that.
> > > > > > > It's currently a draft and needs more eyes. Would you be able to take a
> > > > > > > closer look at that too?
> > > > > > >
> > > > > > > BR, Lari
> > > > > > >
> > > > > > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <codelipeng...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Currently, especially for the integration tests, it takes a lot of
> > > > > > > > time to build Pulsar distributions and Docker images.
> > > > > > > > I think that before merging the tests we should make them stable;
> > > > > > > > otherwise rerunning the tests will become more expensive.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Penghui
> > > > > > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang <zhangyong1025...@gmail.com>,
> > > > > > > > wrote:
> > > > > > > > > I am not sure that merging all the workflows into one workflow is a
> > > > > > > > > good idea. As far as I know, GitHub Actions doesn't allow rerunning a
> > > > > > > > > single job in a workflow. That means that if there is any failure in
> > > > > > > > > the workflow, we need to rerun all steps/stages. The worst case is
> > > > > > > > > that different tests fail when rerunning it, and this would take
> > > > > > > > > more time to pass the CI.
> > > > > > > > >
> > > > > > > > > ---
> > > > > > > > > Yong
> > > > > > > > >
> > > > > > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <lari.hot...@sagire.fi>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Dear Pulsar community members,
> > > > > > > > > >
> > > > > > > > > > Currently, the Pulsar GitHub Actions workflows are consuming the
> > > > > > > > > > majority of the shared pool of resources allocated for
> > > > > > > > > > github.com/apache projects.
> > > > > > > > > > Other Apache projects have been impacted and there is a demand to
> > > > > > > > > > improve the Pulsar CI
> > > > > > > > > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396>
> > > > > > > > > > asap.
> > > > > > > > > >
> > > > > > > > > > In GitHub Actions Runners, the unit of resources is the time that a
> > > > > > > > > > Runner is occupied.
> > > > > > > > > > I observed the workflow runs for handling a single Pull
> > > > > > > > > > Request (in my personal fork) and these were the running durations:
> > > > > > > > > >
> > > > > > > > > > Workflow name                               Duration
> > > > > > > > > > CI - Build - MacOS                           0:17:23
> > > > > > > > > > CI - Go Functions style check                0:02:38
> > > > > > > > > > CI - Unit - Brokers - Other                  0:15:40
> > > > > > > > > > CI - Unit - Brokers - Client Impl            0:16:28
> > > > > > > > > > CI - Misc                                    0:16:51
> > > > > > > > > > CI - Unit - Proxy                            0:14:23
> > > > > > > > > > CI - Go Functions Tests                      0:22:08
> > > > > > > > > > CI - CPP, Python Tests                       0:23:30
> > > > > > > > > > CI - Unit                                    0:42:11
> > > > > > > > > > CI - Integration - Sql                       1:00:13
> > > > > > > > > > CI - Integration - Tiered JCloud             1:00:18
> > > > > > > > > > CI - Integration - Tiered FileSystem         1:00:13
> > > > > > > > > > CI - Integration - Function State            1:00:12
> > > > > > > > > > CI - Integration - Cli                       1:10:22
> > > > > > > > > > CI - Integration - Transaction               1:16:34
> > > > > > > > > > CI - Integration - Process                   1:11:23
> > > > > > > > > > CI - Shade - Test                            1:15:45
> > > > > > > > > > CI - Unit - Brokers - Client Api             0:26:13
> > > > > > > > > > CI - Unit - Brokers - Broker Group 2         0:35:05
> > > > > > > > > > CI - Integration - Standalone                0:45:29
> > > > > > > > > > CI - Integration - Messaging                 1:00:23
> > > > > > > > > > CI - Integration - Thread                    1:00:19
> > > > > > > > > > CI - Integration - Backwards Compatibility   1:00:19
> > > > > > > > > > CI - Integration - Schema                    1:00:19
> > > > > > > > > > CI - Unit - Brokers - Broker Group 1         2:02:31
> > > > > > > > > > TOTAL                                       19:36:50
> > > > > > > > > >
> > > > > > > > > > *In this case, the total resource consumption of GitHub Actions
> > > > > > > > > > Runners is 19 hours 36 minutes 50 seconds for a single pull request
> > > > > > > > > > to apache/pulsar.*
> > > > > > > > > >
> > > > > > > > > > Since GitHub Actions Runner resource pool utilization is very high,
> > > > > > > > > > the build queue grows and takes a long time to process.
> > > > > > > > > >
> > > > > > > > > > I have been looking for ways to improve the Pulsar CI for the last 3
> > > > > > > > > > months. During this period I worked on a few experiments. The
> > > > > > > > > > learnings from the past experiments are documented at a high level
> > > > > > > > > > in the following draft PIP document.
> > > > > > > > > >
> > > > > > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI" document
> > > > > > > > > > is a Google doc:*
> > > > > > > > > >
> > > > > > > > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > > > > >
> > > > > > > > > > *Please participate* so that we get the plan adjusted based on the
> > > > > > > > > > feedback asap. If there's already a similar effort ongoing, I hope
> > > > > > > > > > we can join efforts.
> > > > > > > > > >
> > > > > > > > > > *Let's fix Pulsar CI!*
> > > > > > > > > >
> > > > > > > > > > BR, Lari
>
> --
> -Ali
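
For anyone who wants to re-check the 19:36:50 total in the quoted duration table, here is a minimal sketch that sums the listed per-workflow runner times. The durations are copied from the table in the original message; the helper names (`parse_duration`, `format_duration`) are made up for this illustration and are not part of any Pulsar tooling.

```python
from datetime import timedelta

# Per-workflow durations (H:MM:SS), copied in order from the quoted table.
DURATIONS = [
    "0:17:23", "0:02:38", "0:15:40", "0:16:28", "0:16:51",
    "0:14:23", "0:22:08", "0:23:30", "0:42:11", "1:00:13",
    "1:00:18", "1:00:13", "1:00:12", "1:10:22", "1:16:34",
    "1:11:23", "1:15:45", "0:26:13", "0:35:05", "0:45:29",
    "1:00:23", "1:00:19", "1:00:19", "1:00:19", "2:02:31",
]


def parse_duration(text: str) -> timedelta:
    """Parse an H:MM:SS string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_duration(delta: timedelta) -> str:
    """Format a timedelta as H:MM:SS without rolling hours over into days."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Runner time is additive across workflows, so the cost of one pull request
# is the plain sum of the individual workflow durations.
total = sum((parse_duration(d) for d in DURATIONS), timedelta())
print(format_duration(total))  # prints 19:36:50
```

The sum counts occupied runner time, not wall-clock time: the workflows run in parallel, so a pull request finishes much sooner than 19 hours, but it still holds that much of the shared runner pool.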