Lari,

Thank you for this work. I have been following it closely, and I am happy that we will be able to use it very soon.
What is the next step for applying these changes to the Pulsar repo? IIUC, all of the changes that were blocking this work have been merged (thanks also to Matteo). I imagine that we can come up with a PR that switches the current CI workflows to the new one.

Enrico

On Fri, Mar 12, 2021 at 07:06, Michael Marshall <mikemars...@gmail.com> wrote:
>
> This will be a great improvement. I read through the PIP, and overall, it
> looks good to me.
>
> I left a question on the doc about how concurrent runs affect the
> repository's 5 GB cache limit.
>
> I also think it could be helpful to explicitly document, or reference
> github documentation, on how failure will affect the DAG. I'm assuming that
> if an action fails, its parallel peer actions will run to completion, and
> that the rest of the remaining stages will get canceled, but I haven't
> worked with github actions before.
>
> Thanks for all of the work you've put in so far.
>
> On Thu, Mar 11, 2021 at 6:37 PM Yuva raj <uvar...@gmail.com> wrote:
>
> > This is great news. Thanks Hari, Mateo and pulsar community
> >
> > On Fri, Mar 12, 2021, 2:04 AM Lari Hotari <lari.hot...@sagire.fi> wrote:
> >
> > > Dear Pulsar community members,
> > >
> > > The work on "Changes to GitHub Actions based Pulsar CI" has gone forward
> > > based on your feedback. Here are some updates about the work.
> > >
> > > The draft PIP proposal document is here:
> > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit#heading=h.f53rkcu20sry
> > > There's a *detailed status update in the document about a prototype for
> > > the refactored Pulsar CI GitHub Actions based workflow*.
> > >
> > > Thanks for all the suggestions and feedback by now. A lot of improvements
> > > have been made by the Pulsar contributors to overcome the technical
> > > obstacles.
> > > Special thanks go to Matteo for reducing the sizes of docker images. A lot
> > > of small improvements have been made to the Pulsar maven build to enable
> > > the new refactored GitHub Actions workflow. Thank you for all PR reviews
> > > and feedback.
> > >
> > > The main goal of the "Changes to GitHub Actions based Pulsar CI" work has
> > > been to *reduce the resource consumption of the Pulsar CI build and to
> > > speed up Pulsar development by improving the developer productivity* when
> > > less time is wasted in waiting for Pulsar CI build feedback. The prototype
> > > demonstrates these improvements.
> > >
> > > As you can see from the email from Jan 28 below, *the resource consumption
> > > was 19 hrs 36 minutes* for a single pull request that was observed when the
> > > work began.
> > > Now, with the prototype of the refactored Pulsar CI build, the resource
> > > consumption is *7 hrs 9 minutes.*
> > > *This is about 60% reduction in resource consumption.* The whole pipeline
> > > completes in 75-100 minutes.
> > >
> > > Here's a breakdown of the duration (resource consumption) of each build job
> > > in the refactored workflow:
> > >
> > > Workflow   Job                                         seconds  h:mm:ss
> > > Pulsar CI  Changed files check                               4  0:00:04
> > > Pulsar CI  Go 1.11 Functions                               155  0:02:35
> > > Pulsar CI  Go 1.12 Functions                               166  0:02:46
> > > Pulsar CI  Go 1.13 Functions                               113  0:01:53
> > > Pulsar CI  Go 1.14 Functions                                96  0:01:36
> > > Pulsar CI  Build on MacOS                                 1017  0:16:57
> > > Pulsar CI  Build and License check                         346  0:05:46
> > > Pulsar CI  Build Pulsar CPP and Python clients             683  0:11:23
> > > Pulsar CI  Build Pulsar java-test-image docker image       405  0:06:45
> > > Pulsar CI  CI - Unit - Other                              1580  0:26:20
> > > Pulsar CI  CI - Unit - Brokers - Broker Group 1            968  0:16:08
> > > Pulsar CI  CI - Unit - Brokers - Broker Group 2           2223  0:37:03
> > > Pulsar CI  CI - Unit - Brokers - Client Api               1652  0:27:32
> > > Pulsar CI  CI - Unit - Brokers - Client Impl               916  0:15:16
> > > Pulsar CI  CI - Unit - Brokers - Other                     522  0:08:42
> > > Pulsar CI  CI - Unit - Proxy                               331  0:05:31
> > > Pulsar CI  Build Pulsar docker image                      2343  0:39:03
> > > Pulsar CI  CI - Integration - Shade                        414  0:06:54
> > > Pulsar CI  CI - Integration - Backwards Compatibility      849  0:14:09
> > > Pulsar CI  CI - Integration - Cli                         1490  0:24:50
> > > Pulsar CI  CI - Integration - Messaging                    857  0:14:17
> > > Pulsar CI  CI - Integration - Schema                       468  0:07:48
> > > Pulsar CI  CI - Integration - Standalone                   286  0:04:46
> > > Pulsar CI  CI - Integration - Transaction                  362  0:06:02
> > > Pulsar CI  CI - System - Function State                    699  0:11:39
> > > Pulsar CI  CI - System - Tiered FileSystem                 779  0:12:59
> > > Pulsar CI  CI - System - Tiered JCloud                     529  0:08:49
> > > Pulsar CI  CI - System - Pulsar Connectors - Thread       1795  0:29:55
> > > Pulsar CI  CI - System - Pulsar Connectors - Process      2312  0:38:32
> > > Pulsar CI  CI - System - Sql                              1377  0:22:57
> > > *Total resource consumption*                                    7:08:57
> > >
> > > GitHub Actions doesn't support restarting a single job (
> > > https://github.community/t/ability-to-rerun-just-a-single-job-in-a-workflow/17234
> > > ).
> > > However, this is not a showstopper since there are ways to address the
> > > issues that cause flakiness.
> > > There is a separate PIP for changing the way to handle flaky tests. You can
> > > find the link to that in the "Changes to GitHub Actions based Pulsar CI"
> > > document's header.
> > >
> > > *Some requests for the Pulsar community:*
> > >
> > > 1) *Please take a look at the updated PIP document*:
> > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit#heading=h.f53rkcu20sry
> > > . *It also contains more details of the prototype that has been
> > > successfully completed.*
> > >
> > > 2) *Please share your feedback and suggest a way forward.*
> > >
> > > *Thank you for your help!*
> > >
> > > BR, Lari
> > >
> > > On Thu, Jan 28, 2021 at 7:13 PM Lari Hotari <lari.hot...@sagire.fi> wrote:
> > >
> > > > Dear Pulsar community members,
> > > >
> > > > Currently, the Pulsar GitHub Actions workflows are consuming the majority
> > > > of the shared pool of resources allocated for github.com/apache projects.
> > > > Other Apache projects have been impacted and there is a demand to improve
> > > > the Pulsar CI
> > > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396> asap.
> > > >
> > > > In GitHub Actions Runners, the unit of resources is the time that a Runner
> > > > is occupied. I observed the workflow runs for handling a single Pull
> > > > Request (in my personal fork) and these were the running durations:
> > > >
> > > > Workflow name                               Duration
> > > > CI - Build - MacOS                           0:17:23
> > > > CI - Go Functions style check                0:02:38
> > > > CI - Unit - Brokers - Other                  0:15:40
> > > > CI - Unit - Brokers - Client Impl            0:16:28
> > > > CI - Misc                                    0:16:51
> > > > CI - Unit - Proxy                            0:14:23
> > > > CI - Go Functions Tests                      0:22:08
> > > > CI - CPP, Python Tests                       0:23:30
> > > > CI - Unit                                    0:42:11
> > > > CI - Integration - Sql                       1:00:13
> > > > CI - Integration - Tiered JCloud             1:00:18
> > > > CI - Integration - Tiered FileSystem         1:00:13
> > > > CI - Integration - Function State            1:00:12
> > > > CI - Integration - Cli                       1:10:22
> > > > CI - Integration - Transaction               1:16:34
> > > > CI - Integration - Process                   1:11:23
> > > > CI - Shade - Test                            1:15:45
> > > > CI - Unit - Brokers - Client Api             0:26:13
> > > > CI - Unit - Brokers - Broker Group 2         0:35:05
> > > > CI - Integration - Standalone                0:45:29
> > > > CI - Integration - Messaging                 1:00:23
> > > > CI - Integration - Thread                    1:00:19
> > > > CI - Integration - Backwards Compatibility   1:00:19
> > > > CI - Integration - Schema                    1:00:19
> > > > CI - Unit - Brokers - Broker Group 1         2:02:31
> > > > TOTAL                                       19:36:50
> > > >
> > > > *In this case, the total resource consumption of GitHub Actions Runners is
> > > > 19 hours 36 minutes 50 seconds for a single pull request to apache/pulsar.*
> > > >
> > > > Since GitHub Actions Runner resource pool utilization is very high, this
> > > > leads to the build queue to grow and take a long time to process.
> > > >
> > > > I have been looking for ways to improve the Pulsar CI for the last 3
> > > > months. During this period I worked on a few experiments. The learnings
> > > > from the past experiments are documented at a high level in the following
> > > > draft PIP document.
> > > >
> > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI" document is a
> > > > Google doc:*
> > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > >
> > > > *Please participate* so that we get the plan adjusted based on the
> > > > feedback asap. If there's already a similar effort ongoing, I hope we can
> > > > join efforts.
> > > >
> > > > *Let's fix Pulsar CI!*
> > > >
> > > > BR, Lari
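
As a quick sanity check on the figures quoted in this thread, the sketch below recomputes the reduction from the two reported totals (19:36:50 on Jan 28, 7:08:57 with the refactored prototype). The helper names `to_seconds` and `reduction_percent` are illustrative assumptions and are not part of any Pulsar CI tooling.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def reduction_percent(before: str, after: str) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    before_s = to_seconds(before)
    after_s = to_seconds(after)
    return 100.0 * (before_s - after_s) / before_s


# Totals reported in the thread for a single pull request.
BEFORE_TOTAL = "19:36:50"  # Jan 28 measurement of the existing workflows
AFTER_TOTAL = "7:08:57"    # refactored Pulsar CI prototype

if __name__ == "__main__":
    print(f"{reduction_percent(BEFORE_TOTAL, AFTER_TOTAL):.1f}% reduction")
```

With these two totals the result is about 63.6%, which is consistent with the "about 60% reduction" stated in the status update.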