To be honest, this sounds exactly like the usual CI problem on every platform. As your project scales up, CI becomes a Hard Problem. I don’t think throwing hardware at it indefinitely works, though your research here is finding most of the useful things.
On Tue, Feb 9, 2021 at 02:21 Jarek Potiuk <ja...@potiuk.com> wrote:
> The report shows only top contenders. And yes - we know it is flawed -
> because it shows workflows not jobs (if you read the disclaimers - we
> simply have not enough API calls quota to get detailed information for all
> projects).
>
> So this is anecdotal. I also get no queue when I submit PR at 11 pm.
> Actually the whole Airflow committer team had to switch to the "night shift"
> because of that. And the most "traffic-heavy" projects - Spark, Pulsar,
> Superset, Beam, Airflow - I think some of the top "traffic" projects
> experience the same issues and several hours queue when they run during the
> EMEA day/US morning. And we all together try to help each other (for
> example I helped yesterday the Pulsar team to implement the most aggressive way
> of cancelling their workflows https://github.com/apache/pulsar/pull/9503
> (you can find pretty good explanation why and how it was implemented this
> way), also we are working together with the Pulsar team to optimize their
> workflow - there is a document
> https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit
> where several people are adding their suggestions (including myself based
> on Airflow experiences).
>
> And with Yetus' 12 (!) workflow runs over the last 2 months (!)
> https://pasteboard.co/JNwGLiR.png - indeed you have a high chance you have
> not experienced it, especially that you are the only person committing
> there. This is hardly representative for other projects that have 100s of
> committers and 100s of PRs a day. I am not sure if you are aware of
> that, but those are the most valuable projects for the ASF - as those are
> the ones that actually build community (following the "community over code"
> motto). If you have 3 PRs in 3 months and there are 200 other projects
> using GA, I think Yetus is not going to show up in any meaningful
> statistics.
>
> I am not sure if drawing a conclusion from a project that has 3 PRs in 2
> months is the best way of drawing conclusions for the overall Apache
> organisation. I think drawing a conclusion from experiences of 5 actually
> active projects with sometimes even 100 PRs a day is probably better
> justified (yep - there are such projects).
> So I would probably agree it has little influence on projects that have no
> traffic. But enormous influence on projects that actually have traffic. You
> have several teams of people scrambling now to somehow manage their CI as
> it is unbearable now. Is this serious? I'd say so.
> > When you see Airflow backed up, maybe you should try submitting a
> > PR to another project yourself to see what happens.
>
> I am already spending a TON of my private time trying to help others in the
> community. I would really appreciate a little help from your side. So maybe
> you just submit 2-3 PRs yourself any time Monday - Friday 12pm CET -> 8pm
> CET - this is where regularly bottlenecks happen. Please let everyone know
> your findings
>
> J,
>
> On Tue, Feb 9, 2021 at 8:35 AM Allen Wittenauer
> <a...@effectivemachines.com.invalid> wrote:
> >
> > On Feb 8, 2021, at 5:00 PM, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > >> I'm not convinced this is true. I have yet to see any of my PRs for
> > >> "non-big" projects getting queued while Spark, Airflow, others are. Thus
> > >> why I think there are only a handful of projects that are getting upset
> > >> about this but the rest of us are like "meh whatever."
> > >
> > > Do you have any data on that? Or is it just anecdotal evidence?
> >
> > Totally anecdotal. Like when I literally ran a Yetus PR during
> > the builds meeting as you were complaining about Airflow having an X deep
> > queue. My PR ran fine, no pause.
> >
> > > You can see some analysis and actually even charts here:
> > > https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status
> >
> > Yes, and I don't even see Yetus showing up. I wonder how many
> > other projects are getting dropped from the dataset....
> >
> > > Maybe you have a very tiny "PR traffic" and it is mostly in the time zone
> > > that is not affected?
> >
> > True, it has very tiny PR traffic right now. (Sep/Oct/Nov was
> > different though) But if it was one big FIFO queue, our PR jobs would also
> > get queued. They aren't, even when I go look at one of the other projects
> > that does have queued jobs.
> >
> > When you see Airflow backed up, maybe you should try submitting a
> > PR to another project yourself to see what happens.
> >
> > All I'm saying is: right now, that document feels like it is
> > _greatly_ overstating the problem and now that you point it out, clearly
> > dropping data. It is a problem, to be sure, but not all GitHub Actions
> > projects are suffering. (I wouldn't be surprised if smaller projects are
> > actually fast tracked through the build queue in order to avoid a tyranny
> > of the majority/resource starvation problem... which would be ironic given
> > how much of an issue that is at the ASF.)
>
> --
> +48 660 796 129