To be honest, this sounds exactly like the usual CI problem on every
platform. As your project scales up, CI becomes a Hard Problem. I don’t
think throwing hardware at it indefinitely works, though your research here
is finding most of the useful things.

On Tue, Feb 9, 2021 at 02:21 Jarek Potiuk <ja...@potiuk.com> wrote:

> The report shows only the top contenders. And yes - we know it is flawed -
> because it shows workflows, not jobs (if you read the disclaimers: we
> simply do not have enough API call quota to get detailed information for
> all projects).
>
> So this is anecdotal. I also get no queue when I submit a PR at 11 pm.
> Actually, the whole Airflow committer team had to switch to the "night
> shift" because of that. And I think the most "traffic-heavy" projects -
> Spark, Pulsar, Superset, Beam, Airflow - all experience the same issues,
> with queues several hours long when they run during the EMEA day/US
> morning. We all try to help each other: for example, yesterday I helped
> the Pulsar team implement the most aggressive way of cancelling their
> workflows, https://github.com/apache/pulsar/pull/9503 (it has a pretty
> good explanation of why and how it was implemented this way). We are also
> working together with the Pulsar team to optimize their workflow - there
> is a document
>
> https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit
> where several people are adding their suggestions (including me, based
> on Airflow experiences).
>
> And with Yetus' 12 (!) workflow runs over the last 2 months (!)
> https://pasteboard.co/JNwGLiR.png - indeed there is a high chance you have
> not experienced it, especially since you are the only person committing
> there. This is hardly representative of other projects that have 100s of
> committers and 100s of PRs a day. I am not sure if you are aware of
> that, but those are the most valuable projects for the ASF, as those are
> the ones that actually build community (following the "community over
> code" motto). If you have 3 PRs in 3 months and there are 200 other
> projects using GA, I think Yetus is not going to show up in any meaningful
> statistics.
>
> I am not sure that a project with 3 PRs in 2 months is the best basis for
> drawing conclusions about the Apache organisation as a whole. I think
> drawing conclusions from the experience of 5 genuinely active projects,
> some with as many as 100 PRs a day, is probably better justified (yep -
> there are such projects).
> So I would agree it has little influence on projects that have no
> traffic, but an enormous influence on projects that actually have traffic.
> Several teams of people are now scrambling to somehow manage their CI,
> because it has become unbearable. Is this serious? I'd say so.
>
>         When you see Airflow backed up, maybe you should try submitting a
> PR to another project yourself to see what happens.
>
> I am already spending a TON of my private time trying to help others in the
> community. I would really appreciate a little help from your side. So maybe
> you could submit 2-3 PRs yourself any time Monday - Friday, 12pm CET -> 8pm
> CET; this is when the bottlenecks regularly happen. Please let everyone
> know your findings.
>
> J,
>
>
> On Tue, Feb 9, 2021 at 8:35 AM Allen Wittenauer
> <a...@effectivemachines.com.invalid> wrote:
>
> >
> >
> > > On Feb 8, 2021, at 5:00 PM, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > I'm not convinced this is true. I have yet to see any of my PRs for
> > > > "non-big" projects getting queued while Spark, Airflow, others are.
> > > > That's why I think there are only a handful of projects that are
> > > > getting upset about this but the rest of us are like "meh whatever."
> > >
> > > Do you have any data on that? Or is it just anecdotal evidence?
> >
> >         Totally anecdotal.  Like when I literally ran a Yetus PR during
> > the builds meeting as you were complaining about Airflow having an X deep
> > queue. My PR ran fine, no pause.
> >
> > > You can see some analysis and actually even charts here:
> > >
> > > https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status
> >
> >         Yes, and I don't even see Yetus showing up.  I wonder how many
> > other projects are getting dropped from the dataset....
> >
> > > Maybe you have very tiny "PR traffic" and it is mostly in a time zone
> > > that is not affected?
> >
> >         True, it has very tiny PR traffic right now (Sep/Oct/Nov was
> > different, though). But if it were one big FIFO queue, our PR jobs would
> > also get queued. They aren't, even when I go look at one of the other
> > projects that does have queued jobs.
> >
> >         When you see Airflow backed up, maybe you should try submitting a
> > PR to another project yourself to see what happens.
> >
> >         All I'm saying is: right now, that document feels like it is
> > _greatly_ overstating the problem and, now that you point it out, clearly
> > dropping data. It is a problem, to be sure, but not all GitHub Actions
> > projects are suffering. (I wouldn't be surprised if smaller projects are
> > actually fast-tracked through the build queue in order to avoid a tyranny
> > of the majority/resource starvation problem... which would be ironic
> > given how much of an issue that is at the ASF.)
>
>
>
>
