Thanks for that! I think this is not a "permanent" solution, though, and the data is a bit flawed :(.

I do not think it's the fault of Pulsar/Spark per se. I think it is very hard to ask them to apply any limits; even if we do it now, things might go ballistic again tomorrow. And I think it's very unreasonable to ask any project to decrease its load when they do not even have the tools to verify it. But let's see, maybe it will work!

There is one problem with the charts: they are flawed. They show 'workflows', not 'jobs', and one workflow might mean many jobs :(. For example, the big number of workflows you can see in Airflow yesterday comes from the "Label when reviewed" workflow, each run of which has 1 job that takes 10 seconds or so. One workflow can be 20-30 times more costly than another.

We cannot easily drill down to jobs, because we are using the GitHub API to get the information, but there are limits (max number of requests/hr) and we are already close to hitting them with the current setup. Going to the jobs level would mean 20x more API requests. This is the second area where the INFRA <> GitHub relationship could help: I believe there was an option for GitHub to provide some better and more reliable stats to analyse.

J.

On Fri, Jan 8, 2021 at 8:51 PM Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:

> Jarek> workflows in progress/queued per project and they clearly show the
> Jarek> situation is getting worse by day
>
> The chart suggests that Pulsar, Spark and Airflow are the top contributors
> to the queue.
> I filed issues to Pulsar (https://github.com/apache/pulsar/issues/9154)
> and Spark (https://issues.apache.org/jira/browse/SPARK-34053).
> Hope they can do something to reduce the build time and the number of
> queued jobs.
>
> Vladimir

-- 
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129
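[Editorial note: the API-cost argument in the email above can be sketched as follows. This is a hypothetical back-of-the-envelope estimate, not code from the thread; the function names and the sample run count are illustrative assumptions. It relies only on the fact that listing workflow runs via the GitHub REST API is paginated (up to 100 results per request), while job details require roughly one extra request per run (`GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs`), against a rate limit of a few thousand requests per hour.]

```python
# Hypothetical estimate of GitHub API request counts for workflow-level
# vs. job-level stats. Numbers are illustrative, not from the email.

def runs_level_requests(num_runs: int, per_page: int = 100) -> int:
    """One request per page when listing workflow runs
    (GET /repos/{owner}/{repo}/actions/runs, paginated)."""
    return -(-num_runs // per_page)  # ceiling division

def jobs_level_requests(num_runs: int) -> int:
    """Roughly one extra request per run to fetch its jobs
    (GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs)."""
    return num_runs

num_runs = 2000  # assumed number of recent workflow runs across projects

base = runs_level_requests(num_runs)              # 20 requests
total = base + jobs_level_requests(num_runs)      # 2020 requests

# Drilling down to jobs costs ~100x the paginated listing here,
# which quickly approaches a per-hour API rate limit.
print(base, total)
```

Under these assumptions the jobs-level drill-down dominates the request budget, which matches the email's point that going to the jobs level multiplies API usage by an order of magnitude or more.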
I do not think it's the fault in Pulsar/Spark per se. I think it is very hard to request from them to do any limits, even if we do it now this might again go ballistic tomorrow. And I think it's very unreasonable to request any project to decrease their load if they even do not have the tools to verify that. But let's see, maybe it will work ! There is one problem with the charts, They are flawed. They show 'workflows' not 'jobs' and one workflow might mean many jobs :(. For example the big number of workflows you can see in Airflow yesterday come from "Label when reviewed" workflows - each of which has 1 job that takes 10 seconds or so. One workflow can be 20/30 times more important than another. We cannot easily drill down to jobs, because we are using Github API to get the information, but there are limits (max num requests/hr) and we are already close to hitting it with the current setup. Going to the jobs level would mean 20x more API requests. This is the 2nd thing where INFRA <> GitHub relation I believe there was the option that GitHub provides some better and more reliable stats to analyse. J. On Fri, Jan 8, 2021 at 8:51 PM Vladimir Sitnikov < sitnikov.vladi...@gmail.com> wrote: > Jarek>workflows in/progress/queued per project and they clearly show the > Jarek> situation is getting worse by day > > The chart suggests that Pulsar, Spark and Airflow are the top contributors > to the queue. > I filed issues to Pulsar ( https://github.com/apache/pulsar/issues/9154 ) > and Spark ( https://issues.apache.org/jira/browse/SPARK-34053 ) > Hope they can do something to reduce the build time and the number of > queued jobs. > > Vladimir > -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>