Thanks for that!

I think this is not a "permanent" solution, and the data is a bit flawed :(.

I do not think it's the fault of Pulsar/Spark per se. It is very hard to ask
them to impose any limits; even if we do it now, things might go ballistic
again tomorrow. And I think it's unreasonable to ask any project to decrease
its load when it does not even have the tools to verify that.

But let's see, maybe it will work!

There is one problem with the charts: they are flawed. They show
'workflows', not 'jobs', and one workflow might mean many jobs :(. For
example, the big number of workflows you can see in Airflow yesterday comes
from "Label when reviewed" workflows, each of which has 1 job that takes
10 seconds or so. One workflow can be 20-30 times more significant than
another.

We cannot easily drill down to jobs, because we are using the GitHub API to
get the information; it has rate limits (a maximum number of requests per
hour), and we are already close to hitting them with the current setup.

Going to the jobs level would mean roughly 20x more API requests. This is
the second area where the INFRA <> GitHub relationship could help: I believe
there was an option for GitHub to provide some better and more reliable
stats to analyse.
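To make the rate-limit concern concrete, here is a rough back-of-envelope sketch. The 5,000 requests/hour figure is GitHub's documented limit for authenticated REST API calls; the number of projects, polling interval, and runs per poll are made-up assumptions for illustration, not the actual stats setup:

```python
# Back-of-envelope: why drilling down from workflows to jobs blows the
# GitHub API request budget. All numbers except the rate limit are
# hypothetical assumptions.

RATE_LIMIT_PER_HOUR = 5_000      # GitHub's authenticated REST API limit
POLL_INTERVAL_MIN = 10           # assumed: stats refreshed every 10 minutes
PROJECTS = 50                    # assumed number of monitored ASF projects

# One request per project per poll to list its workflow runs.
workflow_requests_per_hour = PROJECTS * (60 // POLL_INTERVAL_MIN)

# Listing jobs needs one extra request per workflow run, so the total
# scales with the number of active runs seen at each poll.
RUNS_PER_PROJECT_PER_POLL = 20   # assumed: ~20 active runs per project
job_requests_per_hour = workflow_requests_per_hour * RUNS_PER_PROJECT_PER_POLL

print(workflow_requests_per_hour)  # 300  -- comfortably under the limit
print(job_requests_per_hour)       # 6000 -- already over the hourly limit
```

With these (invented) numbers, job-level polling alone would exceed the hourly budget, which matches the ~20x concern above.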

J.

On Fri, Jan 8, 2021 at 8:51 PM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Jarek>workflows in/progress/queued per project and they clearly show the
> Jarek> situation is getting worse by day
>
> The chart suggests that Pulsar, Spark and Airflow are the top contributors
> to the queue.
> I filed issues to Pulsar ( https://github.com/apache/pulsar/issues/9154 )
> and Spark ( https://issues.apache.org/jira/browse/SPARK-34053 )
> Hope they can do something to reduce the build time and the number of
> queued jobs.
>
> Vladimir
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129
