Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

Nicholas Chammas Fri, 17 Oct 2014 16:54:07 -0700

Wow, thanks for this deep dive Shane. Is there a way to check if we are
getting hit by rate limiting directly, or do we need to contact GitHub for
that?
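
One way to check directly, assuming the limit in question is GitHub's REST API quota (which the ghprb's API calls would consume), is to read the X-RateLimit-* headers GitHub returns. The sketch below only parses those headers; the function name is made up for illustration, and it would not reveal any throttling applied to the git fetches themselves.

```shell
# Sketch: summarize GitHub REST API rate-limit status from response headers.
# X-RateLimit-Limit/-Remaining/-Reset are GitHub's documented header names;
# rate_limit_summary is a hypothetical helper name for this example.
rate_limit_summary() {
  # Read a raw HTTP response on stdin. Header names are matched
  # case-insensitively because HTTP/2 responses lowercase them.
  awk -F': *' '
    { gsub(/\r$/, "") }   # strip the trailing CR that HTTP header lines carry
    tolower($1) == "x-ratelimit-limit"     { limit = $2 }
    tolower($1) == "x-ratelimit-remaining" { remaining = $2 }
    tolower($1) == "x-ratelimit-reset"     { reset = $2 }
    END { printf "limit=%s remaining=%s reset_epoch=%s\n", limit, remaining, reset }
  '
}

# Usage (run on the Jenkins master so the request leaves from the same IP):
#   curl -si https://api.github.com/rate_limit | rate_limit_summary
```

If "remaining" sits at or near zero while builds are failing, that would point at the API quota; if it stays comfortably high, contacting GitHub is probably still needed.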


On Friday, October 17, 2014, shane knapp<[email protected]> wrote:

> quick update:
>
> here are some stats i scraped over the past week of ALL pull request
> builder projects and timeout failures.  due to the large number of spark
> ghprb jobs, i don't have great records earlier than oct 7th.  the data is
> current up until ~2:30pm today:
>
> spark and new spark ghprb total builds vs git fetch timeouts:
> $ for x in 10-{09..17}; do
>     passed=$(grep $x SORTED.passed | grep -i spark | wc -l)
>     failed=$(grep $x SORTED | grep -i spark | wc -l)
>     let total=passed+failed
>     fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g")
>     line="$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
>     echo -e $line
>   done
> 10-09 -- total builds: 140 p/f: 92/48 fail%: 34%
> 10-10 -- total builds: 65 p/f: 59/6 fail%: 09%
> 10-11 -- total builds: 29 p/f: 29/0 fail%: 0%
> 10-12 -- total builds: 24 p/f: 21/3 fail%: 12%
> 10-13 -- total builds: 39 p/f: 35/4 fail%: 10%
> 10-14 -- total builds: 7 p/f: 5/2 fail%: 28%
> 10-15 -- total builds: 37 p/f: 34/3 fail%: 08%
> 10-16 -- total builds: 71 p/f: 59/12 fail%: 16%
> 10-17 -- total builds: 26 p/f: 20/6 fail%: 23%
>
> all other ghprb builds vs git fetch timeouts:
> $ for x in 10-{09..17}; do
>     passed=$(grep $x SORTED.passed | grep -vi spark | wc -l)
>     failed=$(grep $x SORTED | grep -vi spark | wc -l)
>     let total=passed+failed
>     fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g")
>     line="$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
>     echo -e $line
>   done
> 10-09 -- total builds: 16 p/f: 16/0 fail%: 0%
> 10-10 -- total builds: 46 p/f: 40/6 fail%: 13%
> 10-11 -- total builds: 4 p/f: 4/0 fail%: 0%
> 10-12 -- total builds: 2 p/f: 2/0 fail%: 0%
> 10-13 -- total builds: 2 p/f: 2/0 fail%: 0%
> 10-14 -- total builds: 10 p/f: 10/0 fail%: 0%
> 10-15 -- total builds: 5 p/f: 5/0 fail%: 0%
> 10-16 -- total builds: 5 p/f: 5/0 fail%: 0%
> 10-17 -- total builds: 0 p/f: 0/0 fail%: 0%
>
> note:  the 15th was the day i rolled back to the earlier version of the
> git plugin.  it doesn't seem to have helped much, so i'll probably bring us
> back up to the latest version soon.
> also note:  rocking some floating point math on the CLI!  ;)
>
> i also compared the distribution of git timeout failures vs time of day,
> and there appears to be no correlation.  the failures are pretty evenly
> distributed over each hour of the day.
>
> we could be hitting the rate limit due to the ghprb hitting github a
> couple of times for each build, but we're averaging ~10-20 builds per hour
> (a build hits github 2-4 times, from what i can tell).  i'll have to look
> more into this on monday, but suffice to say we may need to move from
> unauthenticated https fetches to authenticated requests.  this means
> retrofitting all of our jobs.  yay!  fun!  :)
>
> another option is to have local mirrors of all of the repos.  the problem
> w/this is that there might be a window where changes haven't made it to the
> local mirror and tests run against it.  more fun stuff to think about...
>
> now that i have some stats, and a list of all of the times/dates of the
> failures, i will be drafting my email to github and firing that off later
> today or first thing monday.
>
> have a great weekend everyone!
>
> shane, who spent way too much time on the CLI and is ready for some beer.
>
> On Thu, Oct 16, 2014 at 1:04 PM, Nicholas Chammas <
> [email protected]> wrote:
>
>> On Thu, Oct 16, 2014 at 3:55 PM, shane knapp <[email protected]> wrote:
>>
>>> i really, truly hate non-deterministic failures.
>>
>>
>> Amen bruddah.
>>
>
>
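
For a rough sense of whether the quoted build rates could exhaust a rate limit, the arithmetic can be sketched as below. It uses the 10-20 builds per hour and 2-4 hits per build from the message above. The 60 requests/hour figure is an assumption (GitHub's unauthenticated REST API limit), not a number from this thread, and the function name is hypothetical.

```shell
# Rough request-volume estimate from the figures quoted in the thread.
# ASSUMPTION: the limit passed in (60/hour) is GitHub's unauthenticated
# REST API limit; the thread itself does not state a number.
estimate_requests() {
  limit=$1
  for builds in 10 20; do        # builds per hour (low and high estimate)
    for hits in 2 4; do          # github hits per build (low and high)
      total=$((builds * hits))
      if [ "$total" -gt "$limit" ]; then verdict="over"; else verdict="under"; fi
      echo "$builds builds/hr x $hits hits = $total requests/hr ($verdict $limit)"
    done
  done
}

estimate_requests 60
```

Only the high end of both ranges exceeds the assumed limit, so under these assumptions the quota would be hit intermittently rather than constantly.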
