i've seen a few more builds fail w/timeouts and it appears that we're
definitely NOT hitting any rate limiting.

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22005/console

[jenkins@amp-jenkins-slave-01 ~]$ curl -i -H "Authorization: token <REDACTED>" \
    https://api.github.com | grep Rate
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4997
X-RateLimit-Reset: 1413929848
Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit,
X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes,
X-Accepted-OAuth-Scopes, X-Poll-Interval
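fwiw, if we want to keep an eye on quota without burning it, github has a
dedicated /rate_limit endpoint whose calls do NOT count against the limit
(unlike hitting api.github.com itself, as above). here's a rough sketch of a
check we could cron on the master -- note this is just an idea, nothing
deployed; the check_quota helper, its 10% threshold, and the GITHUB_TOKEN
variable are all my own invention:

```shell
#!/usr/bin/env bash
# check_quota REMAINING LIMIT -> warns when under 10% of quota is left.
# (hypothetical helper, not part of any existing jenkins job)
check_quota() {
  local remaining=$1 limit=$2
  if [ "$remaining" -lt $((limit / 10)) ]; then
    echo "WARN: $remaining/$limit GitHub API requests left"
  else
    echo "OK: $remaining/$limit"
  fi
}

# a cron job would feed it live numbers from the dedicated endpoint,
# which does not itself count against quota:
#   curl -s -H "Authorization: token $GITHUB_TOKEN" \
#     https://api.github.com/rate_limit
check_quota 4997 5000
check_quota 300 5000
```

parsing the live JSON would need jq (or similar) on the slaves, which is why
the sketch keeps the threshold logic separate from the fetch.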

On Sat, Oct 18, 2014 at 12:44 AM, Davies Liu <dav...@databricks.com> wrote:

> Cool, the 4 most recent builds have used the new configs, thanks!
>
> Let's run more builds.
>
> Davies
>
> On Fri, Oct 17, 2014 at 11:06 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
> > I think that the fix was applied.  Take a look at
> >
> > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21874/consoleFull
> >
> > Here, I see a fetch command that mentions this specific PR branch rather
> > than the wildcard that we had before:
> >
> >  > git fetch --tags --progress https://github.com/apache/spark.git
> > +refs/pull/2840/*:refs/remotes/origin/pr/2840/* # timeout=15
> >
> >
> > Do you have an example of a Spark PRB build that’s still failing with the
> > old fetch failure?
> >
> > - Josh
> >
> > On October 17, 2014 at 11:03:14 PM, Davies Liu (dav...@databricks.com)
> > wrote:
> >
> > How can we know the changes have been applied? I checked several
> > recent builds; they all use the original configs.
> >
> > Davies
> >
> > On Fri, Oct 17, 2014 at 6:17 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
> >> FYI, I edited the Spark Pull Request Builder job to try this out. Let’s
> >> see
> >> if it works (I’ll be around to revert if it doesn’t).
> >>
> >> On October 17, 2014 at 5:26:56 PM, Davies Liu (dav...@databricks.com)
> >> wrote:
> >>
> >> One finding is that all the timeout happened with this command:
> >>
> >> git fetch --tags --progress https://github.com/apache/spark.git
> >> +refs/pull/*:refs/remotes/origin/pr/*
> >>
> >> I'm thinking this may be an expensive call; we could try a cheaper
> >> one:
> >>
> >> git fetch --tags --progress https://github.com/apache/spark.git
> >> +refs/pull/XXX/*:refs/remotes/origin/pr/XXX/*
> >>
> >> XXX is the PullRequestID,
> >>
> >> The configuration supports parameters [1], so we could put this in:
> >>
> >> +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
> >>
> >> I have not tested this yet, could you give this a try?
> >>
> >> Davies
> >>
> >>
> >> [1]
> >> https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin
> >>
> >>> On Fri, Oct 17, 2014 at 5:00 PM, shane knapp <skn...@berkeley.edu> wrote:
> >>> actually, nvm, you have to run that command from our servers to affect
> >>> our limit. run it all you want from your own machines! :P
> >>>
> >>> On Fri, Oct 17, 2014 at 4:59 PM, shane knapp <skn...@berkeley.edu> wrote:
> >>>
> >>>> yep, and i will tell you guys ONLY if you promise to NOT try this
> >>>> yourselves... checking the rate limit also counts as a hit and
> >>>> increments
> >>>> our numbers:
> >>>>
> >>>> # curl -i https://api.github.com/users/whatever 2> /dev/null | egrep
> >>>> ^X-Rate
> >>>> X-RateLimit-Limit: 60
> >>>> X-RateLimit-Remaining: 51
> >>>> X-RateLimit-Reset: 1413590269
> >>>>
> >>>> (yes, that is the exact url that they recommended on the github site
> >>>> lol)
> >>>>
> >>>> so, earlier today, we had a spark build fail w/a git timeout at 10:57am,
> >>>> but there were only ~7 builds run that hour, so that points to us NOT
> >>>> hitting the rate limit... at least for this fail. whee!
> >>>>
> >>>> is it beer-thirty yet?
> >>>>
> >>>> shane
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Oct 17, 2014 at 4:52 PM, Nicholas Chammas <
> >>>> nicholas.cham...@gmail.com> wrote:
> >>>>
> >>>>> Wow, thanks for this deep dive Shane. Is there a way to check if we are
> >>>>> getting hit by rate limiting directly, or do we need to contact GitHub
> >>>>> for that?
> >>>>>
> >>>>> On Friday, October 17, 2014, shane knapp <skn...@berkeley.edu> wrote:
> >>>>>
> >>>>> quick update:
> >>>>>>
> >>>>>> here are some stats i scraped over the past week of ALL pull request
> >>>>>> builder projects and timeout failures. due to the large number of spark
> >>>>>> ghprb jobs, i don't have great records earlier than oct 7th. the data
> >>>>>> is current up until ~230pm today:
> >>>>>>
> >>>>>> spark and new spark ghprb total builds vs git fetch timeouts:
> >>>>>> $ for x in 10-{09..17}; do
> >>>>>>     passed=$(grep $x SORTED.passed | grep -i spark | wc -l)
> >>>>>>     failed=$(grep $x SORTED | grep -i spark | wc -l)
> >>>>>>     let total=passed+failed
> >>>>>>     fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g")
> >>>>>>     line="$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
> >>>>>>     echo -e $line
> >>>>>>   done
> >>>>>> 10-09 -- total builds: 140 p/f: 92/48 fail%: 34%
> >>>>>> 10-10 -- total builds: 65 p/f: 59/6 fail%: 09%
> >>>>>> 10-11 -- total builds: 29 p/f: 29/0 fail%: 0%
> >>>>>> 10-12 -- total builds: 24 p/f: 21/3 fail%: 12%
> >>>>>> 10-13 -- total builds: 39 p/f: 35/4 fail%: 10%
> >>>>>> 10-14 -- total builds: 7 p/f: 5/2 fail%: 28%
> >>>>>> 10-15 -- total builds: 37 p/f: 34/3 fail%: 08%
> >>>>>> 10-16 -- total builds: 71 p/f: 59/12 fail%: 16%
> >>>>>> 10-17 -- total builds: 26 p/f: 20/6 fail%: 23%
> >>>>>>
> >>>>>> all other ghprb builds vs git fetch timeouts:
> >>>>>> $ for x in 10-{09..17}; do
> >>>>>>     passed=$(grep $x SORTED.passed | grep -vi spark | wc -l)
> >>>>>>     failed=$(grep $x SORTED | grep -vi spark | wc -l)
> >>>>>>     let total=passed+failed
> >>>>>>     fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g")
> >>>>>>     line="$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
> >>>>>>     echo -e $line
> >>>>>>   done
> >>>>>> 10-09 -- total builds: 16 p/f: 16/0 fail%: 0%
> >>>>>> 10-10 -- total builds: 46 p/f: 40/6 fail%: 13%
> >>>>>> 10-11 -- total builds: 4 p/f: 4/0 fail%: 0%
> >>>>>> 10-12 -- total builds: 2 p/f: 2/0 fail%: 0%
> >>>>>> 10-13 -- total builds: 2 p/f: 2/0 fail%: 0%
> >>>>>> 10-14 -- total builds: 10 p/f: 10/0 fail%: 0%
> >>>>>> 10-15 -- total builds: 5 p/f: 5/0 fail%: 0%
> >>>>>> 10-16 -- total builds: 5 p/f: 5/0 fail%: 0%
> >>>>>> 10-17 -- total builds: 0 p/f: 0/0 fail%: 0%
> >>>>>>
> >>>>>> note: the 15th was the day i rolled back to the earlier version of the
> >>>>>> git plugin. it doesn't seem to have helped much, so i'll probably bring
> >>>>>> us back up to the latest version soon.
> >>>>>> also note: rocking some floating point math on the CLI! ;)
> >>>>>>
> >>>>>> i also compared the distribution of git timeout failures vs time of
> >>>>>> day, and there appears to be no correlation. the failures are pretty
> >>>>>> evenly distributed over each hour of the day.
> >>>>>>
> >>>>>> we could be hitting the rate limit due to the ghprb hitting github a
> >>>>>> couple of times for each build, but we're averaging ~10-20 builds per
> >>>>>> hour (a build hits github 2-4 times, from what i can tell). i'll have
> >>>>>> to look more into this on monday, but suffice it to say we may need to
> >>>>>> move from unauthorized https fetches to authorized requests. this
> >>>>>> means retrofitting all of our jobs. yay! fun! :)
> >>>>>>
> >>>>>> another option is to have local mirrors of all of the repos. the
> >>>>>> problem w/this is that there might be a window where changes haven't
> >>>>>> made it to the local mirror and tests run against it. more fun stuff
> >>>>>> to think about...
> >>>>>>
> >>>>>> now that i have some stats, and a list of all of the times/dates of
> >>>>>> the failures, i will be drafting my email to github and firing that
> >>>>>> off later today or first thing monday.
> >>>>>>
> >>>>>> have a great weekend everyone!
> >>>>>>
> >>>>>> shane, who spent way too much time on the CLI and is ready for some
> >>>>>> beer.
> >>>>>>
> >>>>>> On Thu, Oct 16, 2014 at 1:04 PM, Nicholas Chammas <
> >>>>>> nicholas.cham...@gmail.com> wrote:
> >>>>>>
> >>>>>>> On Thu, Oct 16, 2014 at 3:55 PM, shane knapp <skn...@berkeley.edu>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> i really, truly hate non-deterministic failures.
> >>>>>>>
> >>>>>>>
> >>>>>>> Amen bruddah.
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
>
