One finding is that all the timeouts happened with this command:

    git fetch --tags --progress https://github.com/apache/spark.git +refs/pull/*:refs/remotes/origin/pr/*

I'm thinking this may be an expensive call; we could try a cheaper one:

    git fetch --tags --progress https://github.com/apache/spark.git +refs/pull/XXX/*:refs/remotes/origin/pr/XXX/*

where XXX is the pull request ID. The configuration supports parameters [1], so we could use this refspec:

    +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*

I have not tested this yet -- could you give it a try?

Davies

[1] https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin
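A minimal local sanity check of the narrower refspec might look like this, before touching the Jenkins job config (PR 2840 is just a hypothetical ID standing in for ${ghprbPullId}):

    $ git init /tmp/refspec-test && cd /tmp/refspec-test
    $ git fetch --tags --progress https://github.com/apache/spark.git \
        '+refs/pull/2840/*:refs/remotes/origin/pr/2840/*'
    $ git for-each-ref 'refs/remotes/origin/pr/2840'
    # expect only refs/remotes/origin/pr/2840/head (plus .../merge, if GitHub
    # has generated a test-merge ref for that PR), instead of one ref pair for
    # every open and closed pull request in the repo

If that behaves, the parameterized form should drop straight into the ghprb refspec field.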
On Fri, Oct 17, 2014 at 5:00 PM, shane knapp <skn...@berkeley.edu> wrote:

> actually, nvm, you have to run that command from our servers to affect
> our limit.  run it all you want from your own machines!  :P
>
> On Fri, Oct 17, 2014 at 4:59 PM, shane knapp <skn...@berkeley.edu> wrote:
>
>> yep, and i will tell you guys ONLY if you promise to NOT try this
>> yourselves...  checking the rate limit also counts as a hit and
>> increments our numbers:
>>
>> # curl -i https://api.github.com/users/whatever 2> /dev/null | egrep ^X-Rate
>> X-RateLimit-Limit: 60
>> X-RateLimit-Remaining: 51
>> X-RateLimit-Reset: 1413590269
>>
>> (yes, that is the exact url that they recommended on the github site lol)
>>
>> so, earlier today, we had a spark build fail w/a git timeout at 10:57am,
>> but there were only ~7 builds run that hour, so that points to us NOT
>> hitting the rate limit... at least for this fail.  whee!
>>
>> is it beer-thirty yet?
>>
>> shane
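An aside on checking the quota without burning it: GitHub's API also has a dedicated rate-limit endpoint, and per the API docs requests to it are not supposed to count against the limit, so it should be safe to poll from the build masters:

    $ curl -i https://api.github.com/rate_limit 2> /dev/null | egrep ^X-Rate

This returns the same X-RateLimit-* headers shown above, without decrementing X-RateLimit-Remaining.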
>> On Fri, Oct 17, 2014 at 4:52 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> Wow, thanks for this deep dive Shane.  Is there a way to check if we
>>> are getting hit by rate limiting directly, or do we need to contact
>>> GitHub for that?
>>>
>>> On Friday, October 17, 2014, shane knapp <skn...@berkeley.edu> wrote:
>>>
>>>> quick update:
>>>>
>>>> here are some stats i scraped over the past week of ALL pull request
>>>> builder projects and timeout failures.  due to the large number of
>>>> spark ghprb jobs, i don't have great records earlier than oct 7th.
>>>> the data is current up until ~2:30pm today:
>>>>
>>>> spark and new spark ghprb total builds vs git fetch timeouts:
>>>>
>>>> $ for x in 10-{09..17}; do
>>>>     passed=$(grep $x SORTED.passed | grep -i spark | wc -l)
>>>>     failed=$(grep $x SORTED | grep -i spark | wc -l)
>>>>     let total=passed+failed
>>>>     fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g")
>>>>     line="$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
>>>>     echo -e $line
>>>>   done
>>>> 10-09 -- total builds: 140   p/f: 92/48   fail%: 34%
>>>> 10-10 -- total builds: 65    p/f: 59/6    fail%: 09%
>>>> 10-11 -- total builds: 29    p/f: 29/0    fail%: 0%
>>>> 10-12 -- total builds: 24    p/f: 21/3    fail%: 12%
>>>> 10-13 -- total builds: 39    p/f: 35/4    fail%: 10%
>>>> 10-14 -- total builds: 7     p/f: 5/2     fail%: 28%
>>>> 10-15 -- total builds: 37    p/f: 34/3    fail%: 08%
>>>> 10-16 -- total builds: 71    p/f: 59/12   fail%: 16%
>>>> 10-17 -- total builds: 26    p/f: 20/6    fail%: 23%
>>>>
>>>> all other ghprb builds vs git fetch timeouts:
>>>>
>>>> $ for x in 10-{09..17}; do
>>>>     passed=$(grep $x SORTED.passed | grep -vi spark | wc -l)
>>>>     failed=$(grep $x SORTED | grep -vi spark | wc -l)
>>>>     let total=passed+failed
>>>>     fail_percent=$(echo "scale=2; $failed/$total" | bc | sed "s/^\.//g")
>>>>     line="$x -- total builds: $total\tp/f: $passed/$failed\tfail%: $fail_percent%"
>>>>     echo -e $line
>>>>   done
>>>> 10-09 -- total builds: 16    p/f: 16/0    fail%: 0%
>>>> 10-10 -- total builds: 46    p/f: 40/6    fail%: 13%
>>>> 10-11 -- total builds: 4     p/f: 4/0     fail%: 0%
>>>> 10-12 -- total builds: 2     p/f: 2/0     fail%: 0%
>>>> 10-13 -- total builds: 2     p/f: 2/0     fail%: 0%
>>>> 10-14 -- total builds: 10    p/f: 10/0    fail%: 0%
>>>> 10-15 -- total builds: 5     p/f: 5/0     fail%: 0%
>>>> 10-16 -- total builds: 5     p/f: 5/0     fail%: 0%
>>>> 10-17 -- total builds: 0     p/f: 0/0     fail%: 0%
>>>>
>>>> note: the 15th was the day i rolled back to the earlier version of the
>>>> git plugin.  it doesn't seem to have helped much, so i'll probably
>>>> bring us back up to the latest version soon.
>>>> also note: rocking some floating point math on the CLI!  ;)
>>>>
>>>> i also compared the distribution of git timeout failures vs time of
>>>> day, and there appears to be no correlation.  the failures are pretty
>>>> evenly distributed over each hour of the day.
>>>>
>>>> we could be hitting the rate limit due to the ghprb hitting github a
>>>> couple of times for each build, but we're averaging ~10-20 builds per
>>>> hour (a build hits github 2-4 times, from what i can tell).  i'll have
>>>> to look more into this on monday, but suffice to say we may need to
>>>> move from unauthorized https fetches to authorized requests.  this
>>>> means retrofitting all of our jobs.  yay!  fun!  :)
>>>>
>>>> another option is to have local mirrors of all of the repos.  the
>>>> problem w/this is that there might be a window where changes haven't
>>>> made it to the local mirror and tests run against it.  more fun stuff
>>>> to think about...
>>>>
>>>> now that i have some stats, and a list of all of the times/dates of
>>>> the failures, i will be drafting my email to github and firing that
>>>> off later today or first thing monday.
>>>>
>>>> have a great weekend everyone!
>>>>
>>>> shane, who spent way too much time on the CLI and is ready for some beer.
>>>>
>>>> On Thu, Oct 16, 2014 at 1:04 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> On Thu, Oct 16, 2014 at 3:55 PM, shane knapp <skn...@berkeley.edu> wrote:
>>>>>
>>>>>> i really, truly hate non-deterministic failures.
>>>>>
>>>>> Amen bruddah.
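On the authorized-requests option floated above, a rough sketch of what the retrofit could look like, assuming a personal access token exported as GITHUB_TOKEN (the variable name is illustrative). GitHub's documented API limits are 60 requests/hour anonymous versus 5,000/hour authenticated:

    # confirm the authenticated quota:
    $ curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit
    # an authenticated caller should see a limit of 5000 rather than 60

    # token-authenticated git-over-https fetch, x-oauth-basic style:
    $ git fetch --tags --progress \
        "https://$GITHUB_TOKEN:x-oauth-basic@github.com/apache/spark.git" \
        '+refs/pull/*:refs/remotes/origin/pr/*'

One caveat: git's smart-HTTP endpoints are not the same thing as the REST API, so it is not a given that the API rate limit is what's throttling the fetches -- which matches the uncertainty expressed above.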
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org