Here's what the Zeppelin community did: we wrote a Python script that checks the build status of a pull request. Here's the script: https://github.com/apache/zeppelin/blob/master/travis_check.py
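The Jenkins job that follows recovers the PR number and the fork owner from the Jenkins build page with a chain of sed and awk calls. As an illustration only, the same split can be done with plain parameter expansion. This sketch assumes STATUS already has the form "<PR URL> <URL whose last path segment is the fork owner>" that those sed expressions produce. `parse_pr_status` is a hypothetical helper and is not part of Zeppelin's tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Zeppelin's tooling): splits the STATUS string
# built by the Jenkins job into the PR number and the fork owner, using
# parameter expansion instead of sed/awk.
# Assumed input: "<PR URL> <URL whose last path segment is the fork owner>".
parse_pr_status() {
  local status="$1"
  local pr_url="${status%% *}"   # first space-separated field: the PR URL
  local pr="${pr_url##*/}"       # last path segment of the PR URL: the PR number
  local author="${status##*/}"   # last path segment of the whole string: the fork owner
  printf '%s %s\n' "$pr" "$author"
}

# Example with made-up values:
status='https://github.com/apache/zeppelin/pull/1234 https://github.com/someuser'
result=$(parse_pr_status "$status")
PR=${result%% *}       # text before the first space
AUTHOR=${result#* }    # text after the first space
echo "PR=${PR} AUTHOR=${AUTHOR}"   # prints: PR=1234 AUTHOR=someuser
```

Like the sed version, this takes whatever follows the last "/" as the author, so a trailing slash in the second URL would yield an empty author.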
And this is the script we use in the Jenkins build job:

if [ -f "travis_check.py" ]; then
  git log -n 1
  STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
  AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
  PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
  #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
  #if [ -z $COMMIT ]; then
  #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
  #fi

  # get the commit hash from the PR
  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
  sleep 30 # wait a moment for Travis to start the build

  RET_CODE=0
  python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
  if [ $RET_CODE -eq 2 ]; then
    # retry with the owner of the PR's head repository when travis-ci is not available in the author's account
    RET_CODE=0
    AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
  fi

  if [ $RET_CODE -eq 2 ]; then
    # fail: no build information found on travis-ci
    set +x
    echo "-----------------------------------------------------"
    echo "Looks like travis-ci is not configured for your fork."
    echo "Please set it up by switching on the 'zeppelin' repository in travis-ci at https://travis-ci.org/profile."
    echo "Then make sure the 'Build branch updates' option is enabled in the settings at https://travis-ci.org/${AUTHOR}/zeppelin/settings."
    echo ""
    echo "To trigger CI after setup, you will need to amend your last commit with"
    echo "git commit --amend"
    echo "git push your-remote HEAD --force"
    echo ""
    echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration ."
  fi

  exit $RET_CODE
else
  set +x
  echo "travis_check.py does not exist"
  exit 1
fi

On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler <ches...@apache.org> wrote:

> Does this imply that a Jenkins job is active as long as the Travis build
> runs?
>
> On 26/06/2019 21:28, Bowen Li wrote:
>> Hi,
>>
>> @Dawid, I think the "long test running" problem, as I mentioned in the first
>> email and as you guys said, belongs to "a big effort which is much harder to
>> accomplish in a short period of time and may deserve its own separate
>> discussion". Thus I didn't include it in what we can do in the foreseeable
>> short term.
>>
>> Besides, I don't think that's the ultimate reason for the lack of build
>> resources. Even if the build were shortened to something like 2h, the problem
>> I described, where no build machine runs for about 6 or more hours during the
>> PST daytime, would still happen, because no machine from ASF INFRA's pool is
>> allocated to Flink. I have paid close attention to the build queue over the
>> past few weekdays, and it's a pretty clear pattern now.
>>
>> **The ultimate root cause** is that we don't have any **dedicated** build
>> resources that we can reliably count on. I'm actually OK with waiting a long
>> time if there are build requests running; it means at least we are making
>> progress. But I'm not OK with having no build resources at all. In the short
>> term, I think we should aim to always have at least a central pool of
>> machines (e.g. 3 or 5) dedicated to building Flink at any time, or maybe use
>> users' resources.
>>
>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is using a
>> Jenkins job to automatically build on users' Travis accounts and link the
>> result back to the GitHub PR.
>> I guess the Jenkins job would fetch the latest upstream master and build the
>> PR against it. Jeff has filed tickets to learn about and get access to the
>> Jenkins infra. It would be better to fully understand it before judging this
>> approach.
>>
>> I also heard good things about CircleCI, and ASF INFRA seems to have a pool
>> of build capacity there too. It could be an alternative to consider.
>>
>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakow...@apache.org>
>> wrote:
>>
>>> Sorry to jump in late, but I think Bowen missed the most important point
>>> from Chesnay's previous message in the summary. The ultimate reason for
>>> all the problems is that the tests already take close to 2 hours to run.
>>> I fully support this claim: "Unless people start caring about test times
>>> before adding them, this issue cannot be solved"
>>>
>>> This is also another reason why using users' Travis accounts won't help.
>>> Every few weeks we reach the user's time limit for a single profile.
>>> This makes the users' builds simply fail until we either properly
>>> decrease the time the tests take (which I am not sure we ever did) or
>>> postpone the problem by splitting into more profiles. (Note that the ASF
>>> Travis account has higher time limits.)
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>> Do we know if using "the best" available hardware would improve the build
>>>> times?
>>>> Imagine if we ran the build on machines with plenty of main memory, to
>>>> mount everything to a ramdisk, plus the latest CPU architecture?
>>>>
>>>> Throwing hardware at the problem could help reduce the time of an
>>>> individual build, and using our own infrastructure would remove our
>>>> dependency on Apache's Travis account (with the obvious downside of having
>>>> to maintain the infrastructure).
>>>> We could use an open source Travis alternative, to have a similar
>>>> experience and make the migration easy.
>>>>
>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>> From what I gathered, there's no special sauce that the Zeppelin
>>>>> project uses which actually integrates a user's Travis account into the PR.
>>>>> They just disabled Travis for PRs. And that's kind of it.
>>>>>
>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
>>>>> resources, but there are downsides:
>>>>>
>>>>> The discoverability of the Travis check takes a nose-dive. Either we
>>>>> require every contributor to always, on every commit, also post a Travis
>>>>> build, or we have the reviewer sift through the contributor's account to
>>>>> find it.
>>>>>
>>>>> This is rather cumbersome. Additionally, it's also not equivalent to
>>>>> having a PR build.
>>>>>
>>>>> A normal branch build takes a branch as is and tests it. A PR build
>>>>> merges the branch into master, and then runs it. (Fun fact: This is why
>>>>> a PR with merge conflicts is not being run on Travis.)
>>>>>
>>>>> And ultimately, everyone can already make use of this approach anyway.
>>>>>
>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>> Hi Jeff,
>>>>>>
>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>>>>> leverage users' Travis accounts.
>>>>>> This way, we can have almost unlimited concurrent build jobs, and
>>>>>> developers can restart builds by themselves (currently only committers
>>>>>> can restart a PR's build).
>>>>>>
>>>>>> But I'm still not very clear on how to integrate a user's Travis build
>>>>>> into the Flink pull request's build automatically. Can you explain in
>>>>>> more detail?
>>>>>>
>>>>>> Another question: does Travis only build branches for user accounts?
>>>>>> My concern is that builds for PRs rebase the user's commits onto the
>>>>>> current master branch.
>>>>>> This helps us find problems before merging. Builds for branches
>>>>>> miss the impact of new commits in master.
>>>>>> How does Zeppelin solve this problem?
>>>>>>
>>>>>> Thanks again for sharing the idea.
>>>>>>
>>>>>> Regards,
>>>>>> Jark
>>>>>>
>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Folks,
>>>>>>>
>>>>>>> Zeppelin ran into this kind of issue before. We solved it by delegating
>>>>>>> each contributor's PR build to their own Travis account (everyone gets
>>>>>>> 5 free slots for Travis builds).
>>>>>>> The Apache account's Travis build is only triggered when a PR is merged.
>>>>>>>
>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>
>>>>>>>> (Forgot to cc George)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kurt
>>>>>>>>
>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Bowen,
>>>>>>>>>
>>>>>>>>> Thanks for bringing this up. We have actually discussed this, and I
>>>>>>>>> think Till and George have already spent some time investigating it.
>>>>>>>>> I have cc'ed both of them, and maybe they can share their findings.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kurt
>>>>>>>>>
>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Bowen,
>>>>>>>>>>
>>>>>>>>>> Thanks for bringing this up. We have also suffered from the long
>>>>>>>>>> build times. I agree that we should focus on solving the build
>>>>>>>>>> capacity problem in this thread.
>>>>>>>>>>
>>>>>>>>>> My observation is that only one build is running, and all the others
>>>>>>>>>> (other PRs, master) are pending.
>>>>>>>>>> The Travis pricing plans [1] show that it can support concurrent
>>>>>>>>>> build jobs. But I don't know which plan we are using; it might be the
>>>>>>>>>> free plan for open source.
>>>>>>>>>>
>>>>>>>>>> I cc-ed Chesnay, who may have some experience with Travis.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Jark
>>>>>>>>>>
>>>>>>>>>> [1]: https://travis-ci.com/plans
>>>>>>>>>>
>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Steven,
>>>>>>>>>>>
>>>>>>>>>>> I think you may not have read what I wrote. The discussion is about
>>>>>>>>>>> "unstable build **capacity**", in other words "unstable / lack of
>>>>>>>>>>> build resources", not "unstable build".
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Long and sometimes unstable builds are definitely a pain point.
>>>>>>>>>>>>
>>>>>>>>>>>> I suspect the build failure here in flink-connector-kafka is not
>>>>>>>>>>>> related to my change, but there is no easy way to re-run the build
>>>>>>>>>>>> in the Travis UI.
>>>>>>>>>>>> A Google search showed a trick: closing and reopening the PR
>>>>>>>>>>>> triggers a rebuild, but that could add noise to the PR activity.
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>>>>>>>
>>>>>>>>>>>> Travis CI for my personal repo often fails by exceeding the time
>>>>>>>>>>>> limit after 4+ hours:
>>>>>>>>>>>> The job exceeded the maximum time limit for jobs, and has been terminated.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530 This build
>>>>>>>>>>>>> request has been sitting at the **HEAD of the queue** since I first
>>>>>>>>>>>>> saw it at 10:30am PST (not sure how long it had been there before
>>>>>>>>>>>>> 10:30am). It's 4:12pm PST now and it hasn't started yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've been experiencing the pain resulting from the lack of stable
>>>>>>>>>>>>>> build capacity on Travis for Flink PRs [1]. Specifically, I have
>>>>>>>>>>>>>> often noticed that no build in the queue makes any progress for
>>>>>>>>>>>>>> hours, and then suddenly 5 or 6 builds kick off all together after
>>>>>>>>>>>>>> the long pause.
>>>>>>>>>>>>>> I'm in the PST (UTC-08) time zone, and I've seen the pause last as
>>>>>>>>>>>>>> long as 6 hours, from 9am to 3pm PST (not counting the time needed
>>>>>>>>>>>>>> to drain the queue afterwards).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this has greatly impacted our productivity. I've
>>>>>>>>>>>>>> experienced that PRs submitted in the early morning PST won't
>>>>>>>>>>>>>> finish their build until late at night on the same day.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So my questions are:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Has anyone else experienced the same problem or made similar
>>>>>>>>>>>>>> observations on Travis CI? (I suspect it has something to do with
>>>>>>>>>>>>>> time zones.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Which Travis CI pricing plan is Flink currently using? Is it the
>>>>>>>>>>>>>> free plan for open source projects? What is the guaranteed build
>>>>>>>>>>>>>> capacity of the current plan?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - If the current pricing plan (either free or paid) can't provide
>>>>>>>>>>>>>> stable build capacity, can we upgrade to a higher-priced plan with
>>>>>>>>>>>>>> larger and more stable build capacity?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BTW, another factor that contributes to the productivity problem
>>>>>>>>>>>>>> is that our build is slow: we run a full build for every PR, and a
>>>>>>>>>>>>>> successful full build takes ~5h.
>>>>>>>>>>>>>> We definitely have more options to solve it, for instance,
>>>>>>>>>>>>>> modularizing the build graph and reusing artifacts from the
>>>>>>>>>>>>>> previous build. But I think that could be a big effort, which is
>>>>>>>>>>>>>> much harder to accomplish in a short period of time and may
>>>>>>>>>>>>>> deserve its own separate discussion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://travis-ci.org/apache/flink/pull_requests
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang

--
Best Regards
Jeff Zhang
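Chesnay and Jark both point out that a PR build tests the contributor's branch merged into the current master, while a build on a contributor's own Travis account tests the branch as is. Bowen also guesses that the Zeppelin Jenkins job fetches the latest upstream master. Below is a minimal sketch of doing that merge step by hand with plain git before running the build. It is not confirmed to be what Zeppelin's job does. The function name `pr_merge_checkout`, the local branch name `pr-build`, and the example URLs are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Zeppelin's or Flink's tooling) that mimics a
# PR build: instead of testing the contributor's branch as is, it merges that
# branch into the latest upstream master and leaves the result checked out on
# a throwaway local branch named "pr-build".
# Usage: pr_merge_checkout <upstream-repo> <fork-repo> <branch>
pr_merge_checkout() {
  local upstream="$1" fork="$2" branch="$3"
  # Start from the current tip of upstream master.
  git fetch --quiet "$upstream" master || return 1
  git checkout --quiet -B pr-build FETCH_HEAD || return 1
  # Bring in the contributor's branch. On a merge conflict, git merge exits
  # non-zero, so the build is skipped, much like a conflicting PR is not built.
  git fetch --quiet "$fork" "$branch" || return 1
  git merge --quiet --no-edit FETCH_HEAD || return 1
}

# Example (placeholder URLs and branch name), run inside a clone of the project:
#   pr_merge_checkout https://github.com/apache/flink.git https://github.com/someuser/flink.git my-feature
#   ...then run the project's usual build command on the "pr-build" branch.
```

With this approach the tested tree contains both new upstream commits and the contributor's changes. That is the property Jark is worried about losing with plain branch builds.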