+1. And thanks a lot to Chesnay for pushing this.

Best,
Hequn

On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler <ches...@apache.org> wrote:

> Note that the Flinkbot approach isn't that trivial either; we can't
> _just_ trigger builds for a branch in the apache repo, but would first
> have to clone the branch/PR into a separate repository (one owned by
> the GitHub account that the Travis account would be tied to).
>
> One roadblock after the next showing up...
>
> On 04/07/2019 11:59, Chesnay Schepler wrote:
> > Small update with mostly bad news:
> >
> > INFRA doesn't know whether it is possible, and referred me to Travis
> > support. They did point out that it could be problematic with regard
> > to read/write permissions for the repository.
> >
> > From my own findings /so far/ with a test repo/organization, it does
> > not appear possible to configure the Travis account used for a
> > specific repository.
> >
> > So yeah, if we go down this route we may have to pimp the Flinkbot to
> > trigger builds through the Travis REST API.
> >
> > On 04/07/2019 10:46, Chesnay Schepler wrote:
> >> I've raised a JIRA <https://issues.apache.org/jira/browse/INFRA-18703>
> >> with INFRA to inquire whether it would be possible to switch to a
> >> different Travis account, and if so what steps would need to be taken.
> >> We need a proper confirmation from INFRA since we are not in full
> >> control of the flink repository (for example, we cannot access the
> >> settings page).
> >>
> >> If this is indeed possible, Ververica is willing to sponsor a Travis
> >> account for the Flink project.
> >> This would provide us with more than enough resources.
> >>
> >> Since this makes the project more reliant on resources provided by
> >> external companies I would like to vote on this.
> >>
> >> Please vote on this proposal, as follows:
> >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> >>     account, provided that INFRA approves
> >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> >>     account
> >>
> >> The vote will be open for at least 24h, and until we have
> >> confirmation from INFRA. The voting period may be shorter than the
> >> usual 3 days since our current setup is effectively not working.
> >>
> >> On 04/07/2019 06:51, Bowen Li wrote:
> >>> Re: > Are they using their own Travis CI pool, or did they switch to
> >>> an entirely different CI service?
> >>>
> >>> I reached out to Wes and Krisztián from the Apache Arrow PMC. They are
> >>> currently moving away from ASF's Travis to their own in-house metal
> >>> machines at [1] with a custom CI application at [2]. They've seen
> >>> significant improvement w.r.t. both much higher performance and
> >>> basically no resource waiting time, a "night-and-day" difference,
> >>> quoting Wes.
> >>>
> >>> Re: > If we can just switch to our own Travis pool, just for our
> >>> project, then this might be something we can do fairly quickly?
> >>>
> >>> I believe so, according to [3] and [4]
> >>>
> >>> [1] https://ci.ursalabs.org/
> >>> [2] https://github.com/ursa-labs/ursabot
> >>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>>
> >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <ches...@apache.org> wrote:
> >>>
> >>>     Are they using their own Travis CI pool, or did they switch to an
> >>>     entirely different CI service?
> >>>
> >>>     If we can just switch to our own Travis pool, just for our
> >>>     project, then this might be something we can do fairly quickly?
> >>>
> >>>     On 03/07/2019 05:55, Bowen Li wrote:
> >>> > I responded in the INFRA ticket [1] that I believe they are using the
> >>> > wrong metric against Flink, and that total build time is a completely
> >>> > different thing from guaranteed build capacity.
> >>> >
> >>> > My response:
> >>> >
> >>> > "As mentioned above, since I started to pay attention to Flink's build
> >>> > queue a few tens of days ago, I'm in Seattle and I saw no build kicking
> >>> > off in PST daytime on weekdays for Flink. Our teammates in China and
> >>> > Europe have also reported similar observations. So we need to evaluate
> >>> > where the large total build time came from - if 1) your number and 2)
> >>> > our observations from three locations that cover pretty much a full
> >>> > day are all true, I **guess** one reason can be that the extra build
> >>> > time highly likely came from weekends, when other Apache projects may
> >>> > be idle and Flink just drains hard its congested queue.
> >>> >
> >>> > Please be aware that we're not complaining about the lack of resources
> >>> > in general, I'm complaining about the lack of **stable, dedicated**
> >>> > resources. An example of the latter is that currently, even if no
> >>> > build is in Flink's queue and I submit a request to be the queue head
> >>> > in the PST morning, my build won't even start in 6-8+h. That is an
> >>> > absurd amount of waiting time.
> >>> >
> >>> > That is to say, if ASF INFRA decides to adopt a quota system and grants
> >>> > Flink five DEDICATED servers that run all the time only for Flink,
> >>> > that'll be PERFECT and can totally solve our problem now.
> >>> >
> >>> > I feel what's missing in ASF INFRA's Travis resource pool is some level
> >>> > of build capacity SLAs and certainty"
> >>> >
> >>> > Again, I believe these two problems differ in nature: long build time
> >>> > vs. lack of dedicated build resources. That is to say, shortening build
> >>> > time may or may not relieve the situation. I'm slightly negative on
> >>> > disabling IT cases for PRs: the downside is that we risk bugs in PRs
> >>> > that UTs don't catch, which may cost a lot more to fix later,
> >>> > especially if they slow down or even block others. But I am open to
> >>> > others' opinions on it.
> >>> >
> >>> > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a
> >>> > feasible way to solve our problem, since INFRA's pool is fully shared
> >>> > and they have no control over, or finer insight into, resource
> >>> > allocation to a specific Apache project. As mentioned in [1], Apache
> >>> > Arrow is moving away from the ASF INFRA Travis pool (they are actually
> >>> > surprised Flink hasn't planned to do so). I know that Spark is on its
> >>> > own build infra. If we all agree on funding our own build infra, I'd be
> >>> > glad to help investigate any potential options after releasing 1.9,
> >>> > since I'm super busy with 1.9 now.
> >>> >
> >>> > [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>> >
> >>> > On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <ches...@apache.org> wrote:
> >>> >
> >>> >> As a short-term stopgap, since we can assume this issue to become much
> >>> >> worse in the following days/weeks, we could disable IT cases in PRs and
> >>> >> only run them on master.
> >>> >>
> >>> >> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>> >>> People really have to stop thinking that just because something works
> >>> >>> for us it is also a good solution.
> >>> >>> Also, please remember that our builds run for 2h from start to finish,
> >>> >>> and not the 14 _minutes_ it takes for Zeppelin.
> >>> >>> We are dealing with an entirely different scale here, both in terms of
> >>> >>> build times and number of builds.
> >>> >>>
> >>> >>> In this very thread people have been complaining about long queue
> >>> >>> times for their builds. Surprise, other Apache projects have been
> >>> >>> suffering the very same thing due to us not controlling our build
> >>> >>> times. While switching services (be it Jenkins, CircleCI or whatever)
> >>> >>> will possibly work for us (and these options are actually attractive,
> >>> >>> like CircleCI's proper support for build artifacts), it will also
> >>> >>> likely result in us negatively affecting other projects in significant
> >>> >>> ways.
> >>> >>>
> >>> >>> Sure, the Jenkins setup has a good user experience for us, at the cost
> >>> >>> of blocking Jenkins workers for a _lot_ of time. Right now we have 25
> >>> >>> PRs in our queue; that's possibly 50h we'd consume of Jenkins
> >>> >>> resources, and the European contributors haven't even really started yet.
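
The backlog estimate in the message above is simple arithmetic. A minimal sketch, using only the figures quoted there (25 queued PRs, roughly 2 h per Flink build, 14 minutes per Zeppelin build); the function name is made up for illustration:

```python
def backlog_hours(queued_builds: int, hours_per_build: float) -> float:
    """Total worker time needed to drain a queue of equally long builds."""
    return queued_builds * hours_per_build


flink = backlog_hours(25, 2.0)         # 25 PRs at ~2 h each
zeppelin = backlog_hours(25, 14 / 60)  # the same queue at 14 minutes each

print(f"Flink: {flink:.0f} h, Zeppelin-sized builds: {zeppelin:.1f} h")
# prints: Flink: 50 h, Zeppelin-sized builds: 5.8 h
```

The same queue length costs roughly nine times more worker time at Flink's build duration, which is the scale difference the message points at.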
> >>> >>>
> >>> >>> FYI, the latest INFRA response from INFRA-18533:
> >>> >>>
> >>> >>> "Our rough metrics shows that Flink used over 5800 hours of build time
> >>> >>> last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
> >>> >>> MONTH. EIGHT. nonstop.
> >>> >>> When we discovered this last night, we discussed it some and are going
> >>> >>> to tune down Flink to allow only five executors maximum. We cannot
> >>> >>> allow Flink to consume so much of a Foundation shared resource."
> >>> >>>
> >>> >>> So yes, we either
> >>> >>> a) have to heavily reduce our CI usage or
> >>> >>> b) fund our own, either maintaining it ourselves or donating to Apache.
> >>> >>>
> >>> >>> On 02/07/2019 05:11, Bowen Li wrote:
> >>> >>>> By looking at the git history of the Jenkins script, its core part
> >>> >>>> was finished in March 2017 (with only two minor updates in 2017/2018),
> >>> >>>> so it has been running for over two years now, and it feels like the
> >>> >>>> Zeppelin community has been quite happy with it. @Jeff Zhang
> >>> >>>> <zjf...@gmail.com> can you share your insights and user experience
> >>> >>>> with the Jenkins+Travis approach?
> >>> >>>>
> >>> >>>> Things like:
> >>> >>>>
> >>> >>>> - has the approach completely solved the resource capacity problem
> >>> >>>>   for the Zeppelin community? Is the Zeppelin community happy with
> >>> >>>>   the result?
> >>> >>>> - is the whole configuration chain stable enough (e.g. uptime)?
> >>> >>>> - how often do you need to maintain the Jenkins infra? How many
> >>> >>>>   people are usually involved in maintenance and bug fixes?
> >>> >>>>
> >>> >>>> The downside of this approach seems to me to be mostly the
> >>> >>>> maintenance: maintaining the script and the Jenkins infra.
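
For reference, INFRA's "eight servers" figure quoted earlier in this message chain can be reproduced with a rough calculation. This is only a sketch: the 30-day month is an assumption (INFRA did not state one), and the helper names are hypothetical.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def executor_equivalents(build_hours: float) -> float:
    """Number of executors that would be busy 24/7 to produce this many build hours."""
    return build_hours / HOURS_PER_MONTH


def monthly_cap_hours(executors: int) -> int:
    """Upper bound on build hours per month with a fixed number of executors."""
    return executors * HOURS_PER_MONTH


print(round(executor_equivalents(5800), 2))  # 8.06
print(monthly_cap_hours(5))                  # 3600
```

Under the same assumption, the announced cap of five executors bounds usage at 3600 build hours a month, well below the 5800 hours reported.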
> >>> >>>>
> >>> >>>> ** Having Our Own Travis-CI.com Account **
> >>> >>>>
> >>> >>>> Another alternative I've been thinking of is to have our own
> >>> >>>> travis-ci.com account with paid dedicated resources. Note that
> >>> >>>> travis-ci.org is the free version and travis-ci.com is the
> >>> >>>> commercial version. We currently use a shared resource pool managed
> >>> >>>> by the ASF INFRA team on travis-ci.org, but we have no control over
> >>> >>>> it - we can't see how it's configured, how many resources are
> >>> >>>> available, how resources are allocated among Apache projects, etc.
> >>> >>>> The nice things about having an account on travis-ci.com are:
> >>> >>>>
> >>> >>>> - relatively low cost with a much better resource guarantee than
> >>> >>>>   what we currently have [1]: $249/month with 5 dedicated concurrent
> >>> >>>>   jobs, $489/month with 10 concurrent jobs
> >>> >>>> - low maintenance work compared to using Jenkins
> >>> >>>> - (potentially) no migration cost according to Travis's doc [2]
> >>> >>>>   (pending verification)
> >>> >>>> - full control over the build capacity/configuration compared to
> >>> >>>>   using ASF INFRA's pool
> >>> >>>>
> >>> >>>> I'd be surprised if we as such a vibrant community cannot find and
> >>> >>>> fund $249*12=$2988 a year in exchange for a much better developer
> >>> >>>> experience and much higher productivity.
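
The yearly figure above, and the corresponding one for the larger plan, can be checked with a short sketch. It uses only the two monthly prices quoted in the message; the names are made up for illustration, and current Travis pricing may differ.

```python
# concurrency -> USD per month, as quoted in the message above
PLANS = {5: 249, 10: 489}


def yearly_cost(monthly_usd: int) -> int:
    """Cost of twelve months at a fixed monthly price."""
    return monthly_usd * 12


for concurrency, monthly in PLANS.items():
    per_slot = monthly / concurrency
    print(f"{concurrency} concurrent jobs: ${yearly_cost(monthly)}/year, "
          f"${per_slot:.2f} per job slot per month")
# prints:
# 5 concurrent jobs: $2988/year, $49.80 per job slot per month
# 10 concurrent jobs: $5868/year, $48.90 per job slot per month
```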
> >>> >>>>
> >>> >>>> [1] https://travis-ci.com/plans
> >>> >>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>> >>>>
> >>> >>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <ches...@apache.org> wrote:
> >>> >>>>
> >>> >>>>     So yes, the Jenkins job keeps polling the state from Travis until
> >>> >>>>     it finishes.
> >>> >>>>
> >>> >>>>     Not sure I'm comfortable with the idea of using Jenkins workers
> >>> >>>>     just to idle for several hours.
> >>> >>>>
> >>> >>>>     On 29/06/2019 14:56, Jeff Zhang wrote:
> >>> >>>> > Here's what the Zeppelin community did: we made a Python script to
> >>> >>>> > check the build status of a pull request.
> >>> >>>> > Here's the script:
> >>> >>>> > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>> >>>> >
> >>> >>>> > And this is the script we used in the Jenkins build job.
> >>> >>>> >
> >>> >>>> > if [ -f "travis_check.py" ]; then
> >>> >>>> >   git log -n 1
> >>> >>>> >   # scrape the PR URL and the author URL from the Jenkins build page
> >>> >>>> >   STATUS=$(curl -s "$BUILD_URL" | grep -e "GitHub pull request.*from.*" |
> >>> >>>> >     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>> >>>> >   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>> >>>> >   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >>> >>>> >   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>> >>>> >   #if [ -z $COMMIT ]; then
> >>> >>>> >   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>> >>>> >   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >>> >>>> >   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >>> >>>> >   #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>> >>>> >   #fi
> >>> >>>> >
> >>> >>>> >   # get commit hash from PR
> >>> >>>> >   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>> >>>> >     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >>> >>>> >     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
> >>> >>>> >     grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>> >>>> >   sleep 30 # wait a few moments for Travis to start the build
> >>> >>>> >   RET_CODE=0
> >>> >>>> >   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>> >>>> >   if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >>> >>>> >     RET_CODE=0
> >>> >>>> >     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>> >>>> >       grep '"full_name":' | grep -v "apache/zeppelin" |
> >>> >>>> >       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>> >>>> >     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>> >>>> >   fi
> >>> >>>> >
> >>> >>>> >   if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in Travis
> >>> >>>> >     set +x
> >>> >>>> >     echo "-----------------------------------------------------"
> >>> >>>> >     echo "Looks like travis-ci is not configured for your fork."
> >>> >>>> >     echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >>> >>>> >     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >>> >>>> >     echo ""
> >>> >>>> >     echo "To trigger CI after setup, you will need to amend your last commit with"
> >>> >>>> >     echo "git commit --amend"
> >>> >>>> >     echo "git push your-remote HEAD --force"
> >>> >>>> >     echo ""
> >>> >>>> >     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >>> >>>> >   fi
> >>> >>>> >
> >>> >>>> >   exit $RET_CODE
> >>> >>>> > else
> >>> >>>> >   set +x
> >>> >>>> >   echo "travis_check.py does not exist"
> >>> >>>> >   exit 1
> >>> >>>> > fi
> >>> >>>> >
> >>> >>>> > Chesnay Schepler <ches...@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >>> >>>> >
> >>> >>>> >> Does this imply that a Jenkins job is active as long as the Travis
> >>> >>>> >> build runs?
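
The question above turns on how the Jenkins job waits for Travis. A minimal sketch of such a polling loop follows, to show why a worker stays occupied for the whole build. It is not the actual `travis_check.py`: `fetch_state` is a hypothetical callable standing in for whatever queries the build state, and the state names, interval, and timeout are assumptions.

```python
import time
from typing import Callable

# Assumed set of final build states; anything else counts as "still running".
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(
    fetch_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 4 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the build reaches a terminal state or the timeout expires.

    The caller (for example a CI worker) is blocked for the whole duration.
    """
    waited = 0.0
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            return "timeout"
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with a fake state source and a no-op sleep, so nothing really waits.
fake_states = iter(["created", "started", "started", "passed"])
print(wait_for_build(lambda: next(fake_states), sleep=lambda s: None))  # passed
```

With a 2 h build and a 30 s interval, a loop like this would hold its worker for about 240 polls, which is the idle time the reply further up the thread objects to.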
> >>> >>>> >>
> >>> >>>> >> On 26/06/2019 21:28, Bowen Li wrote:
> >>> >>>> >>> Hi,
> >>> >>>> >>>
> >>> >>>> >>> @Dawid, I think the "long test running" issue I mentioned in the
> >>> >>>> >>> first email, as you also said, belongs to "a big effort which is
> >>> >>>> >>> much harder to accomplish in a short period of time and may deserve
> >>> >>>> >>> its own separate discussion". Thus I didn't include it in what we
> >>> >>>> >>> can do in the foreseeable short term.
> >>> >>>> >>>
> >>> >>>> >>> Besides, I don't think that's the ultimate reason for the lack of
> >>> >>>> >>> build resources. Even if the build is shortened to something like
> >>> >>>> >>> 2h, the problem I described, of no build machine working for 6 or
> >>> >>>> >>> more hours in PST daytime, will still happen, because no machine
> >>> >>>> >>> from ASF INFRA's pool is allocated to Flink. As I have paid close
> >>> >>>> >>> attention to the build queue over the past few weekdays, it's a
> >>> >>>> >>> pretty clear pattern now.
> >>> >>>> >>>
> >>> >>>> >>> **The ultimate root cause** is that we don't have any **dedicated**
> >>> >>>> >>> build resources that we can stably rely on. I'm actually OK with
> >>> >>>> >>> waiting a long time if there are build requests running; it means at
> >>> >>>> >>> least we are making progress. But I'm not OK with having no build
> >>> >>>> >>> resources. A better place I think we should aim at in the short term
> >>> >>>> >>> is to always have at least a central pool (can be 3 or 5) of
> >>> >>>> >>> machines dedicated to building Flink at any time, or maybe to use
> >>> >>>> >>> users' resources.
> >>> >>>> >>>
> >>> >>>> >>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community
> >>> >>>> >>> is using a Jenkins job to automatically build on users' Travis
> >>> >>>> >>> accounts and link the result back to the GitHub PR. I guess the
> >>> >>>> >>> Jenkins job fetches the latest upstream master and builds the PR
> >>> >>>> >>> against it. Jeff has filed tickets to learn about and get access to
> >>> >>>> >>> the Jenkins infra. It would be better to fully understand it first
> >>> >>>> >>> before judging this approach.
> >>> >>>> >>>
> >>> >>>> >>> I also heard good things about CircleCI, and ASF INFRA seems to have
> >>> >>>> >>> a pool of build capacity there too. It can be an alternative to
> >>> >>>> >>> consider.
> >>> >>>> >>>
> >>> >>>> >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
> >>> >>>> >>> <dwysakow...@apache.org> wrote:
> >>> >>>> >>>
> >>> >>>> >>>> Sorry to jump in late, but I think Bowen missed the most important
> >>> >>>> >>>> point from Chesnay's previous message in the summary. The ultimate
> >>> >>>> >>>> reason for all the problems is that the tests take close to 2
> >>> >>>> >>>> hours to run already. I fully support this claim: "Unless people
> >>> >>>> >>>> start caring about test times before adding them, this issue
> >>> >>>> >>>> cannot be solved"
> >>> >>>> >>>>
> >>> >>>> >>>> This is also another reason why using a user's Travis account
> >>> >>>> >>>> won't help. Every few weeks we reach the user's time limit for a
> >>> >>>> >>>> single profile. This makes the user's builds simply fail, until we
> >>> >>>> >>>> either properly decrease the time the tests take (which I am not
> >>> >>>> >>>> sure we ever did) or postpone the problem by splitting into more
> >>> >>>> >>>> profiles. (Note that the ASF Travis account has higher time
> >>> >>>> >>>> limits.)
> >>> >>>> >>>>
> >>> >>>> >>>> Best,
> >>> >>>> >>>>
> >>> >>>> >>>> Dawid
> >>> >>>> >>>>
> >>> >>>> >>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>> >>>> >>>>> Do we know if using "the best" available hardware would improve
> >>> >>>> >>>>> the build times?
> >>> >>>> >>>>> Imagine we would run the build on machines with plenty of main
> >>> >>>> >>>>> memory to mount everything to ramdisk + the latest CPU
> >>> >>>> >>>>> architecture?
> >>> >>>> >>>>>
> >>> >>>> >>>>> Throwing hardware at the problem could help reduce the time of an
> >>> >>>> >>>>> individual build, and using our own infrastructure would remove
> >>> >>>> >>>>> our dependency on Apache's Travis account (with the obvious
> >>> >>>> >>>>> downside of having to maintain the infrastructure).
> >>> >>>> >>>>> We could use an open source Travis alternative, to have a similar
> >>> >>>> >>>>> experience and make the migration easy.
> >>> >>>> >>>>>
> >>> >>>> >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
> >>> >>>> >>>>> <ches...@apache.org> wrote:
> >>> >>>> >>>>>> From what I gathered, there's no special sauce that the Zeppelin
> >>> >>>> >>>>>> project uses which actually integrates a user's Travis account
> >>> >>>> >>>>>> into the PR. They just disabled Travis for PRs. And that's kind
> >>> >>>> >>>>>> of it.
> >>> >>>> >>>>>>
> >>> >>>> >>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
> >>> >>>> >>>>>> resources, but there are downsides:
> >>> >>>> >>>>>>
> >>> >>>> >>>>>> The discoverability of the Travis check takes a nose-dive. Either
> >>> >>>> >>>>>> we require every contributor to always, on every commit, also
> >>> >>>> >>>>>> post a Travis build, or we have the reviewer sift through the
> >>> >>>> >>>>>> contributor's account to find it.
> >>> >>>> >>>>>>
> >>> >>>> >>>>>> This is rather cumbersome. Additionally, it's also not equivalent
> >>> >>>> >>>>>> to having a PR build.
> >>> >>>> >>>>>>
> >>> >>>> >>>>>> A normal branch build takes a branch as is and tests it. A PR
> >>> >>>> >>>>>> build merges the branch into master, and then runs it. (Fun fact:
> >>> >>>> >>>>>> This is why a PR with merge conflicts is not being run on
> >>> >>>> >>>>>> Travis.)
> >>> >>>> >>>>>>
> >>> >>>> >>>>>> And ultimately, everyone can already make use of this approach
> >>> >>>> >>>>>> anyway.
> >>> >>>> >>>>>>
> >>> >>>> >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>> >>>> >>>>>>> Hi Jeff,
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good
> >>> >>>> >>>>>>> idea to leverage users' Travis accounts.
> >>> >>>> >>>>>>> In this way, we can have almost unlimited concurrent build jobs
> >>> >>>> >>>>>>> and developers can restart builds by themselves (currently only
> >>> >>>> >>>>>>> committers can restart a PR's build).
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>> But I'm still not very clear on how to integrate a user's Travis
> >>> >>>> >>>>>>> build into the Flink pull request's build automatically. Can you
> >>> >>>> >>>>>>> explain in more detail?
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>> Another question: does Travis only build branches for user
> >>> >>>> >>>>>>> accounts?
> >>> >>>> >>>>>>> My concern is this: builds for PRs rebase the user's commits
> >>> >>>> >>>>>>> onto the current master branch, which helps us find problems
> >>> >>>> >>>>>>> before merging. Builds for branches lose the impact of new
> >>> >>>> >>>>>>> commits in master.
> >>> >>>> >>>>>>> How does Zeppelin solve this problem?
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>> Thanks again for sharing the idea.
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>> Regards,
> >>> >>>> >>>>>>> Jark
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjf...@gmail.com> wrote:
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>>     Hi Folks,
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>>     Zeppelin met this kind of issue before; we solved it by
> >>> >>>> >>>>>>>     delegating each contributor's PR build to their own Travis
> >>> >>>> >>>>>>>     account (everyone can have 5 free slots for Travis builds).
> >>> >>>> >>>>>>>     The Apache account's Travis build is only triggered when a PR
> >>> >>>> >>>>>>>     is merged.
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>>     Kurt Young <ykt...@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
> >>> >>>> >>>>>>>
> >>> >>>> >>>>>>> > (Forgot to cc George)
> >>> >>>> >>>>>>> >
> >>> >>>> >>>>>>> > Best,
> >>> >>>> >>>>>>> > Kurt
> >>> >>>> >>>>>>> >
> >>> >>>> >>>>>>> > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
> >>> >>>> >>>>>>> >
> >>> >>>> >>>>>>> > > Hi Bowen,
> >>> >>>> >>>>>>> > >
> >>> >>>> >>>>>>> > > Thanks for bringing this up. We actually have discussed this,
> >>> >>>> >>>>>>> > > and I think Till and George have already spent some time
> >>> >>>> >>>>>>> > > investigating it. I have cc'ed both of them, and maybe they
> >>> >>>> >>>>>>> > > can share their findings.
> >>> >>>> >>>>>>> > >
> >>> >>>> >>>>>>> > > Best,
> >>> >>>> >>>>>>> > > Kurt
> >>> >>>> >>>>>>> > >
> >>> >>>> >>>>>>> > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
> >>> >>>> >>>>>>> > >
> >>> >>>> >>>>>>> > >> Hi Bowen,
> >>> >>>> >>>>>>> > >>
> >>> >>>> >>>>>>> > >> Thanks for bringing this up. We also suffered from the long
> >>> >>>> >>>>>>> > >> build time.
> >>> >>>> >>>>>>> > >> I agree that we should focus on solving the build capacity
> >>> >>>> >>>>>>> > >> problem in this thread.
> >>> >>>> >>>>>>> > >>
> >>> >>>> >>>>>>> > >> My observation is that only one build is running while all
> >>> >>>> >>>>>>> > >> the others (other PRs, master) are pending.
> >>> >>>> >>>>>>> > >> The pricing plan [1] of Travis shows it can support
> >>> >>>> >>>>>>> > >> concurrent build jobs.
> >>> >>>> >>>>>>> > >> But I don't know which plan we are using; it might be the
> >>> >>>> >>>>>>> > >> free plan for open source.
> >>> >>>> >>>>>>> > >>
> >>> >>>> >>>>>>> > >> I cc'ed Chesnay, who may have some experience with Travis.
> >>> >>>> >>>>>>> > >>
> >>> >>>> >>>>>>> > >> Regards,
> >>> >>>> >>>>>>> > >> Jark
> >>> >>>> >>>>>>> > >>
> >>> >>>> >>>>>>> > >> [1]: https://travis-ci.com/plans
> >>> >>>> >>>>>>> > >>
> >>> >>>> >>>>>>> > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenl...@gmail.com> wrote:
> >>> >>>> >>>>>>> > >>
> >>> >>>> >>>>>>> > >> > Hi Steven,
> >>> >>>> >>>>>>> > >> >
> >>> >>>> >>>>>>> > >> > I think you may not have read what I wrote. The discussion
> >>> >>>> >>>>>>> > >> > is about "unstable build **capacity**", in other words
> >>> >>>> >>>>>>> > >> > "unstable / lack of build resources", not "unstable build".
> >>> >>>> >>>>>>> > >> >
> >>> >>>> >>>>>>> > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz...@gmail.com> wrote:
> >>> >>>> >>>>>>> > >> >
> >>> >>>> >>>>>>> > >> > > Long and sometimes unstable builds are definitely a pain
> >>> >>>> >>>>>>> > >> > > point.
> >>> >>>> >>>>>>> > >> > >
> >>> >>>> >>>>>>> > >> > > I suspect the build failure here in flink-connector-kafka
> >>> >>>> >>>>>>> > >> > > is not related to my change, but there is no easy way to
> >>> >>>> >>>>>>> > >> > > re-run the build in the Travis UI. A Google search showed
> >>> >>>> >>>>>>> > >> > > a trick: closing and reopening the PR will trigger a
> >>> >>>> >>>>>>> > >> > > rebuild, but that could add noise to the PR activity.
> >>> >>>> >>>>>>> > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >>> >>>> >>>>>>> > >> > >
> >>> >>>> >>>>>>> > >> > > travis-ci for my personal repo often failed with exceeding
> >>> >>>> >>>>>>> > >> > > the time limit after 4+ hours:
> >>> >>>> >>>>>>> > >> > > The job exceeded the maximum time limit for jobs, and has
> >>> >>>> >>>>>>> > >> > > been terminated.
> > >
> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenl...@gmail.com> wrote:
> > >
> > > > https://travis-ci.org/apache/flink/builds/549681530
> > > > This build request has been sitting at the **HEAD of the queue** since I first saw it at 10:30am PST (not sure how long it had been there before 10:30am). It's 4:12pm PST now and it hasn't started yet.
> > > >
> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenl...@gmail.com> wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > I've been experiencing the pain resulting from the lack of stable build capacity on Travis for Flink PRs [1].
> > > > > Specifically, I have often noticed that no build in the queue makes any progress for hours, and then suddenly 5 or 6 builds kick off all together after the long pause. I'm in the PST (UTC-08) time zone, and I've seen that the pause can be as long as 6 hours, from 9am to 3pm PST (let alone the time needed to drain the queue afterwards).
> > > > >
> > > > > I think this has greatly impacted our productivity. In my experience, PRs submitted in the early morning PST won't finish their build until late at night the same day.
> > > > >
> > > > > So my questions are:
> > > > >
> > > > > - Has anyone else experienced the same problem or made a similar observation on TravisCI? (I suspect it has something to do with the time zone.)
> > > > >
> > > > > - What pricing plan of TravisCI is Flink currently using?
> > > > >   Is it the free plan for open source projects? What is the guaranteed build capacity of the current plan?
> > > > >
> > > > > - If the current pricing plan (either free or paid) can't provide stable build capacity, can we upgrade to a higher-priced plan with larger and more stable build capacity?
> > > > >
> > > > > BTW, another factor that contributes to the productivity problem is that our build is slow: we run a full build for every PR, and a successful full build takes ~5h. We definitely have more options to solve it, for instance, modularizing the build graph and reusing artifacts from the previous build. But I think that can be a big effort which is much harder to accomplish in a short period of time and may deserve its own separate discussion.
> > > > >
> > > > > [1] https://travis-ci.org/apache/flink/pull_requests

--
Best Regards

Jeff Zhang