+1, and many thanks to Chesnay for pushing this. Best, Kurt
On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek <aljos...@apache.org> wrote: > +1 > > Aljoscha > > > On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote: > > > > +1 to move to a private Travis account. > > > > I can confirm that Ververica will sponsor a Travis CI plan that is > > equivalent to or a bit higher than the previous ASF quota (10 concurrent > build > > queues) > > > > Best, > > Stephan > > > > On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ches...@apache.org> > wrote: > > > >> I've raised a JIRA > >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to > inquire > >> whether it would be possible to switch to a different Travis account, > >> and if so what steps would need to be taken. > >> We need a proper confirmation from INFRA since we are not in full > >> control of the flink repository (for example, we cannot access the > >> settings page). > >> > >> If this is indeed possible, Ververica is willing to sponsor a Travis > >> account for the Flink project. > >> This would provide us with more resources than we need. > >> > >> Since this makes the project more reliant on resources provided by > >> external companies I would like to vote on this. > >> > >> Please vote on this proposal, as follows: > >> [ ] +1, Approve the migration to a Ververica-sponsored Travis account, > >> provided that INFRA approves > >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis > >> account > >> > >> The vote will be open for at least 24h, and until we have confirmation > >> from INFRA. The voting period may be shorter than the usual 3 days since > >> our current setup is effectively not working. > >> > >> On 04/07/2019 06:51, Bowen Li wrote: > >>> Re: > Are they using their own Travis CI pool, or did they switch to an > >>> entirely different CI service? > >>> > >>> I reached out to Wes and Krisztián from the Apache Arrow PMC.
They are > >>> currently moving away from ASF's Travis to their own in-house metal > >>> machines at [1] with custom CI application at [2]. They've seen > >>> significant improvement w.r.t both much higher performance and > >>> basically no resource waiting time, "night-and-day" difference quoting > >>> Wes. > >>> > >>> Re: > If we can just switch to our own Travis pool, just for our > >>> project, then this might be something we can do fairly quickly? > >>> > >>> I believe so, according to [3] and [4] > >>> > >>> > >>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/> > >>> [2] https://github.com/ursa-labs/ursabot > >>> [3] > >>> > https://docs.travis-ci.com/user/migrate/open-source-repository-migration > >>> [4] > https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com > >>> > >>> > >>> > >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <ches...@apache.org > >>> <mailto:ches...@apache.org>> wrote: > >>> > >>> Are they using their own Travis CI pool, or did the switch to an > >>> entirely different CI service? > >>> > >>> If we can just switch to our own Travis pool, just for our > >>> project, then > >>> this might be something we can do fairly quickly? > >>> > >>> On 03/07/2019 05:55, Bowen Li wrote: > >>>> I responded in the INFRA ticket [1] that I believe they are > >>> using a wrong > >>>> metric against Flink and the total build time is a completely > >>> different > >>>> thing than guaranteed build capacity. > >>>> > >>>> My response: > >>>> > >>>> "As mentioned above, since I started to pay attention to Flink's > >>> build > >>>> queue a few tens of days ago, I'm in Seattle and I saw no build > >>> was kicking > >>>> off in PST daytime in weekdays for Flink. Our teammates in China > >>> and Europe > >>>> have also reported similar observations. 
So we need to evaluate > >>> how the > >>>> large total build time came from - if 1) your number and 2) our > >>>> observations from three locations that cover pretty much a full > >>> day, are > >>>> all true, I **guess** one reason can be that - highly likely the > >>> extra > >>>> build time came from weekends when other Apache projects may be > >>> idle and > >>>> Flink just drains hard its congested queue. > >>>> > >>>> Please be aware of that we're not complaining about the lack of > >>> resources > >>>> in general, I'm complaining about the lack of **stable, dedicated** > >>>> resources. An example for the latter one is, currently even if > >>> no build is > >>>> in Flink's queue and I submit a request to be the queue head in PST > >>>> morning, my build won't even start in 6-8+h. That is an absurd > >>> amount of > >>>> waiting time. > >>>> > >>>> That's saying, if ASF INFRA decides to adopt a quota system and > >>> grants > >>>> Flink five DEDICATED servers that runs all the time only for > >>> Flink, that'll > >>>> be PERFECT and can totally solve our problem now.
> >>>> > >>>> I feel what's missing in the ASF INFRA's Travis resource pool is > >>> some level > >>>> of build capacity SLAs and certainty" > >>>> > >>>> > >>>> Again, I believe these two problems differ in nature: > >>>> long build time vs. lack of dedicated build resources. That is > >>> to say, > >>>> shortening build time may relieve the situation, and may not. > >>> I'm slightly > >>>> negative on disabling IT cases for PRs, because the downside is > >>> that we are > >>>> at risk of potential bugs in PRs that UTs don't catch, which > >>> may cost a > >>>> lot more to fix if they slow others down or even block > >>> others, but I am > >>>> open to others' opinions on it. > >>>> > >>>> AFAICT from INFRA ticket[1], donating to ASF INFRA won't be > >>> a feasible way to > >>>> solve our problem since INFRA's pool is fully shared and they > >>> have no > >>>> control over, or finer insight into, resource allocation to a > >>> specific Apache > >>>> project. As mentioned in [1], Apache Arrow is moving away from the > >>> ASF INFRA > >>>> Travis pool (they are actually surprised Flink hasn't planned to do > >>> so). I > >>>> know that Spark is on its own build infra. If we all agree on > >>> funding our > >>>> own build infra, I'd be glad to help investigate any potential > >>> options > >>>> after releasing 1.9 since I'm super busy with 1.9 now. > >>>> > >>>> [1] https://issues.apache.org/jira/browse/INFRA-18533 > >>>> > >>>> > >>>> > >>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler > >>> <ches...@apache.org> wrote: > >>>> > >>>>> As a short-term stopgap, since we can assume this issue to > >>> become much > >>>>> worse in the following days/weeks, we could disable IT cases in > >>> PRs and > >>>>> only run them on master. > >>>>> > >>>>> On 02/07/2019 12:03, Chesnay Schepler wrote: > >>>>>> People really have to stop thinking that just because > >>> something works > >>>>>> for us it is also a good solution.
> >>>>>> Also, please remember that our builds run for 2h from start to > >>> finish, > >>>>>> and not the 14 _minutes_ it takes for zeppelin. > >>>>>> We are dealing with an entirely different scale here, both in > >>> terms of > >>>>>> build times and number of builds. > >>>>>> > >>>>>> In this very thread people have been complaining about long queue > >>>>>> times for their builds. Surprise, other Apache projects have been > >>>>>> suffering the very same thing due to us not controlling our build > >>>>>> times. While switching services (be it Jenkins, CircleCI or > >>> whatever) > >>>>>> will possibly work for us (and these options are actually > >>> attractive, > >>>>>> like CircleCI's proper support for build artifacts), it will also > >>>>>> result in us likely negatively affecting other projects in > >>> significant > >>>>>> ways. > >>>>>> > >>>>>> Sure, the Jenkins setup has a good user experience for us, at > >>> the cost > >>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we > >>> have 25 > >>>>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins > >>>>>> resources, and the European contributors haven't even really > >>> started yet. > >>>>>> > >>>>>> FYI, the latest INFRA response from INFRA-18533: > >>>>>> > >>>>>> "Our rough metrics shows that Flink used over 5800 hours of > >>> build time > >>>>>> last month. That is equal to EIGHT servers running 24/7 for > >>> the ENTIRE > >>>>>> MONTH. EIGHT. nonstop. > >>>>>> When we discovered this last night, we discussed it some and > >>> are going > >>>>>> to tune down Flink to allow only five executors maximum. We > >> cannot > >>>>>> allow Flink to consume so much of a Foundation shared resource." > >>>>>> > >>>>>> So yes, we either > >>>>>> a) have to heavily reduce our CI usage or > >>>>>> b) fund our own, either maintaining it ourselves or donating > >>> to Apache. 
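As a rough check of the figures quoted above (5800 build hours last month, the cap of five executors, and 25 queued PRs at about 2h each), here is a minimal sketch. It assumes a 30-day month, and the function names are made up for illustration; they are not part of any Travis or INFRA tooling.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def equivalent_servers(build_hours, days=DAYS_PER_MONTH):
    """Machines running 24/7 that would be needed to supply build_hours in one month."""
    return build_hours / (HOURS_PER_DAY * days)


def monthly_capacity_hours(executors, days=DAYS_PER_MONTH):
    """Upper bound on build hours per month for a fixed number of executors."""
    return executors * HOURS_PER_DAY * days


def queue_hours(pending_builds, hours_per_build):
    """Total worker time a queue of builds would consume."""
    return pending_builds * hours_per_build


if __name__ == "__main__":
    print(round(equivalent_servers(5800), 2))  # roughly eight servers
    print(monthly_capacity_hours(5))           # ceiling under a five-executor cap
    print(queue_hours(25, 2))                  # the 25 queued PRs at ~2h each
```

Under these assumptions, a five-executor cap allows at most 3600 build hours per month, well below the 5800 hours INFRA reported.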
> >>>>>> > >>>>>> On 02/07/2019 05:11, Bowen Li wrote: > >>>>>>> By looking at the git history of the Jenkins script, its core > >>> part > >>>>>>> was finished in March 2017 (with only two minor updates in > >>> 2017/2018), > >>>>>>> so it's been running for over two years now and it feels like the > >>> Zeppelin > >>>>>>> community has been quite happy with it. @Jeff Zhang > >>>>>>> <zjf...@gmail.com>, can you > >>> share your insights and user > >>>>>>> experience with the Jenkins+Travis approach? > >>>>>>> > >>>>>>> Things like: > >>>>>>> > >>>>>>> - has the approach completely solved the resource capacity > >>> problem > >>>>>>> for the Zeppelin community? is the Zeppelin community happy with the > >>> result? > >>>>>>> - is the whole configuration chain stable (e.g. uptime) enough? > >>>>>>> - how often do you need to maintain the Jenkins infra? how many > >>>>>>> people are usually involved in maintenance and bug-fixes? > >>>>>>> > >>>>>>> The downside of this approach seems to me to be mostly the > >>> maintenance: > >>>>>>> maintaining the script and the Jenkins infra. > >>>>>>> > >>>>>>> ** Having Our Own Travis-CI.com Account ** > >>>>>>> > >>>>>>> Another alternative I've been thinking of is to have our own > >>>>>>> travis-ci.com > >>> account with paid dedicated > >>>>>>> resources. Note travis-ci.org > >>> is the free > >>>>>>> version and travis-ci.com > >>> is the commercial > >>>>>>> version. We currently use a shared resource pool managed by the > >>> ASF INFRA > >>>>>>> team on travis-ci.org, but we have no control > >>>>>>> over it - we can't see how it's configured, how many > >>> resources are > >>>>>>> available, how resources are allocated among Apache projects, > >>> etc.
> >>>>>>> The nice thing about having an account on travis-ci.com > >>> <http://travis-ci.com> > >>>>>>> <http://travis-ci.com> are: > >>>>>>> > >>>>>>> - relatively low cost with much better resource guarantee > >>> than what > >>>>>>> we currently have [1]: $249/month with 5 dedicated concurrency, > >>>>>>> $489/month with 10 concurrency > >>>>>>> - low maintenance work compared to using Jenkins > >>>>>>> - (potentially) no migration cost according to Travis's doc [2] > >>>>>>> (pending verification) > >>>>>>> - full control over the build capacity/configuration compared to > >>>>>>> using ASF INFRA's pool > >>>>>>> > >>>>>>> I'd be surprised if we as such a vibrant community cannot > >>> find and > >>>>>>> fund $249*12=$2988 a year in exchange for a much better > >> developer > >>>>>>> experience and much higher productivity. > >>>>>>> > >>>>>>> [1] https://travis-ci.com/plans > >>>>>>> [2] > >>>>>>> > >>>>> > >>> > >> > https://docs.travis-ci.com/user/migrate/open-source-repository-migration > >>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler > >>> <ches...@apache.org <mailto:ches...@apache.org> > >>>>>>> <mailto:ches...@apache.org <mailto:ches...@apache.org>>> wrote: > >>>>>>> > >>>>>>> So yes, the Jenkins job keeps pulling the state from > >>> Travis until it > >>>>>>> finishes. > >>>>>>> > >>>>>>> Note sure I'm comfortable with the idea of using Jenkins > >>> workers > >>>>>>> just to > >>>>>>> idle for a several hours. > >>>>>>> > >>>>>>> On 29/06/2019 14:56, Jeff Zhang wrote: > >>>>>>>> Here's what zeppelin community did, we make a python > >>> script to > >>>>>>> check the > >>>>>>>> build status of pull request. > >>>>>>>> Here's script: > >>>>>>>> > >>> https://github.com/apache/zeppelin/blob/master/travis_check.py > >>>>>>>> > >>>>>>>> And this is the script we used in Jenkins build job. 
> >>>>>>>> > >>>>>>>> if [ -f "travis_check.py" ]; then > >>>>>>>> git log -n 1 > >>>>>>>> STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull > >>>>>>> request.*from.*" | sed > >>>>>>>> 's/.*GitHub pull request <a > >>>>>>>> href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 > >>> \2/g') > >>>>>>>> AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g') > >>>>>>>> PR=$(echo $STATUS | awk '{print $1}' | sed > >>>>>>> 's/.*[/]\(.*\)$/\1/g') > >>>>>>>> #COMMIT=$(git log -n 1 | grep "^Merge:" | awk > >>> '{print $3}') > >>>>>>>> #if [ -z $COMMIT ]; then > >>>>>>>> # COMMIT=$(curl -s > >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR > >>>>>>>> | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | > >>> tr '\n' ' ' > >>>>>>> | sed > >>>>>>>> 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | > >>> grep -v > >>>>>>> "apache:" | > >>>>>>>> sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g') > >>>>>>>> #fi > >>>>>>>> > >>>>>>>> # get commit hash from PR > >>>>>>>> COMMIT=$(curl -s > >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR | > >>>>>>>> grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr > >>> '\n' ' ' > >>>>>>> | sed > >>>>>>>> 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | > >>> grep -v > >>>>>>> "apache:" | > >>>>>>>> sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g') > >>>>>>>> sleep 30 # sleep few moment to wait travis starts > >>> the build > >>>>>>>> RET_CODE=0 > >>>>>>>> python ./travis_check.py ${AUTHOR} ${COMMIT} || > >>> RET_CODE=$? > >>>>>>>> if [ $RET_CODE -eq 2 ]; then # try with repository > >>> name when > >>>>>>> travis-ci is > >>>>>>>> not available in the account > >>>>>>>> RET_CODE=0 > >>>>>>>> AUTHOR=$(curl -s > >>>>>>> https://api.github.com/repos/apache/zeppelin/pulls/$PR > >>>>>>>> | grep '"full_name":' | grep -v "apache/zeppelin" | sed > >>>>>>>> 's/.*[:][^"]*["]\([^/]*\).*/\1/g') > >>>>>>>> python ./travis_check.py ${AUTHOR} ${COMMIT} || > >>> RET_CODE=$? 
> >>>>>>>> fi > >>>>>>>> > >>>>>>>> if [ $RET_CODE -eq 2 ]; then # fail with can't find > >>> build > >>>>>>> information in > >>>>>>>> the travis > >>>>>>>> set +x > >>>>>>>> echo > >>> "-----------------------------------------------------" > >>>>>>>> echo "Looks like travis-ci is not configured for > >>> your fork." > >>>>>>>> echo "Please setup by swich on 'zeppelin' > >>> repository at > >>>>>>>> https://travis-ci.org/profile and travis-ci." > >>>>>>>> echo "And then make sure 'Build branch updates' > >>> option is > >>>>>>> enabled in > >>>>>>>> the settings > >>> https://travis-ci.org/${AUTHOR}/zeppelin/settings > >>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings> > >>>>>>> <https://travis-ci.org/$%7BAUTHOR%7D/zeppelin/settings>." > >>>>>>>> echo "" > >>>>>>>> echo "To trigger CI after setup, you will need > >>> ammend your > >>>>>>> last commit > >>>>>>>> with" > >>>>>>>> echo "git commit --amend" > >>>>>>>> echo "git push your-remote HEAD --force" > >>>>>>>> echo "" > >>>>>>>> echo "See > >>>>>>>> > >>>>>>> > >>>>> > >>> > >> > http://zeppelin.apache.org/contribution/contributions.html#continuous-integration > >>>>>>>> ." > >>>>>>>> fi > >>>>>>>> > >>>>>>>> exit $RET_CODE > >>>>>>>> else > >>>>>>>> set +x > >>>>>>>> echo "travis_check.py does not exists" > >>>>>>>> exit 1 > >>>>>>>> fi > >>>>>>>> > >>>>>>>> Chesnay Schepler <ches...@apache.org > >>> <mailto:ches...@apache.org> > >>>>>>> <mailto:ches...@apache.org <mailto:ches...@apache.org>>> > >>> 于2019年6月29日周六 下午3:17写道: > >>>>>>>> > >>>>>>>>> Does this imply that a Jenkins job is active as long > >>> as the > >>>>>>> Travis build > >>>>>>>>> runs? 
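The behaviour being asked about here, a job that stays alive and keeps checking the Travis build until it finishes, can be sketched roughly as follows. This is not the actual `travis_check.py`: the `fetch_state` callable, the state names and the timeout are assumptions for illustration. Only the exit-code convention (2 meaning no build information was found) is taken from the Jenkins script quoted above.

```python
import time


def wait_for_build(fetch_state, poll_seconds=30.0, timeout_seconds=4 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until the build reaches a terminal state.

    fetch_state is a hypothetical callable returning a state string such as
    "created", "started", "passed", "failed" or "errored", or None when no
    build can be found for the commit.

    Returns 0 on success, 1 on failure or timeout, 2 when no build is found.
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        state = fetch_state()
        if state is None:
            return 2  # no build information available
        if state == "passed":
            return 0
        if state in ("failed", "errored"):
            return 1
        # Build is still queued or running: the calling worker stays
        # occupied while we wait for the next poll.
        sleep(poll_seconds)
    return 1  # gave up waiting
```

With a loop like this, the calling worker is held for the full duration of the remote build, so a 2h build means roughly 2h of an otherwise idle worker.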
> >>>>>>>>> > >>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> @Dawid, I think the "long test running" as I > >>> mentioned in the > >>>>>>> first > >>>>>>>>> email, > >>>>>>>>>> also as you guys said, belongs to "a big effort > >>> which is much > >>>>>>> harder to > >>>>>>>>>> accomplish in a short period of time and may deserve > >>> its own > >>>>>>> separate > >>>>>>>>>> discussion". Thus I didn't include it in what we can > >>> do in a > >>>>>>> foreseeable > >>>>>>>>>> short term. > >>>>>>>>>> > >>>>>>>>>> Besides, I don't think that's the ultimate reason > >>> for lack of > >>>>>>> build > >>>>>>>>>> resources. Even if the build is shortened to > >>> something like > >>>>>>> 2h, the > >>>>>>>>>> problems of no build machine works about 6 or more > >>> hours in > >>>>>>> PST daytime > >>>>>>>>>> that I described will still happen, because no > >>> machine from > >>>>>>> ASF INFRA's > >>>>>>>>>> pool is allocated to Flink. As I have paid close > >>> attention to > >>>>>>> the build > >>>>>>>>>> queue in the past few weekdays, it's a pretty clear > >>> pattern now. > >>>>>>>>>> > >>>>>>>>>> **The ultimate root cause** for that is - we don't > >>> have any > >>>>>>> **dedicated** > >>>>>>>>>> build resources that we can stably rely on. I'm > >>> actually ok to > >>>>>>> wait for a > >>>>>>>>>> long time if there are build requests running, it > >>> means at > >>>>>>> least we are > >>>>>>>>>> making progress. But I'm not ok with no build > >>> resource. A > >>>>>>> better place I > >>>>>>>>>> think we should aim at in short term is to always > >>> have at > >>>>>>> least a central > >>>>>>>>>> pool (can be 3 or 5) of machines dedicated to build > >>> Flink at > >>>>>>> any time, or > >>>>>>>>>> maybe use users resources. 
> >>>>>>>>>> > >>>>>>>>>> @Chesnay @Robert I synced with Jeff offline that > >>> Zeppelin > >>>>>>> community is > >>>>>>>>>> using a Jenkins job to automatically build on users' > >>> travis > >>>>>>> account and > >>>>>>>>>> link the result back to github PR. I guess the > >>> Jenkins job > >>>>>>> would fetch > >>>>>>>>>> latest upstream master and build the PR against it. > >>> Jeff has > >>>>>>> filed > >>>>>>>>> tickets > >>>>>>>>>> to learn and get access to the Jenkins infra. It'll > >>> better to > >>>>>>> fully > >>>>>>>>>> understand it first before judging this approach. > >>>>>>>>>> > >>>>>>>>>> I also heard good things about CircleCI, and ASF > >>> INFRA seems > >>>>>>> to have a > >>>>>>>>> pool > >>>>>>>>>> of build capacity there too. Can be an alternative > >>> to consider. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz < > >>>>>>>>> dwysakow...@apache.org > >>> <mailto:dwysakow...@apache.org> <mailto:dwysakow...@apache.org > >>> <mailto:dwysakow...@apache.org>>> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the > >>> most > >>>>>>> important point > >>>>>>>>>>> from Chesnay's previous message in the summary. The > >>> ultimate > >>>>>>> reason for > >>>>>>>>>>> all the problems is that the tests take close to 2 > >>> hours to > >>>>>>> run already. > >>>>>>>>>>> I fully support this claim: "Unless people start > >>> caring about > >>>>>>> test times > >>>>>>>>>>> before adding them, this issue cannot be solved" > >>>>>>>>>>> > >>>>>>>>>>> This is also another reason why using user's Travis > >>> account > >>>>>>> won't help. > >>>>>>>>>>> Every few weeks we reach the user's time limit for > >>> a single > >>>>>>> profile. 
> >>>>>>>>>>> This makes the user's builds simply fail, until we > >>> either > >>>>>>> properly > >>>>>>>>>>> decrease the time the tests take (which I am not > >>> sure we ever > >>>>>>> did) or > >>>>>>>>>>> postpone the problem by splitting into more > >>> profiles. (Note > >>>>>>> that the ASF > >>>>>>>>>>> Travis account has higher time limits) > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> > >>>>>>>>>>> Dawid > >>>>>>>>>>> > >>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote: > >>>>>>>>>>>> Do we know if using "the best" available hardware > >>> would > >>>>>>> improve the > >>>>>>>>> build > >>>>>>>>>>>> times? > >>>>>>>>>>>> Imagine we would run the build on machines with > >>> plenty of > >>>>>>> main memory > >>>>>>>>> to > >>>>>>>>>>>> mount everything to ramdisk + the latest CPU > >>> architecture? > >>>>>>>>>>>> > >>>>>>>>>>>> Throwing hardware at the problem could help reduce > >>> the time > >>>>>>> of an > >>>>>>>>>>>> individual build, and using our own infrastructure > >>> would > >>>>>>> remove our > >>>>>>>>>>>> dependency on Apache's Travis account (with the > >>> obvious > >>>>>>> downside of > >>>>>>>>>>> having > >>>>>>>>>>>> to maintain the infrastructure) > >>>>>>>>>>>> We could use an open source travis alternative, to > >>> have a > >>>>>>> similar > >>>>>>>>>>>> experience and make the migration easy. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler > >>>>>>> <ches...@apache.org <mailto:ches...@apache.org> > >>> <mailto:ches...@apache.org <mailto:ches...@apache.org>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>>> From what I gathered, there's no special > >>> sauce that the > >>>>>>> Zeppelin > >>>>>>>>>>>>> project uses which actually integrates a users > >> Travis > >>>>>>> account into the > >>>>>>>>>>> PR. > >>>>>>>>>>>>> They just disabled Travis for PRs. And that's > >>> kind of it. 
> >>>>>>>>>>>>> > >>>>>>>>>>>>> Naturally we can do this (duh) and safe the ASF a > >>> fair > >>>>>>> amount of > >>>>>>>>>>>>> resources, but there are downsides: > >>>>>>>>>>>>> > >>>>>>>>>>>>> The discoverability of the Travis check takes a > >>> nose-dive. > >>>>>>> Either we > >>>>>>>>>>>>> require every contributor to always, an every > >>> commit, also > >>>>>>> post a > >>>>>>>>> Travis > >>>>>>>>>>>>> build, or we have the reviewer sift through the > >>>>>>> contributors account > >>>>>>>>> to > >>>>>>>>>>>>> find it. > >>>>>>>>>>>>> > >>>>>>>>>>>>> This is rather cumbersome. Additionally, it's > >>> also not > >>>>>>> equivalent to > >>>>>>>>>>>>> having a PR build. > >>>>>>>>>>>>> > >>>>>>>>>>>>> A normal branch build takes a branch as is and > >>> tests it. A > >>>>>>> PR build > >>>>>>>>>>>>> merges the branch into master, and then runs it. > >>> (Fun fact: > >>>>>>> This is > >>>>>>>>> why > >>>>>>>>>>>>> a PR without merge conflicts is not being run on > >>> Travis.) > >>>>>>>>>>>>> > >>>>>>>>>>>>> And ultimately, everyone can already make use of > >> this > >>>>>>> approach anyway. > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote: > >>>>>>>>>>>>>> Hi Jeff, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I > >>> think it's a > >>>>>>> good idea to > >>>>>>>>>>>>>> leverage user's travis account. > >>>>>>>>>>>>>> In this way, we can have almost unlimited > >>> concurrent build > >>>>>>> jobs and > >>>>>>>>>>>>>> developers can restart build by themselves > >>> (currently only > >>>>>>> committers > >>>>>>>>>>>>>> can restart PR's build). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> But I'm still not very clear how to integrate > >> user's > >>>>>>> travis build > >>>>>>>>> into > >>>>>>>>>>>>>> the Flink pull request's build automatically. > >>> Can you > >>>>>>> explain more in > >>>>>>>>>>>>>> detail? 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Another question: does travis only build > >>> branches for user > >>>>>>> account? > >>>>>>>>>>>>>> My concern is that builds for PRs will rebase > >> user's > >>>>>>> commits against > >>>>>>>>>>>>>> current master branch. > >>>>>>>>>>>>>> This will help us to find problems before > >>> merge. Builds > >>>>>>> for branches > >>>>>>>>>>>>>> will lose the impact of new commits in master. > >>>>>>>>>>>>>> How does Zeppelin solve this problem? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks again for sharing the idea. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang > >>> <zjf...@gmail.com <mailto:zjf...@gmail.com> > >>>>>>> <mailto:zjf...@gmail.com <mailto:zjf...@gmail.com>> > >>>>>>>>>>>>>> <mailto:zjf...@gmail.com > >>> <mailto:zjf...@gmail.com> <mailto:zjf...@gmail.com > >>> <mailto:zjf...@gmail.com>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi Folks, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Zeppelin meet this kind of issue before, we solve > >>>>>>> it by > >>>>>>>>> delegating > >>>>>>>>>>>>>> each > >>>>>>>>>>>>>> one's PR build to his travis account > >>> (Everyone can > >>>>>>> have 5 free > >>>>>>>>>>>>>> slot for > >>>>>>>>>>>>>> travis build). > >>>>>>>>>>>>>> Apache account travis build is only triggered when > >>>>>>> PR is merged. 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Kurt Young <ykt...@gmail.com > >>> <mailto:ykt...@gmail.com> > >>>>>>> <mailto:ykt...@gmail.com <mailto:ykt...@gmail.com>> > >>> <mailto:ykt...@gmail.com <mailto:ykt...@gmail.com> > >>>>>>> <mailto:ykt...@gmail.com <mailto:ykt...@gmail.com>>>> > >>>>>>>>>>>>>> 于2019年6月25日周二 上午10:16写道: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> (Forgot to cc George) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Kurt > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young > >>>>>>> <ykt...@gmail.com <mailto:ykt...@gmail.com> > >>> <mailto:ykt...@gmail.com <mailto:ykt...@gmail.com>> > >>>>>>>>>>>>>> <mailto:ykt...@gmail.com > >>> <mailto:ykt...@gmail.com> <mailto:ykt...@gmail.com > >>> <mailto:ykt...@gmail.com>>>> > >>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi Bowen, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks for bringing this up. We > >>> actually have > >>>>>>> discussed > >>>>>>>>> about > >>>>>>>>>>>>>> this, and I > >>>>>>>>>>>>>>>> think Till and George have > >>>>>>>>>>>>>>>> already spend sometime investigating > >>> it. I have > >>>>>>> cced both of > >>>>>>>>>>>>>> them, and > >>>>>>>>>>>>>>>> maybe they can share > >>>>>>>>>>>>>>>> their findings. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>> Kurt > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu > >>>>>>> <imj...@gmail.com <mailto:imj...@gmail.com> > >>> <mailto:imj...@gmail.com <mailto:imj...@gmail.com>> > >>>>>>>>>>>>>> <mailto:imj...@gmail.com > >>> <mailto:imj...@gmail.com> <mailto:imj...@gmail.com > >>> <mailto:imj...@gmail.com>>>> > >>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hi Bowen, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks for bringing this. We also > >>> suffered from > >>>>>>> the long > >>>>>>>>>>>>>> build time. 
> >>>>>>>>>>>>>>>>> I agree that we should focus on > >>> solving build > >>>>>>> capacity > >>>>>>>>>>>>>> problem in the > >>>>>>>>>>>>>>>>> thread. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> My observation is there is only one > >>> build is > >>>>>>> running, all > >>>>>>>>> the > >>>>>>>>>>>>>> others > >>>>>>>>>>>>>>>>> (other > >>>>>>>>>>>>>>>>> PRs, master) are pending. > >>>>>>>>>>>>>>>>> The pricing plan[1] of travis shows > >>> it can > >>>>>>> support > >>>>>>>>> concurrent > >>>>>>>>>>>>>> build > >>>>>>>>>>>>>>> jobs. > >>>>>>>>>>>>>>>>> But I don't know which plan we are > >>> using, might > >>>>>>> be the free > >>>>>>>>>>>>>> plan for > >>>>>>>>>>>>>>> open > >>>>>>>>>>>>>>>>> source. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some > >>> experience on > >>>>>>> Travis. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li < > >>>>>>>>> bowenl...@gmail.com <mailto:bowenl...@gmail.com> > >>> <mailto:bowenl...@gmail.com <mailto:bowenl...@gmail.com>> > >>>>>>>>>>>>>> <mailto:bowenl...@gmail.com > >>> <mailto:bowenl...@gmail.com> > >>>>>>> <mailto:bowenl...@gmail.com > >>> <mailto:bowenl...@gmail.com>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hi Steven, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I think you may not read what I > >>> wrote. The > >>>>>>> discussion is > >>>>>>>>>>> about > >>>>>>>>>>>>>>> "unstable > >>>>>>>>>>>>>>>>>> build **capacity**", in another word > >>>>>>> "unstable / lack of > >>>>>>>>>>> build > >>>>>>>>>>>>>>>>> resources", > >>>>>>>>>>>>>>>>>> not "unstable build". 
> >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM > >>> Steven Wu > >>>>>>>>>>>>>> <stevenz...@gmail.com > >>> <mailto:stevenz...@gmail.com> <mailto:stevenz...@gmail.com > >>> <mailto:stevenz...@gmail.com>> > >>>>>>> <mailto:stevenz...@gmail.com > >>> <mailto:stevenz...@gmail.com> <mailto:stevenz...@gmail.com > >>> <mailto:stevenz...@gmail.com>>>> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> long and sometimes unstable build is > >>>>>>> definitely a pain > >>>>>>>>>>>>> point. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I suspect the build failure here in > >>>>>>>>> flink-connector-kafka > >>>>>>>>>>>>>> is not > >>>>>>>>>>>>>>>>> related > >>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>> my change. but there is no easy > >>> re-run the > >>>>>>> build on > >>>>>>>>>>>>>> travis UI. > >>>>>>>>>>>>>>> Google > >>>>>>>>>>>>>>>>>>> search showed a trick of > >>> close-and-open the > >>>>>>> PR will > >>>>>>>>>>>>>> trigger rebuild. > >>>>>>>>>>>>>>>>> but > >>>>>>>>>>>>>>>>>>> that could add noises to the PR > >>> activities. > >>>>>>>>>>>>>>>>>>> > >>>>>>> https://travis-ci.org/apache/flink/jobs/545555519 > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> travis-ci for my personal repo > >>> often failed > >>>>>>> with > >>>>>>>>>>>>>> exceeding time > >>>>>>>>>>>>>>> limit > >>>>>>>>>>>>>>>>>> after > >>>>>>>>>>>>>>>>>>> 4+ hours. > >>>>>>>>>>>>>>>>>>> The job exceeded the maximum time > >>> limit for > >>>>>>> jobs, and > >>>>>>>>> has > >>>>>>>>>>>>>> been > >>>>>>>>>>>>>>>>>> terminated. 
> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM > >>> Bowen Li > >>>>>>>>>>>>>> <bowenl...@gmail.com > >>> <mailto:bowenl...@gmail.com> <mailto:bowenl...@gmail.com > >>> <mailto:bowenl...@gmail.com>> > >>>>>>> <mailto:bowenl...@gmail.com <mailto:bowenl...@gmail.com> > >>> <mailto:bowenl...@gmail.com <mailto:bowenl...@gmail.com>>>> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>> https://travis-ci.org/apache/flink/builds/549681530 > >>>>>>>>>>>>>> This build > >>>>>>>>>>>>>>>>>> request > >>>>>>>>>>>>>>>>>>>> has > >>>>>>>>>>>>>>>>>>>> been sitting at **HEAD of the > >>> queue** > >>>>>>> since I first > >>>>>>>>> saw > >>>>>>>>>>>>>> it at PST > >>>>>>>>>>>>>>>>>> 10:30am > >>>>>>>>>>>>>>>>>>>> (not sure how long it's been > >>> there before > >>>>>>> 10:30am). > >>>>>>>>>>>>>> It's PST > >>>>>>>>>>>>>>> 4:12pm > >>>>>>>>>>>>>>>>> now > >>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>> it hasn't started yet. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM > >>> Bowen Li > >>>>>>>>>>>>>> <bowenl...@gmail.com > >>> <mailto:bowenl...@gmail.com> <mailto:bowenl...@gmail.com > >>> <mailto:bowenl...@gmail.com>> > >>>>>>> <mailto:bowenl...@gmail.com <mailto:bowenl...@gmail.com> > >>> <mailto:bowenl...@gmail.com <mailto:bowenl...@gmail.com>>>> > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Hi devs, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain > >>>>>>> resulting from lack > >>>>>>>>>>>>>> of stable > >>>>>>>>>>>>>>>>> build > >>>>>>>>>>>>>>>>>>>>> capacity on Travis for Flink > >>> PRs [1]. 
> >>>>>>>>>>>>>>>>>>>>> Specifically, I have often noticed that no build in the queue makes any
> >>>>>>>>>>>>>>>>>>>>> progress for hours, and then suddenly 5 or 6 builds kick off all together
> >>>>>>>>>>>>>>>>>>>>> after the long pause. I'm in the PST (UTC-08) time zone, and I've seen
> >>>>>>>>>>>>>>>>>>>>> the pause last as long as 6 hours, from 9am to 3pm PST (let alone the
> >>>>>>>>>>>>>>>>>>>>> time needed to drain the queue afterwards).
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I think this has greatly impacted our productivity. In my experience,
> >>>>>>>>>>>>>>>>>>>>> PRs submitted in the early morning PST won't finish their build until
> >>>>>>>>>>>>>>>>>>>>> late at night the same day.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> So my questions are:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced the same problem or made similar
> >>>>>>>>>>>>>>>>>>>>>   observations on TravisCI? (I suspect it has something to do with the
> >>>>>>>>>>>>>>>>>>>>>   time zone.)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - What pricing plan of TravisCI is Flink currently using? Is it the free
> >>>>>>>>>>>>>>>>>>>>>   plan for open source projects?
> >>>>>>>>>>>>>>>>>>>>>   What is the guaranteed build capacity of the current plan?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - If the current pricing plan (either free or paid) can't provide stable
> >>>>>>>>>>>>>>>>>>>>>   build capacity, can we upgrade to a higher-priced plan with larger and
> >>>>>>>>>>>>>>>>>>>>>   more stable build capacity?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> BTW, another factor that contributes to the productivity problem is that
> >>>>>>>>>>>>>>>>>>>>> our build is slow: we run a full build for every PR, and a successful
> >>>>>>>>>>>>>>>>>>>>> full build takes ~5h. We definitely have more options to solve it, for
> >>>>>>>>>>>>>>>>>>>>> instance, modularizing the build graph and reusing artifacts from the
> >>>>>>>>>>>>>>>>>>>>> previous build. But I think that would be a big effort that is much
> >>>>>>>>>>>>>>>>>>>>> harder to accomplish in a short period of time and may deserve its own
> >>>>>>>>>>>>>>>>>>>>> separate discussion.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> [1] https://travis-ci.org/apache/flink/pull_requests
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Best Regards
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jeff Zhang