+1

Aljoscha

> On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
>
> +1 to move to a private Travis account.
>
> I can confirm that Ververica will sponsor a Travis CI plan that is equivalent to or a bit higher than the previous ASF quota (10 concurrent build queues).
>
> Best,
> Stephan
>
> On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ches...@apache.org> wrote:
>
>> I've raised a JIRA <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to inquire whether it would be possible to switch to a different Travis account, and if so what steps would need to be taken.
>> We need a proper confirmation from INFRA since we are not in full control of the flink repository (for example, we cannot access the settings page).
>>
>> If this is indeed possible, Ververica is willing to sponsor a Travis account for the Flink project.
>> This would provide us with more than enough resources.
>>
>> Since this makes the project more reliant on resources provided by external companies, I would like to vote on this.
>>
>> Please vote on this proposal, as follows:
>> [ ] +1, Approve the migration to a Ververica-sponsored Travis account, provided that INFRA approves
>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis account
>>
>> The vote will be open for at least 24h, and until we have confirmation from INFRA. The voting period may be shorter than the usual 3 days since our current setup is effectively not working.
>>
>> On 04/07/2019 06:51, Bowen Li wrote:
>>> Re: > Are they using their own Travis CI pool, or did they switch to an entirely different CI service?
>>>
>>> I reached out to Wes and Krisztián from the Apache Arrow PMC. They are currently moving away from ASF's Travis to their own in-house metal machines at [1] with a custom CI application at [2]. They've seen significant improvement w.r.t. both much higher performance and basically no resource waiting time, a "night-and-day" difference, quoting Wes.
>>>
>>> Re: > If we can just switch to our own Travis pool, just for our project, then this might be something we can do fairly quickly?
>>>
>>> I believe so, according to [3] and [4].
>>>
>>> [1] https://ci.ursalabs.org/
>>> [2] https://github.com/ursa-labs/ursabot
>>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>>
>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>
>>>     Are they using their own Travis CI pool, or did they switch to an entirely different CI service?
>>>
>>>     If we can just switch to our own Travis pool, just for our project, then this might be something we can do fairly quickly?
>>>
>>>     On 03/07/2019 05:55, Bowen Li wrote:
>>>> I responded in the INFRA ticket [1] that I believe they are using the wrong metric against Flink, and that total build time is a completely different thing from guaranteed build capacity.
>>>>
>>>> My response:
>>>>
>>>> "As mentioned above, I'm in Seattle, and since I started paying attention to Flink's build queue a few tens of days ago, I have seen no build kick off during PST daytime on weekdays. Our teammates in China and Europe have also reported similar observations. So we need to evaluate where the large total build time came from. If 1) your number and 2) our observations from three locations that cover pretty much a full day are all true, I **guess** one reason can be that the extra build time most likely came from weekends, when other Apache projects may be idle and Flink just drains its congested queue hard.
>>>>
>>>> Please be aware that we're not complaining about the lack of resources in general; I'm complaining about the lack of **stable, dedicated** resources. An example of the latter: currently, even if no build is in Flink's queue and I submit a request that becomes the queue head in the PST morning, my build won't start for 6-8+ hours. That is an absurd amount of waiting time.
>>>>
>>>> That is to say, if ASF INFRA decides to adopt a quota system and grants Flink five DEDICATED servers that run all the time only for Flink, that'll be PERFECT and can totally solve our problem now.
>>>>
>>>> I feel what's missing in ASF INFRA's Travis resource pool is some level of build capacity SLAs and certainty"
>>>>
>>>> Again, I believe these two problems differ in nature: long build time vs. lack of dedicated build resources. That is to say, shortening build time may or may not relieve the situation. I'm slightly negative on disabling IT cases for PRs: the downside is that we risk bugs in PRs that UTs don't catch, which may cost a lot more to fix, especially if they slow others down or even block them. But I am open to others' opinions on it.
>>>>
>>>> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't solve our problem, since INFRA's pool is fully shared and they have no control over, or finer insight into, resource allocation to a specific Apache project. As mentioned in [1], Apache Arrow is moving away from the ASF INFRA Travis pool (they are actually surprised Flink hasn't planned to do so). I know that Spark is on its own build infra. If we all agree on funding our own build infra, I'd be glad to help investigate potential options after releasing 1.9, since I'm super busy with 1.9 now.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>>
>>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>
>>>>> As a short-term stopgap, since we can assume this issue will become much worse in the following days/weeks, we could disable IT cases in PRs and only run them on master.
>>>>>
>>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>>> People really have to stop thinking that just because something works for us it is also a good solution.
>>>>>> Also, please remember that our builds run for 2h from start to finish, and not the 14 _minutes_ it takes for Zeppelin.
>>>>>> We are dealing with an entirely different scale here, both in terms of build times and number of builds.
>>>>>>
>>>>>> In this very thread people have been complaining about long queue times for their builds. Surprise, other Apache projects have been suffering the very same thing due to us not controlling our build times.
>>>>>> While switching services (be it Jenkins, CircleCI or whatever) will possibly work for us (and these options are actually attractive, like CircleCI's proper support for build artifacts), it will also likely result in us negatively affecting other projects in significant ways.
>>>>>>
>>>>>> Sure, the Jenkins setup has a good user experience for us, at the cost of blocking Jenkins workers for a _lot_ of time. Right now we have 25 PRs in our queue; that's possibly 50h of Jenkins resources we'd consume, and the European contributors haven't even really started yet.
>>>>>>
>>>>>> FYI, the latest INFRA response from INFRA-18533:
>>>>>>
>>>>>> "Our rough metrics shows that Flink used over 5800 hours of build time last month. That is equal to EIGHT servers running 24/7 for the ENTIRE MONTH. EIGHT. nonstop.
>>>>>> When we discovered this last night, we discussed it some and are going to tune down Flink to allow only five executors maximum. We cannot allow Flink to consume so much of a Foundation shared resource."
>>>>>>
>>>>>> So yes, we either
>>>>>> a) have to heavily reduce our CI usage or
>>>>>> b) fund our own, either maintaining it ourselves or donating to Apache.
>>>>>>
>>>>>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>>>> By looking at the git history of the Jenkins script, its core part was finished in March 2017 (with only two minor updates in 2017/2018), so it has been running for over two years now, and it feels like the Zeppelin community has been quite happy with it. @Jeff Zhang <zjf...@gmail.com> can you share your insights and user experience with the Jenkins+Travis approach?
>>>>>>>
>>>>>>> Things like:
>>>>>>>
>>>>>>> - has the approach completely solved the resource capacity problem for the Zeppelin community? is the Zeppelin community happy with the result?
>>>>>>> - is the whole configuration chain stable enough (e.g. uptime)?
>>>>>>> - how often do you need to maintain the Jenkins infra? how many people are usually involved in maintenance and bug-fixes?
>>>>>>>
>>>>>>> The downside of this approach seems to me to be mostly the maintenance: maintaining the script and the Jenkins infra.
>>>>>>>
>>>>>>> ** Having Our Own Travis-CI.com Account **
>>>>>>>
>>>>>>> Another alternative I've been thinking of is to have our own travis-ci.com account with paid dedicated resources. Note that travis-ci.org is the free version and travis-ci.com is the commercial version. We currently use a shared resource pool managed by the ASF INFRA team on travis-ci.org, but we have no control over it - we can't see how it's configured, how many resources are available, how resources are allocated among Apache projects, etc. The nice things about having an account on travis-ci.com are:
>>>>>>>
>>>>>>> - relatively low cost with a much better resource guarantee than what we currently have [1]: $249/month for 5 dedicated concurrent builds, $489/month for 10
>>>>>>> - low maintenance work compared to using Jenkins
>>>>>>> - (potentially) no migration cost according to Travis's doc [2] (pending verification)
>>>>>>> - full control over the build capacity/configuration compared to using ASF INFRA's pool
>>>>>>>
>>>>>>> I'd be surprised if we, as such a vibrant community, cannot find and fund $249*12=$2988 a year in exchange for a much better developer experience and much higher productivity.
>>>>>>>
>>>>>>> [1] https://travis-ci.com/plans
>>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>>>
>>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>
>>>>>>>     So yes, the Jenkins job keeps polling the state from Travis until it finishes.
>>>>>>>
>>>>>>>     Not sure I'm comfortable with the idea of using Jenkins workers just to idle for several hours.
>>>>>>>
>>>>>>>     On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>>>>> Here's what the Zeppelin community did: we made a python script to check the build status of a pull request.
>>>>>>>> Here's the script:
>>>>>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>>>>>
>>>>>>>> And this is the script we use in the Jenkins build job.
>>>>>>>>
>>>>>>>> if [ -f "travis_check.py" ]; then
>>>>>>>>   git log -n 1
>>>>>>>>   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>>>>>   #if [ -z $COMMIT ]; then
>>>>>>>>   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>   #fi
>>>>>>>>
>>>>>>>>   # get commit hash from PR
>>>>>>>>   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>   sleep 30 # sleep few moment to wait travis starts the build
>>>>>>>>   RET_CODE=0
>>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>   if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
>>>>>>>>     RET_CODE=0
>>>>>>>>     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>>>>>     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>   fi
>>>>>>>>
>>>>>>>>   if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
>>>>>>>>     set +x
>>>>>>>>     echo "-----------------------------------------------------"
>>>>>>>>     echo "Looks like travis-ci is not configured for your fork."
>>>>>>>>     echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>>>>>>>     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>>>>>     echo ""
>>>>>>>>     echo "To trigger CI after setup, you will need ammend your last commit with"
>>>>>>>>     echo "git commit --amend"
>>>>>>>>     echo "git push your-remote HEAD --force"
>>>>>>>>     echo ""
>>>>>>>>     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>>>>>>>>   fi
>>>>>>>>
>>>>>>>>   exit $RET_CODE
>>>>>>>> else
>>>>>>>>   set +x
>>>>>>>>   echo "travis_check.py does not exists"
>>>>>>>>   exit 1
>>>>>>>> fi
>>>>>>>>
>>>>>>>> On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Does this imply that a Jenkins job is active as long as the Travis build runs?
>>>>>>>>>
>>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> @Dawid, I think the "long test running" that I mentioned in the first email, as you guys also said, belongs to "a big effort which is much harder to accomplish in a short period of time and may deserve its own separate discussion". Thus I didn't include it in what we can do in the foreseeable short term.
>>>>>>>>>>
>>>>>>>>>> Besides, I don't think that's the ultimate reason for the lack of build resources. Even if the build is shortened to something like 2h, the problem I described, where no build machine works for about 6 or more hours in PST daytime, will still happen, because no machine from ASF INFRA's pool is allocated to Flink. As I have paid close attention to the build queue over the past few weekdays, it's a pretty clear pattern now.
>>>>>>>>>>
>>>>>>>>>> **The ultimate root cause** for that is that we don't have any **dedicated** build resources that we can stably rely on. I'm actually ok with waiting for a long time if there are build requests running; it means at least we are making progress. But I'm not ok with no build resource.
>>>>>>>>>> A better place I think we should aim at in the short term is to always have at least a central pool (can be 3 or 5) of machines dedicated to building Flink at any time, or maybe use users' resources.
>>>>>>>>>>
>>>>>>>>>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is using a Jenkins job to automatically build on users' travis accounts and link the result back to the github PR. I guess the Jenkins job would fetch the latest upstream master and build the PR against it. Jeff has filed tickets to learn about and get access to the Jenkins infra. It'll be better to fully understand it first before judging this approach.
>>>>>>>>>>
>>>>>>>>>> I also heard good things about CircleCI, and ASF INFRA seems to have a pool of build capacity there too. It can be an alternative to consider.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the most important point from Chesnay's previous message in the summary. The ultimate reason for all the problems is that the tests take close to 2 hours to run already. I fully support this claim: "Unless people start caring about test times before adding them, this issue cannot be solved"
>>>>>>>>>>>
>>>>>>>>>>> This is also another reason why using users' Travis accounts won't help. Every few weeks we reach the user's time limit for a single profile.
>>>>>>>>>>> This makes the user's builds simply fail, until we either properly decrease the time the tests take (which I am not sure we ever did) or postpone the problem by splitting into more profiles. (Note that the ASF Travis account has higher time limits)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Dawid
>>>>>>>>>>>
>>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>>>>>>>>> Do we know if using "the best" available hardware would improve the build times?
>>>>>>>>>>>> Imagine we would run the build on machines with plenty of main memory to mount everything to ramdisk + the latest CPU architecture?
>>>>>>>>>>>>
>>>>>>>>>>>> Throwing hardware at the problem could help reduce the time of an individual build, and using our own infrastructure would remove our dependency on Apache's Travis account (with the obvious downside of having to maintain the infrastructure).
>>>>>>>>>>>> We could use an open source travis alternative, to have a similar experience and make the migration easy.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>>>>>>> From what I gathered, there's no special sauce that the Zeppelin project uses which actually integrates a user's Travis account into the PR. They just disabled Travis for PRs. And that's kind of it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a fair amount of resources, but there are downsides:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The discoverability of the Travis check takes a nose-dive. Either we require every contributor to always, on every commit, also post a Travis build, or we have the reviewer sift through the contributor's account to find it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is rather cumbersome. Additionally, it's also not equivalent to having a PR build.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A normal branch build takes a branch as is and tests it. A PR build merges the branch into master, and then runs it. (Fun fact: This is why a PR with merge conflicts is not being run on Travis.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> And ultimately, everyone can already make use of this approach anyway.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to leverage users' travis accounts.
>>>>>>>>>>>>>> In this way, we can have almost unlimited concurrent build jobs, and developers can restart builds by themselves (currently only committers can restart a PR's build).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But I'm still not very clear on how to integrate a user's travis build into the Flink pull request's build automatically. Can you explain in more detail?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another question: does travis only build branches for user accounts?
>>>>>>>>>>>>>> My concern is that builds for PRs will rebase the user's commits against the current master branch.
>>>>>>>>>>>>>> This will help us to find problems before merge. Builds for branches will lose the impact of new commits in master.
>>>>>>>>>>>>>> How does Zeppelin solve this problem?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks again for sharing the idea.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Hi Folks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Zeppelin met this kind of issue before; we solved it by delegating each contributor's PR build to their own travis account (everyone can have 5 free slots for travis builds).
>>>>>>>>>>>>>>     The Apache account's travis build is only triggered when a PR is merged.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (Forgot to cc George)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for bringing this up. We have actually discussed this, and I think Till and George have already spent some time investigating it. I have cced both of them, and maybe they can share their findings.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for bringing this up. We also suffered from the long build time.
>>>>>>>>>>>>>>>>> I agree that we should focus on solving the build capacity problem in this thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My observation is that only one build is running, and all the others (other PRs, master) are pending.
>>>>>>>>>>>>>>>>> The pricing plan [1] of travis shows it can support concurrent build jobs.
>>>>>>>>>>>>>>>>> But I don't know which plan we are using; it might be the free plan for open source.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I cc-ed Chesnay, who may have some experience with Travis.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Steven,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think you may not have read what I wrote. The discussion is about "unstable build **capacity**", in other words "unstable / lack of build resources", not "unstable build".
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> long and sometimes unstable build is definitely a pain point.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I suspect the build failure here in flink-connector-kafka is not related to my change, but there is no easy way to re-run the build in the travis UI. A Google search showed a trick: closing and reopening the PR will trigger a rebuild. But that could add noise to the PR activity.
>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> travis-ci for my personal repo often failed with an exceeded time limit after 4+ hours:
>>>>>>>>>>>>>>>>>>> The job exceeded the maximum time limit for jobs, and has been terminated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530 This build request has been sitting at the **HEAD of the queue** since I first saw it at PST 10:30am (not sure how long it had been there before 10:30am). It's PST 4:12pm now and it hasn't started yet.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain resulting from the lack of stable build capacity on Travis for Flink PRs [1]. Specifically, I have often noticed that no build in the queue makes any progress for hours, and then suddenly 5 or 6 builds kick off all together after the long pause. I'm in the PST (UTC-08) time zone, and I've seen the pause be as long as 6 hours, from PST 9am to 3pm (let alone the time needed to drain the queue afterwards).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think this has greatly impacted our productivity. I've experienced that PRs submitted in the early morning PST won't finish their build until late night of the same day.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So my questions are:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced the same problem or made similar observations on TravisCI? (I suspect it has something to do with the time zone)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - What pricing plan of TravisCI is Flink currently using? Is it the free plan for open source projects? What is the guaranteed build capacity of the current plan?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - If the current pricing plan (either free or paid) can't provide stable build capacity, can we upgrade to a higher priced plan with larger and more stable build capacity?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> BTW, another factor that contributes to the productivity problem is that our build is slow - we run a full build for every PR and a successful full build takes ~5h. We definitely have more options to solve it, for instance, modularizing the build graphs and reusing artifacts from the previous build. But I think that can be a big effort which is much harder to accomplish in a short period of time and may deserve its own separate discussion.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1] https://travis-ci.org/apache/flink/pull_requests
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeff Zhang
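
The capacity and cost figures quoted in this thread (5800 build hours per month, a cap of five executors, 25 queued PRs at about 2h each, and the $249 and $489 monthly plans) can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch, not anything taken from INFRA's or Travis's tooling: it assumes a 30-day month, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the numbers quoted in the thread.
# Assumption: a month has 30 days, i.e. 720 wall-clock hours per executor.
HOURS_PER_MONTH = 24 * 30


def executors_needed(build_hours_per_month: float) -> float:
    """Executors that must run 24/7 to deliver the given build hours."""
    return build_hours_per_month / HOURS_PER_MONTH


def monthly_capacity(executors: int) -> int:
    """Maximum build hours per month a fixed number of executors can deliver."""
    return executors * HOURS_PER_MONTH


def yearly_cost(monthly_price_usd: int) -> int:
    """Yearly cost of a plan billed monthly."""
    return monthly_price_usd * 12


if __name__ == "__main__":
    # 5800 hours is the figure from the INFRA-18533 response quoted above.
    print(round(executors_needed(5800), 2))
    # Capacity under the announced cap of five executors.
    print(monthly_capacity(5))
    # 25 queued PRs at roughly 2 hours per build.
    print(25 * 2)
    # The two plan prices mentioned in the thread.
    print(yearly_cost(249), yearly_cost(489))
```

Under these assumptions, 5800 build hours correspond to roughly eight executors running around the clock, while a cap of five executors provides at most 3600 build hours per month, which is well below the usage INFRA reported.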