+1 to move to a private Travis account.

I can confirm that Ververica will sponsor a Travis CI plan that is equivalent to, or a bit higher than, the previous ASF quota (10 concurrent build queues).

Best,
Stephan

On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ches...@apache.org> wrote:

I've raised a JIRA <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to ask whether it would be possible to switch to a different Travis account, and if so, what steps would need to be taken.
We need proper confirmation from INFRA since we are not in full control of the flink repository (for example, we cannot access the settings page).

If this is indeed possible, Ververica is willing to sponsor a Travis account for the Flink project.
This would provide us with more than enough resources.

Since this makes the project more reliant on resources provided by external companies, I would like to vote on this.

Please vote on this proposal, as follows:
[ ] +1, Approve the migration to a Ververica-sponsored Travis account, provided that INFRA approves
[ ] -1, Do not approve the migration to a Ververica-sponsored Travis account

The vote will be open for at least 24h, and until we have confirmation from INFRA. The voting period may be shorter than the usual 3 days since our current setup is effectively not working.

On 04/07/2019 06:51, Bowen Li wrote:

Re:
> Are they using their own Travis CI pool, or did they switch to an entirely different CI service?

I reached out to Wes and Krisztián from the Apache Arrow PMC. They are currently moving away from ASF's Travis to their own in-house metal machines at [1] with a custom CI application at [2]. They've seen significant improvement, with both much higher performance and basically no resource waiting time: a "night-and-day" difference, quoting Wes.

Re:
> If we can just switch to our own Travis pool, just for our project, then this might be something we can do fairly quickly?

I believe so, according to [3] and [4].

[1] https://ci.ursalabs.org/
[2] https://github.com/ursa-labs/ursabot
[3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
[4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com

On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <ches...@apache.org> wrote:

Are they using their own Travis CI pool, or did they switch to an entirely different CI service?

If we can just switch to our own Travis pool, just for our project, then this might be something we can do fairly quickly?

On 03/07/2019 05:55, Bowen Li wrote:

I responded in the INFRA ticket [1] that I believe they are using the wrong metric against Flink, and that total build time is a completely different thing from guaranteed build capacity.

My response:

"As mentioned above, since I started to pay attention to Flink's build queue a few tens of days ago, I have seen (from Seattle) no build kicking off for Flink in PST daytime on weekdays. Our teammates in China and Europe have also reported similar observations. So we need to evaluate where the large total build time came from. If 1) your number and 2) our observations from three locations that cover pretty much a full day are all true, I **guess** one reason can be that the extra build time highly likely came from weekends, when other Apache projects may be idle and Flink just drains its congested queue hard.

Please be aware that we're not complaining about the lack of resources in general; I'm complaining about the lack of **stable, dedicated** resources.
An example of the latter: currently, even if no build is in Flink's queue and I submit a request that becomes the queue head in the PST morning, my build won't even start for 6-8+ hours. That is an absurd amount of waiting time.

That is to say, if ASF INFRA decides to adopt a quota system and grants Flink five DEDICATED servers that run all the time only for Flink, that'll be PERFECT and can totally solve our problem now.

I feel what's missing in ASF INFRA's Travis resource pool is some level of build capacity SLAs and certainty"

Again, I believe these two problems differ in nature: long build time vs. lack of dedicated build resources. That is to say, shortening the build time may or may not relieve the situation. I'm slightly negative on disabling IT cases for PRs, because the downside is that we risk missing bugs in PRs that the UTs don't catch, which may cost a lot more to fix, especially if they slow down or even block others, but I am open to others' opinions on it.

AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a feasible way to solve our problem, since INFRA's pool is fully shared and they have no control over, or finer insight into, resource allocation to a specific Apache project. As mentioned in [1], Apache Arrow is moving away from the ASF INFRA Travis pool (they are actually surprised Flink hasn't planned to do so). I know that Spark is on its own build infra. If we all agree on funding our own build infra, I'd be glad to help investigate any potential options after releasing 1.9, since I'm super busy with 1.9 now.

[1] https://issues.apache.org/jira/browse/INFRA-18533

On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <ches...@apache.org> wrote:

As a short-term stopgap, since we can assume this issue will become much worse in the following days/weeks, we could disable IT cases in PRs and only run them on master.

On 02/07/2019 12:03, Chesnay Schepler wrote:

People really have to stop thinking that just because something works for us it is also a good solution.
Also, please remember that our builds run for 2h from start to finish, and not the 14 _minutes_ it takes for Zeppelin.
We are dealing with an entirely different scale here, both in terms of build times and number of builds.

In this very thread people have been complaining about long queue times for their builds. Surprise, other Apache projects have been suffering the very same thing due to us not controlling our build times.
While switching services (be it Jenkins, CircleCI or whatever) will possibly work for us (and these options are actually attractive, like CircleCI's proper support for build artifacts), it will also likely result in us negatively affecting other projects in significant ways.

Sure, the Jenkins setup has a good user experience for us, at the cost of blocking Jenkins workers for a _lot_ of time. Right now we have 25 PRs in our queue; that's possibly 50h of Jenkins resources we'd consume, and the European contributors haven't even really started yet.

FYI, the latest INFRA response from INFRA-18533:

"Our rough metrics shows that Flink used over 5800 hours of build time last month. That is equal to EIGHT servers running 24/7 for the ENTIRE MONTH. EIGHT. nonstop.
When we discovered this last night, we discussed it some and are going to tune down Flink to allow only five executors maximum. We cannot allow Flink to consume so much of a Foundation shared resource."

So yes, we either
a) have to heavily reduce our CI usage or
b) fund our own, either maintaining it ourselves or donating to Apache.

On 02/07/2019 05:11, Bowen Li wrote:

Looking at the git history of the Jenkins script, its core part was finished in March 2017 (with only two minor updates in 2017/2018), so it's been running for over two years now and it feels like the Zeppelin community has been quite happy with it. @Jeff Zhang <zjf...@gmail.com>, can you share your insights and user experience with the Jenkins+Travis approach?

Things like:

- Has the approach completely solved the resource capacity problem for the Zeppelin community?
  Is the Zeppelin community happy with the result?
- Is the whole configuration chain stable enough (e.g. uptime)?
- How often do you need to maintain the Jenkins infra? How many people are usually involved in maintenance and bug fixes?

To me, the downside of this approach seems to be mostly maintenance: maintaining the script and the Jenkins infra.

** Having Our Own Travis-CI.com Account **

Another alternative I've been thinking of is to have our own travis-ci.com account with paid dedicated resources. Note that travis-ci.org is the free version and travis-ci.com is the commercial version. We currently use a shared resource pool managed by the ASF INFRA team on travis-ci.org, but we have no control over it - we can't see how it's configured, how many resources are available, how resources are allocated among Apache projects, etc.

The nice things about having an account on travis-ci.com are:

- relatively low cost with a much better resource guarantee than what we currently have [1]: $249/month for 5 dedicated concurrent builds, $489/month for 10
- low maintenance work compared to using Jenkins
- (potentially) no migration cost according to Travis's doc [2] (pending verification)
- full control over the build capacity/configuration compared to using ASF INFRA's pool

I'd be surprised if we, as such a vibrant community, cannot find and fund $249*12=$2988 a year in exchange for a much better developer experience and much higher productivity.

[1] https://travis-ci.com/plans
[2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration

On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <ches...@apache.org> wrote:

So yes, the Jenkins job keeps polling the state from Travis until it finishes.

Not sure I'm comfortable with the idea of using Jenkins workers just to idle for several hours.

On 29/06/2019 14:56, Jeff Zhang wrote:

Here's what the Zeppelin community did: we made a Python script to check the build status of a pull request.
Here's the script: https://github.com/apache/zeppelin/blob/master/travis_check.py

And this is the script we used in the Jenkins build job.

if [ -f "travis_check.py" ]; then
  git log -n 1
  STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
  AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
  PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
  #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
  #if [ -z $COMMIT ]; then
  #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
  #fi

  # get commit hash from PR
  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
  sleep 30 # wait a moment for travis to start the build
  RET_CODE=0
  python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
  if [ $RET_CODE -eq 2 ]; then # try with the repository name when travis-ci is not available in the account
    RET_CODE=0
    AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
  fi

  if [ $RET_CODE -eq 2 ]; then # fail: can't find build information in travis
    set +x
    echo "-----------------------------------------------------"
    echo "Looks like travis-ci is not configured for your fork."
    echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
    echo "And then make sure the 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
    echo ""
    echo "To trigger CI after setup, you will need to amend your last commit with"
    echo "git commit --amend"
    echo "git push your-remote HEAD --force"
    echo ""
    echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
  fi

  exit $RET_CODE
else
  set +x
  echo "travis_check.py does not exist"
  exit 1
fi

On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler <ches...@apache.org> wrote:

Does this imply that a Jenkins job is active as long as the Travis build runs?

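The concern in this exchange, a Jenkins job that stays active while it waits for a Travis build, can be made concrete with a bounded polling loop. The sketch below is hypothetical and is not Zeppelin's travis_check.py: `poll_build` and the status command passed to it are assumed names, and the status command is assumed to print `passed`, `failed`, or any other word while the build is still running.

```shell
#!/bin/sh
# Hypothetical sketch (not Zeppelin's travis_check.py): wait for a CI build
# by polling a caller-supplied status command a bounded number of times.
#
# usage: poll_build MAX_ATTEMPTS SLEEP_SECONDS STATUS_COMMAND [ARGS...]
# STATUS_COMMAND is assumed to print "passed", "failed", or any other
# word (for example "running") while the build is still in progress.
poll_build() {
  max_attempts=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$("$@")
    case "$state" in
      passed) return 0 ;;   # build succeeded
      failed) return 1 ;;   # build failed
    esac
    # Still running: the worker executing this loop stays occupied
    # while it sleeps, which is the cost raised in the thread.
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 3                  # gave up after max_attempts polls
}
```

A call such as `poll_build 240 30 fetch_travis_state "$AUTHOR" "$COMMIT"` (where `fetch_travis_state` is a placeholder for whatever command reports the build state) would hold its worker for up to 240 × 30 s = 2 hours, roughly one Flink build as described in this thread.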
On 26/06/2019 21:28, Bowen Li wrote:

Hi,

@Dawid, I think the "long test running" that I mentioned in the first email, and as you guys also said, belongs to "a big effort which is much harder to accomplish in a short period of time and may deserve its own separate discussion". Thus I didn't include it in what we can do in the foreseeable short term.

Besides, I don't think that's the ultimate reason for the lack of build resources. Even if the build is shortened to something like 2h, the problem I described, where no build machine does any work for 6 or more hours in PST daytime, will still happen, because no machine from ASF INFRA's pool is allocated to Flink. As I have paid close attention to the build queue over the past few weekdays, it's a pretty clear pattern now.

**The ultimate root cause** for that is that we don't have any **dedicated** build resources that we can stably rely on. I'm actually ok with waiting for a long time if there are build requests running; it means at least we are making progress. But I'm not ok with having no build resources. A better place I think we should aim for in the short term is to always have at least a central pool (can be 3 or 5) of machines dedicated to building Flink at any time, or maybe to use users' resources.

@Chesnay @Robert I synced with Jeff offline and learned that the Zeppelin community is using a Jenkins job to automatically build on users' Travis accounts and link the result back to the GitHub PR. I guess the Jenkins job would fetch the latest upstream master and build the PR against it. Jeff has filed tickets to learn about and get access to the Jenkins infra. It'd be better to fully understand it first before judging this approach.

I also heard good things about CircleCI, and ASF INFRA seems to have a pool of build capacity there too. It can be an alternative to consider.

On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:

Sorry to jump in late, but I think Bowen missed the most important point from Chesnay's previous message in the summary. The ultimate reason for all the problems is that the tests already take close to 2 hours to run. I fully support this claim: "Unless people start caring about test times before adding them, this issue cannot be solved"

This is also another reason why using users' Travis accounts won't help. Every few weeks we reach the user's time limit for a single profile.
This makes the user's builds simply fail, until we either properly decrease the time the tests take (which I am not sure we ever did) or postpone the problem by splitting into more profiles. (Note that the ASF Travis account has higher time limits.)

Best,

Dawid

On 26/06/2019 09:36, Robert Metzger wrote:

Do we know if using "the best" available hardware would improve the build times?
Imagine we ran the build on machines with plenty of main memory, to mount everything on a ramdisk, plus the latest CPU architecture?

Throwing hardware at the problem could help reduce the time of an individual build, and using our own infrastructure would remove our dependency on Apache's Travis account (with the obvious downside of having to maintain the infrastructure).
We could use an open source Travis alternative, to have a similar experience and make the migration easy.

On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ches...@apache.org> wrote:

From what I gathered, there's no special sauce that the Zeppelin project uses which actually integrates a user's Travis account into the PR. They just disabled Travis for PRs. And that's kind of it.

Naturally we can do this (duh) and save the ASF a fair amount of resources, but there are downsides:

The discoverability of the Travis check takes a nose-dive. Either we require every contributor to always, on every commit, also post a Travis build, or we have the reviewer sift through the contributor's account to find it.

This is rather cumbersome. Additionally, it's also not equivalent to having a PR build.

A normal branch build takes a branch as is and tests it. A PR build merges the branch into master, and then runs it. (Fun fact: this is why a PR with merge conflicts is not being run on Travis.)

And ultimately, everyone can already make use of this approach anyway.

On 25/06/2019 08:02, Jark Wu wrote:

Hi Jeff,

Thanks for sharing the Zeppelin approach. I think it's a good idea to leverage users' Travis accounts.
In this way, we can have almost unlimited concurrent build jobs, and developers can restart builds by themselves (currently only committers can restart a PR's build).

But I'm still not very clear on how to integrate a user's Travis build into the Flink pull request's build automatically. Can you explain in more detail?

Another question: does Travis only build branches for a user account?
My concern is that builds for PRs rebase the user's commits against the current master branch.
This helps us find problems before the merge. Builds for branches will lose the impact of new commits in master.
How does Zeppelin solve this problem?

Thanks again for sharing the idea.

Regards,
Jark

On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjf...@gmail.com> wrote:

Hi Folks,

Zeppelin met this kind of issue before; we solved it by delegating each contributor's PR build to their own Travis account (everyone can have 5 free slots for Travis builds).
The Apache account's Travis build is only triggered when a PR is merged.

On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:

(Forgot to cc George)

Best,
Kurt

On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:

Hi Bowen,

Thanks for bringing this up. We have actually discussed this, and I think Till and George have already spent some time investigating it. I have cc'ed both of them, and maybe they can share their findings.

Best,
Kurt

On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:

Hi Bowen,

Thanks for bringing this up. We have also suffered from the long build time.
I agree that we should focus on solving the build capacity problem in this thread.

My observation is that only one build is running; all the others (other PRs, master) are pending.
The Travis pricing plan [1] shows it can support concurrent build jobs.
But I don't know which plan we are using; it might be the free plan for open source.

I cc'ed Chesnay, who may have some experience with Travis.

Regards,
Jark

[1]: https://travis-ci.com/plans

On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenl...@gmail.com> wrote:

Hi Steven,

I think you may not have read what I wrote. The discussion is about "unstable build **capacity**", in other words "unstable / lack of build resources", not "unstable build".

On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz...@gmail.com> wrote:

A long and sometimes unstable build is definitely a pain point.

I suspect the build failure here in flink-connector-kafka is not related to my change, but there is no easy way to re-run the build in the Travis UI. A Google search showed a trick: closing and reopening the PR will trigger a rebuild. But that could add noise to the PR activity.
https://travis-ci.org/apache/flink/jobs/545555519

travis-ci for my personal repo often failed by exceeding the time limit after 4+ hours:
The job exceeded the maximum time limit for jobs, and has been terminated.

On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenl...@gmail.com> wrote:

https://travis-ci.org/apache/flink/builds/549681530
This build request has been sitting at the **HEAD of the queue** since I first saw it at PST 10:30am (not sure how long it had been there before 10:30am). It's PST 4:12pm now and it hasn't started yet.

On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenl...@gmail.com> wrote:

Hi devs,

I've been experiencing the pain resulting from the lack of stable build capacity on Travis for Flink PRs [1].
Specifically, I have often noticed that no build in the queue makes any progress for hours, and then suddenly 5 or 6 builds kick off all together after the long pause. I'm in the PST (UTC-08) time zone, and I've seen that the pause can be as long as 6 hours, from PST 9am to 3pm (let alone the time needed to drain the queue afterwards).

I think this has greatly impacted our productivity. In my experience, PRs submitted in the early morning PST won't finish their build until late at night the same day.

So my questions are:

- Has anyone else experienced the same problem or made similar observations on TravisCI? (I suspect it has something to do with time zones.)

- What pricing plan of TravisCI is Flink currently using? Is it the free plan for open source projects?
What > > >>>> are the > > >>>> >>>>>>> guaranteed build > > >>>> >>>>>>> > >> capacity > > >>>> >>>>>>> > >> > > of > > >>>> >>>>>>> > >> > > > > the current plan? > > >>>> >>>>>>> > >> > > > > > > >>>> >>>>>>> > >> > > > > - If the current pricing plan > > (either > > >>>> free or paid) > > >>>> >>>>>> can't > > >>>> >>>>>>> > provide > > >>>> >>>>>>> > >> > > stable > > >>>> >>>>>>> > >> > > > > build capacity, can we > > upgrade to a > > >>>> higher priced > > >>>> >>>>>>> plan with > > >>>> >>>>>>> > larger > > >>>> >>>>>>> > >> > and > > >>>> >>>>>>> > >> > > > more > > >>>> >>>>>>> > >> > > > > stable build capacity? > > >>>> >>>>>>> > >> > > > > > > >>>> >>>>>>> > >> > > > > BTW, another factor that > > contribute to > > >>>> the > > >>>> >>>>>>> productivity problem > > >>>> >>>>>>> > is > > >>>> >>>>>>> > >> > that > > >>>> >>>>>>> > >> > > > > our build is slow - we run > > full build > > >>>> for every PR > > >>>> >>>> and a > > >>>> >>>>>>> > >> successful > > >>>> >>>>>>> > >> > > full > > >>>> >>>>>>> > >> > > > > build takes ~5h. We > > definitely have > > >>>> more options to > > >>>> >>>>>>> solve it, > > >>>> >>>>>>> > for > > >>>> >>>>>>> > >> > > > instance, > > >>>> >>>>>>> > >> > > > > modularize the build graphs > > and reuse > > >>>> artifacts > > >>>> >> from > > >>>> >>>> the > > >>>> >>>>>>> > previous > > >>>> >>>>>>> > >> > > build. > > >>>> >>>>>>> > >> > > > > But I think that can be a big > > effort > > >>>> which is much > > >>>> >>>>>>> harder to > > >>>> >>>>>>> > >> > accomplish > > >>>> >>>>>>> > >> > > > in > > >>>> >>>>>>> > >> > > > > a short period of time and > > may deserve > > >>>> its own > > >>>> >>>> separate > > >>>> >>>>>>> > >> discussion. 
[1] https://travis-ci.org/apache/flink/pull_requests

--
Best Regards

Jeff Zhang
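
The capacity and cost figures quoted in this thread can be cross-checked with integer arithmetic. The sketch below assumes a 30-day month for INFRA's "last month" and uses the 2h build time and the plan prices mentioned in the thread; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for checking the numbers quoted in the thread.

# Whole servers kept busy 24/7 by a given number of build hours in a month.
servers_for_hours() {   # args: BUILD_HOURS DAYS_IN_MONTH (30 is an assumption)
  echo $(( $1 / (24 * $2) ))
}

# Yearly cost of a plan billed monthly.
yearly_cost() {         # args: MONTHLY_PRICE_USD
  echo $(( $1 * 12 ))
}

# Worker hours needed to run every queued PR build once.
queue_hours() {         # args: QUEUED_PRS HOURS_PER_BUILD
  echo $(( $1 * $2 ))
}

servers_for_hours 5800 30   # INFRA: over 5800 build hours "last month"
yearly_cost 249             # plan with 5 concurrent builds
yearly_cost 489             # plan with 10 concurrent builds
queue_hours 25 2            # 25 queued PRs at about 2h each
```

Run as written, this prints 8, 2988, 5868, and 50, which lines up with INFRA's "EIGHT servers", the $2988 yearly figure, and the "possibly 50h" estimate. The comparison that matters for the proposal is the first line against the plan sizes: the measured load of about eight always-busy servers sits between the 5-concurrency and 10-concurrency plans.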