Re: [VOTE] Migrate to sponsored Travis account

Kurt Young Thu, 04 Jul 2019 02:58:53 -0700

+1 and great thanks Chesnay for pushing this.

Best,
Kurt



On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek <[email protected]> wrote:

> +1
>
> Aljoscha
>
> > On 4. Jul 2019, at 11:09, Stephan Ewen <[email protected]> wrote:
> >
> > +1 to move to a private Travis account.
> >
> > I can confirm that Ververica will sponsor a Travis CI plan that is
> > equivalent or a bit higher than the previous ASF quota (10 concurrent
> > build queues)
> >
> > Best,
> > Stephan
> >
> > On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <[email protected]> wrote:
> >
> >> I've raised a JIRA
> >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> >> inquire whether it would be possible to switch to a different Travis
> >> account, and if so what steps would need to be taken.
> >> We need a proper confirmation from INFRA since we are not in full
> >> control of the flink repository (for example, we cannot access the
> >> settings page).
> >>
> >> If this is indeed possible, Ververica is willing to sponsor a Travis
> >> account for the Flink project.
> >> This would provide us with more than enough resources.
> >>
> >> Since this makes the project more reliant on resources provided by
> >> external companies I would like to vote on this.
> >>
> >> Please vote on this proposal, as follows:
> >> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
> >> provided that INFRA approves
> >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> >> account
> >>
> >> The vote will be open for at least 24h, and until we have confirmation
> >> from INFRA. The voting period may be shorter than the usual 3 days since
> >> our current setup is effectively not working.
> >>
> >> On 04/07/2019 06:51, Bowen Li wrote:
> >>> Re: > Are they using their own Travis CI pool, or did they switch to an
> >>> entirely different CI service?
> >>>
> >>> I reached out to Wes and Krisztián from the Apache Arrow PMC. They are
> >>> currently moving away from ASF's Travis to their own in-house bare-metal
> >>> machines at [1] with a custom CI application at [2]. They've seen
> >>> significant improvement: much higher performance and basically no
> >>> resource waiting time, a "night-and-day" difference, quoting Wes.
> >>>
> >>> Re: > If we can just switch to our own Travis pool, just for our
> >>> project, then this might be something we can do fairly quickly?
> >>>
> >>> I believe so, according to [3] and [4]
> >>>
> >>>
> >>> [1] https://ci.ursalabs.org/
> >>> [2] https://github.com/ursa-labs/ursabot
> >>> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
> >>>
> >>>
> >>>
> >>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <[email protected]> wrote:
> >>>
> >>>    Are they using their own Travis CI pool, or did they switch to an
> >>>    entirely different CI service?
> >>>
> >>>    If we can just switch to our own Travis pool, just for our
> >>>    project, then
> >>>    this might be something we can do fairly quickly?
> >>>
> >>>    On 03/07/2019 05:55, Bowen Li wrote:
> >>>> I responded in the INFRA ticket [1] that I believe they are using the
> >>>> wrong metric against Flink, and that total build time is a completely
> >>>> different thing from guaranteed build capacity.
> >>>>
> >>>> My response:
> >>>>
> >>>> "As mentioned above, I started paying attention to Flink's build
> >>>> queue a few weeks ago. I'm in Seattle, and I saw no build kicking off
> >>>> for Flink during PST daytime on weekdays. Our teammates in China and
> >>>> Europe have also reported similar observations. So we need to evaluate
> >>>> where the large total build time came from. If 1) your number and 2)
> >>>> our observations from three locations that cover pretty much a full
> >>>> day are all true, I **guess** one reason could be that the extra build
> >>>> time most likely came from weekends, when other Apache projects may be
> >>>> idle and Flink just drains its congested queue.
> >>>>
> >>>> Please be aware that we're not complaining about the lack of resources
> >>>> in general; I'm complaining about the lack of **stable, dedicated**
> >>>> resources. An example of the latter: currently, even if no build is in
> >>>> Flink's queue and I submit a request that becomes the queue head in the
> >>>> PST morning, my build won't start for 6-8+ hours. That is an absurd
> >>>> amount of waiting time.
> >>>>
> >>>> That is to say, if ASF INFRA decides to adopt a quota system and grants
> >>>> Flink five DEDICATED servers that run all the time only for Flink,
> >>>> that'll be PERFECT and can totally solve our problem now.
> >>>>
> >>>> I feel what's missing in the ASF INFRA's Travis resource pool is
> >>>    some level
> >>>> of build capacity SLAs and certainty"
> >>>>
> >>>>
> >>>> Again, I believe these two problems differ in nature: long build time
> >>>> vs. lack of dedicated build resources. That is to say, shortening build
> >>>> time may or may not relieve the situation. I'm slightly negative on
> >>>> disabling IT cases for PRs, because the downside is that we risk
> >>>> missing bugs in PRs that UTs don't catch, which may cost a lot more to
> >>>> fix later, especially if they slow down or even block others, but I am
> >>>> open to others' opinions on it.
> >>>>
> >>>> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't solve
> >>>> our problem, since INFRA's pool is fully shared and they have no
> >>>> control over, or fine-grained insight into, resource allocation to a
> >>>> specific Apache project. As mentioned in [1], Apache Arrow is moving
> >>>> away from the ASF INFRA Travis pool (they are actually surprised Flink
> >>>> hasn't planned to do so). I know that Spark is on its own build infra.
> >>>> If we all agree on funding our own build infra, I'd be glad to help
> >>>> investigate potential options after releasing 1.9, since I'm super
> >>>> busy with 1.9 now.
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <[email protected]> wrote:
> >>>>
> >>>>> As a short-term stopgap, since we can assume this issue to
> >>>    become much
> >>>>> worse in the following days/weeks, we could disable IT cases in
> >>>    PRs and
> >>>>> only run them on master.
> >>>>>
> >>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
> >>>>>> People really have to stop thinking that just because
> >>>    something works
> >>>>>> for us it is also a good solution.
> >>>>>> Also, please remember that our builds run for 2h from start to
> >>>    finish,
> >>>>>> and not the 14 _minutes_ it takes for zeppelin.
> >>>>>> We are dealing with an entirely different scale here, both in
> >>>    terms of
> >>>>>> build times and number of builds.
> >>>>>>
> >>>>>> In this very thread people have been complaining about long queue
> >>>>>> times for their builds. Surprise, other Apache projects have been
> >>>>>> suffering the very same thing due to us not controlling our build
> >>>>>> times. While switching services (be it Jenkins, CircleCI or
> >>>    whatever)
> >>>>>> will possibly work for us (and these options are actually
> >>>    attractive,
> >>>>>> like CircleCI's proper support for build artifacts), it will also
> >>>>>> result in us likely negatively affecting other projects in
> >>>    significant
> >>>>>> ways.
> >>>>>>
> >>>>>> Sure, the Jenkins setup has a good user experience for us, at
> >>>    the cost
> >>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we
> >>>    have 25
> >>>>>> PR's in our queue; that's possibly 50h we'd consume of Jenkins
> >>>>>> resources, and the European contributors haven't even really
> >>>    started yet.
> >>>>>>
> >>>>>> FYI, the latest INFRA response from INFRA-18533:
> >>>>>>
> >>>>>> "Our rough metrics shows that Flink used over 5800 hours of build
> >>>>>> time last month. That is equal to EIGHT servers running 24/7 for the
> >>>>>> ENTIRE MONTH. EIGHT. nonstop.
> >>>>>> When we discovered this last night, we discussed it some and are
> >>>>>> going to tune down Flink to allow only five executors maximum. We cannot
> >>>>>> allow Flink to consume so much of a Foundation shared resource."
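
As a rough check of the figures quoted above, the following minimal sketch converts monthly build hours into the number of machines that would have to run nonstop, and computes the ceiling a fixed executor cap implies. The 30-day month and both function names are assumptions made for illustration; they do not come from INFRA's tooling.

```python
HOURS_PER_DAY = 24


def equivalent_servers(build_hours: float, days_in_month: int = 30) -> float:
    """Machines that would need to run 24/7 to supply `build_hours` in one month."""
    return build_hours / (HOURS_PER_DAY * days_in_month)


def monthly_capacity_hours(executors: int, days_in_month: int = 30) -> int:
    """Upper bound on build hours that `executors` machines can deliver in one month."""
    return executors * HOURS_PER_DAY * days_in_month


if __name__ == "__main__":
    # 5800 hours is the usage figure quoted by INFRA; 30 days is an assumption.
    print(round(equivalent_servers(5800), 2))
    # Ceiling implied by the five-executor cap mentioned in the same quote.
    print(monthly_capacity_hours(5))
```

Under these assumptions, a cap of five executors allows at most 3600 build hours per month, well below the quoted 5800, which is why queue times would be expected to grow unless usage drops.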
> >>>>>>
> >>>>>> So yes, we either
> >>>>>> a) have to heavily reduce our CI usage or
> >>>>>> b) fund our own, either maintaining it ourselves or donating
> >>>    to Apache.
> >>>>>>
> >>>>>> On 02/07/2019 05:11, Bowen Li wrote:
> >>>>>>> By looking at the git history of the Jenkins script, its core part
> >>>>>>> was finished in March 2017 (with only two minor updates in
> >>>>>>> 2017/2018), so it's been running for over two years now, and it feels
> >>>>>>> like the Zeppelin community has been quite happy with it. @Jeff Zhang
> >>>>>>> <[email protected]>, can you share your insights and user experience
> >>>>>>> with the Jenkins+Travis approach?
> >>>>>>>
> >>>>>>> Things like:
> >>>>>>>
> >>>>>>> - has the approach completely solved the resource capacity problem
> >>>>>>> for the Zeppelin community? Is the Zeppelin community happy with the
> >>>>>>> result?
> >>>>>>> - is the whole configuration chain stable (e.g. uptime) enough?
> >>>>>>> - how often do you need to maintain the Jenkins infra? how many
> >>>>>>> people are usually involved in maintenance and bug-fixes?
> >>>>>>>
> >>>>>>> To me, the downside of this approach seems to be mostly maintenance:
> >>>>>>> maintaining the script and the Jenkins infra.
> >>>>>>>
> >>>>>>> ** Having Our Own Travis-CI.com Account **
> >>>>>>>
> >>>>>>> Another alternative I've been thinking of is to have our own
> >>>>>>> travis-ci.com account with paid dedicated resources. Note that
> >>>>>>> travis-ci.org is the free version and travis-ci.com is the
> >>>>>>> commercial version. We currently use a shared resource pool managed
> >>>>>>> by the ASF INFRA team on travis-ci.org, but we have no control over
> >>>>>>> it - we can't see how it's configured, how many resources are
> >>>>>>> available, how resources are allocated among Apache projects, etc.
> >>>>>>> The nice things about having an account on travis-ci.com are:
> >>>>>>>
> >>>>>>> - relatively low cost with much better resource guarantee
> >>>    than what
> >>>>>>> we currently have [1]: $249/month with 5 dedicated concurrency,
> >>>>>>> $489/month with 10 concurrency
> >>>>>>> - low maintenance work compared to using Jenkins
> >>>>>>> - (potentially) no migration cost according to Travis's doc [2]
> >>>>>>> (pending verification)
> >>>>>>> - full control over the build capacity/configuration compared to
> >>>>>>> using ASF INFRA's pool
> >>>>>>>
> >>>>>>> I'd be surprised if we, as such a vibrant community, cannot find and
> >>>>>>> fund $249*12=$2988 a year in exchange for a much better developer
> >>>>>>> experience and much higher productivity.
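
The yearly figure above can be reproduced, and compared against the larger plan, with a small sketch. The two monthly prices are the ones quoted in this message; the dictionary layout and function names are assumptions chosen for illustration.

```python
# Monthly prices in USD keyed by concurrency, as quoted in the message above.
PLANS = {5: 249, 10: 489}


def annual_cost(monthly_usd: int) -> int:
    """Yearly cost of a plan billed monthly."""
    return monthly_usd * 12


def monthly_cost_per_slot(concurrency: int, monthly_usd: int) -> float:
    """Monthly price divided by the number of concurrent build slots."""
    return monthly_usd / concurrency


if __name__ == "__main__":
    for concurrency, price in PLANS.items():
        print(concurrency, annual_cost(price), monthly_cost_per_slot(concurrency, price))
```

The per-slot figure is only a way to compare the two plans; actual pricing may have changed since this message was written.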
> >>>>>>>
> >>>>>>> [1] https://travis-ci.com/plans
> >>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>>>>>>
> >>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <[email protected]> wrote:
> >>>>>>>
> >>>>>>>     So yes, the Jenkins job keeps pulling the state from Travis
> >>>>>>>     until it finishes.
> >>>>>>>
> >>>>>>>     Not sure I'm comfortable with the idea of using Jenkins workers
> >>>>>>>     just to idle for several hours.
> >>>>>>>
> >>>>>>>     On 29/06/2019 14:56, Jeff Zhang wrote:
> >>>>>>>> Here's what the Zeppelin community did: we made a Python script to
> >>>>>>>> check the build status of a pull request.
> >>>>>>>> Here's the script:
> >>>>>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
> >>>>>>>>
> >>>>>>>> And this is the script we use in the Jenkins build job:
> >>>>>>>>
> >>>>>>>> if [ -f "travis_check.py" ]; then
> >>>>>>>>   git log -n 1
> >>>>>>>>   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
> >>>>>>>>     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
> >>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
> >>>>>>>>   #if [ -z $COMMIT ]; then
> >>>>>>>>   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>>>>>>>   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >>>>>>>>   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
> >>>>>>>>   #    sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>>>   #fi
> >>>>>>>>
> >>>>>>>>   # get commit hash from PR
> >>>>>>>>   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>>>>>>>     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
> >>>>>>>>     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" |
> >>>>>>>>     sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
> >>>>>>>>   sleep 30 # sleep few moment to wait travis starts the build
> >>>>>>>>   RET_CODE=0
> >>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>>>>>   if [ $RET_CODE -eq 2 ]; then # try with repository name when travis-ci is not available in the account
> >>>>>>>>     RET_CODE=0
> >>>>>>>>     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
> >>>>>>>>       grep '"full_name":' | grep -v "apache/zeppelin" |
> >>>>>>>>       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
> >>>>>>>>     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
> >>>>>>>>   fi
> >>>>>>>>
> >>>>>>>>   if [ $RET_CODE -eq 2 ]; then # fail with can't find build information in the travis
> >>>>>>>>     set +x
> >>>>>>>>     echo "-----------------------------------------------------"
> >>>>>>>>     echo "Looks like travis-ci is not configured for your fork."
> >>>>>>>>     echo "Please setup by swich on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
> >>>>>>>>     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
> >>>>>>>>     echo ""
> >>>>>>>>     echo "To trigger CI after setup, you will need ammend your last commit with"
> >>>>>>>>     echo "git commit --amend"
> >>>>>>>>     echo "git push your-remote HEAD --force"
> >>>>>>>>     echo ""
> >>>>>>>>     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
> >>>>>>>>   fi
> >>>>>>>>
> >>>>>>>>   exit $RET_CODE
> >>>>>>>> else
> >>>>>>>>   set +x
> >>>>>>>>   echo "travis_check.py does not exists"
> >>>>>>>>   exit 1
> >>>>>>>> fi
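
The waiting behaviour this job relies on (a worker that keeps checking the Travis build until it reaches a final state) can be sketched as below. This is not the actual travis_check.py; the state names, the `wait_for_build` helper, and the injected `fetch_state` callback are all assumptions made so the sketch runs without network access.

```python
import time
from typing import Callable

# Final states assumed for this sketch; not taken from Travis documentation.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(
    fetch_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 4 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `fetch_state` until it reports a terminal state or the timeout expires.

    Returns the terminal state, or "timeout" if the deadline passes first.
    The caller stays blocked for the whole duration of the remote build.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated sequence of build states instead of a real API call.
    states = iter(["created", "started", "started", "passed"])
    print(wait_for_build(lambda: next(states), sleep=lambda _seconds: None))
```

Because the loop blocks until the remote build ends, a two-hour build keeps the polling worker occupied for roughly two hours.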
> >>>>>>>>
> >>>>>>>> Chesnay Schepler <[email protected]> wrote on Sat, Jun 29, 2019 at 3:17 PM:
> >>>>>>>>
> >>>>>>>>> Does this imply that a Jenkins job is active as long as the Travis
> >>>>>>>>> build runs?
> >>>>>>>>>
> >>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> @Dawid, I think the "long test running" issue I mentioned in the
> >>>>>>>>>> first email, as you also said, belongs to "a big effort which is
> >>>>>>>>>> much harder to accomplish in a short period of time and may
> >>>>>>>>>> deserve its own separate discussion". Thus I didn't include it in
> >>>>>>>>>> what we can do in the foreseeable short term.
> >>>>>>>>>>
> >>>>>>>>>> Besides, I don't think that's the ultimate reason for the lack of
> >>>>>>>>>> build resources. Even if the build is shortened to something like
> >>>>>>>>>> 2h, the problem I described, of no build machine working for 6 or
> >>>>>>>>>> more hours during PST daytime, will still happen, because no
> >>>>>>>>>> machine from ASF INFRA's pool is allocated to Flink. As I have
> >>>>>>>>>> paid close attention to the build queue over the past few
> >>>>>>>>>> weekdays, it's a pretty clear pattern now.
> >>>>>>>>>>
> >>>>>>>>>> **The ultimate root cause** is that we don't have any
> >>>>>>>>>> **dedicated** build resources that we can reliably depend on. I'm
> >>>>>>>>>> actually OK with waiting a long time if there are build requests
> >>>>>>>>>> running; that means at least we are making progress. But I'm not
> >>>>>>>>>> OK with having no build resources. In the short term, I think we
> >>>>>>>>>> should aim to always have at least a central pool (can be 3 or 5)
> >>>>>>>>>> of machines dedicated to building Flink at any time, or maybe use
> >>>>>>>>>> users' resources.
> >>>>>>>>>>
> >>>>>>>>>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin
> >>>>>>>>>> community is using a Jenkins job to automatically build on users'
> >>>>>>>>>> Travis accounts and link the result back to the GitHub PR. I guess
> >>>>>>>>>> the Jenkins job would fetch the latest upstream master and build
> >>>>>>>>>> the PR against it. Jeff has filed tickets to learn about and get
> >>>>>>>>>> access to the Jenkins infra. It'll be better to fully understand
> >>>>>>>>>> it first before judging this approach.
> >>>>>>>>>>
> >>>>>>>>>> I also heard good things about CircleCI, and ASF INFRA seems to
> >>>>>>>>>> have a pool of build capacity there too. It could be an
> >>>>>>>>>> alternative to consider.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <[email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the most
> >>>>>>>>>>> important point from Chesnay's previous message in the summary.
> >>>>>>>>>>> The ultimate reason for all the problems is that the tests
> >>>>>>>>>>> already take close to 2 hours to run. I fully support this claim:
> >>>>>>>>>>> "Unless people start caring about test times before adding them,
> >>>>>>>>>>> this issue cannot be solved"
> >>>>>>>>>>>
> >>>>>>>>>>> This is also another reason why using users' Travis accounts
> >>>>>>>>>>> won't help. Every few weeks we reach the user's time limit for a
> >>>>>>>>>>> single profile. This makes the user's builds simply fail, until
> >>>>>>>>>>> we either properly decrease the time the tests take (which I am
> >>>>>>>>>>> not sure we ever did) or postpone the problem by splitting into
> >>>>>>>>>>> more profiles. (Note that the ASF Travis account has higher time
> >>>>>>>>>>> limits)
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>>
> >>>>>>>>>>> Dawid
> >>>>>>>>>>>
> >>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
> >>>>>>>>>>>> Do we know if using "the best" available hardware would improve
> >>>>>>>>>>>> the build times?
> >>>>>>>>>>>> Imagine we would run the build on machines with plenty of main
> >>>>>>>>>>>> memory to mount everything to ramdisk + the latest CPU
> >>>>>>>>>>>> architecture?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Throwing hardware at the problem could help reduce the time of an
> >>>>>>>>>>>> individual build, and using our own infrastructure would remove
> >>>>>>>>>>>> our dependency on Apache's Travis account (with the obvious
> >>>>>>>>>>>> downside of having to maintain the infrastructure).
> >>>>>>>>>>>> We could use an open source Travis alternative, to have a similar
> >>>>>>>>>>>> experience and make the migration easy.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <[email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>> From what I gathered, there's no special sauce that the Zeppelin
> >>>>>>>>>>>>> project uses which actually integrates a user's Travis account
> >>>>>>>>>>>>> into the PR. They just disabled Travis for PRs. And that's kind
> >>>>>>>>>>>>> of it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
> >>>>>>>>>>>>> resources, but there are downsides:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The discoverability of the Travis check takes a nose-dive.
> >>>>>>>>>>>>> Either we require every contributor to always, on every commit,
> >>>>>>>>>>>>> also post a Travis build, or we have the reviewer sift through
> >>>>>>>>>>>>> the contributor's account to find it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This is rather cumbersome. Additionally, it's also not
> >>>>>>>>>>>>> equivalent to having a PR build.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> A normal branch build takes a branch as is and tests it. A PR
> >>>>>>>>>>>>> build merges the branch into master, and then runs it. (Fun
> >>>>>>>>>>>>> fact: This is why a PR with merge conflicts is not being run on
> >>>>>>>>>>>>> Travis.)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> And ultimately, everyone can already make use of this approach
> >>>>>>>>>>>>> anyway.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>>>>>>>>>>> Hi Jeff,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good
> >>>>>>>>>>>>>> idea to leverage users' Travis accounts.
> >>>>>>>>>>>>>> In this way, we can have almost unlimited concurrent build jobs
> >>>>>>>>>>>>>> and developers can restart builds by themselves (currently only
> >>>>>>>>>>>>>> committers can restart a PR's build).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> But I'm still not very clear on how to integrate a user's
> >>>>>>>>>>>>>> Travis build into the Flink pull request's build automatically.
> >>>>>>>>>>>>>> Can you explain in more detail?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Another question: does Travis only build branches for user
> >>>>>>>>>>>>>> accounts?
> >>>>>>>>>>>>>> My concern is that builds for PRs rebase the user's commits
> >>>>>>>>>>>>>> against the current master branch, which helps us find problems
> >>>>>>>>>>>>>> before merging. Builds for branches will lose the impact of new
> >>>>>>>>>>>>>> commits in master.
> >>>>>>>>>>>>>> How does Zeppelin solve this problem?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks again for sharing the idea.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      Hi Folks,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      Zeppelin met this kind of issue before; we solved it by
> >>>>>>>>>>>>>>      delegating each contributor's PR build to their own Travis
> >>>>>>>>>>>>>>      account (everyone can have 5 free slots for Travis builds).
> >>>>>>>>>>>>>>      The Apache account's Travis build is only triggered when a
> >>>>>>>>>>>>>>      PR is merged.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      Kurt Young <[email protected]> wrote on Tue, Jun 25, 2019 at
> >>>>>>>>>>>>>>      10:16 AM:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> (Forgot to cc George)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <[email protected]>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for bringing this up. We have actually discussed this,
> >>>>>>>>>>>>>>>> and I think Till and George have already spent some time
> >>>>>>>>>>>>>>>> investigating it. I have cc'ed both of them, and maybe they
> >>>>>>>>>>>>>>>> can share their findings.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <[email protected]>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for bringing this up. We have also suffered from the
> >>>>>>>>>>>>>>>>> long build time.
> >>>>>>>>>>>>>>>>> I agree that we should focus on solving the build capacity
> >>>>>>>>>>>>>>>>> problem in this thread.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> My observation is that only one build is running; all the
> >>>>>>>>>>>>>>>>> others (other PRs, master) are pending.
> >>>>>>>>>>>>>>>>> The pricing plan [1] of Travis shows it can support
> >>>>>>>>>>>>>>>>> concurrent build jobs.
> >>>>>>>>>>>>>>>>> But I don't know which plan we are using; it might be the
> >>>>>>>>>>>>>>>>> free plan for open source.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some
> >>>    experience on
> >>>>>>>     Travis.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <[email protected]>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Steven,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I think you may not have read what I wrote. The discussion
> >>>>>>>>>>>>>>>>>> is about "unstable build **capacity**", in other words
> >>>>>>>>>>>>>>>>>> "unstable / lack of build resources", not "unstable build".
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <[email protected]>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Long and sometimes unstable builds are definitely a pain
> >>>>>>>>>>>>>>>>>>> point.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I suspect the build failure here in flink-connector-kafka
> >>>>>>>>>>>>>>>>>>> is not related to my change, but there is no easy way to
> >>>>>>>>>>>>>>>>>>> re-run the build in the Travis UI. A Google search showed
> >>>>>>>>>>>>>>>>>>> a trick: closing and reopening the PR will trigger a
> >>>>>>>>>>>>>>>>>>> rebuild, but that could add noise to the PR activity.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> travis-ci for my personal repo often failed by exceeding
> >>>>>>>>>>>>>>>>>>> the time limit after 4+ hours:
> >>>>>>>>>>>>>>>>>>> "The job exceeded the maximum time limit for jobs, and has
> >>>>>>>>>>>>>>>>>>> been terminated."
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <[email protected]>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530
> >>>>>>>>>>>>>>>>>>>> This build request has been sitting at the **HEAD of the
> >>>>>>>>>>>>>>>>>>>> queue** since I first saw it at 10:30am PST (not sure how
> >>>>>>>>>>>>>>>>>>>> long it had been there before 10:30am). It's 4:12pm PST
> >>>>>>>>>>>>>>>>>>>> now and it hasn't started yet.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM
> >>>    Bowen Li
> >>>>>>>>>>>>>>      <[email protected]>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain
> >>>>>>>     resulting from lack
> >>>>>>>>>>>>>>      of stable
> >>>>>>>>>>>>>>>>> build
> >>>>>>>>>>>>>>>>>>>>> capacity on Travis for Flink
> >>>    PRs [1].
> >>>>>>>>> Specifically, I
> >>>>>>>>>>>>>> noticed
> >>>>>>>>>>>>>>>>> often
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>>> build in the queue is making any
> >>>>>>>     progress for
> >>>>>>>>> hours,
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>> suddenly
> >>>>>>>>>>>>>>>>> 5
> >>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>> 6
> >>>>>>>>>>>>>>>>>>>>> builds kick off all together
> >>>    after the
> >>>>>>>     long pause.
> >>>>>>>>>>>>>>      I'm at PST
> >>>>>>>>>>>>>>>>>> (UTC-08)
> >>>>>>>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>> zone, and I've seen pause can
> >>>    be as
> >>>>>>>     long as 6 hours
> >>>>>>>>>>>>>>      from PST 9am
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> 3pm
> >>>>>>>>>>>>>>>>>>>>> (let alone the time needed to
> >>>    drain the
> >>>>>>>     queue
> >>>>>>>>>>>>>> afterwards).
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I think this has greatly
> >>>    impacted our
> >>>>>>>     productivity.
> >>>>>>>>>>> I've
> >>>>>>>>>>>>>>>>> experienced
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> PRs submitted in the early
> >>>    morning of
> >>>>>>>     PST time zone
> >>>>>>>>>>>>>>      won't finish
> >>>>>>>>>>>>>>>>>> their
> >>>>>>>>>>>>>>>>>>>>> build until late night of the
> >>>    same day.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> So my questions are:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced
> >>>    the same
> >>>>>>>     problem or
> >>>>>>>>>>>>>>      had a similar
> >>>>>>>>>>>>>>>>>>>> observation
> >>>>>>>>>>>>>>>>>>>>> on TravisCI? (I suspect it
> >>>    has something
> >>>>>>>     to do with
> >>>>>>>>> time
> >>>>>>>>>>>>>>      zone)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - What pricing plan of
> >>>    TravisCI is
> >>>>>>>     Flink currently
> >>>>>>>>>>>>>> using? Is it
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> free
> >>>>>>>>>>>>>>>>>>>>> plan for open source
> >>>    projects? What
> >>>>>>> is the
> >>>>>>>>>>>>>> guaranteed build
> >>>>>>>>>>>>>>>>> capacity
> >>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>> the current plan?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> - If the current pricing plan
> >>>    (either
> >>>>>>>     free or paid)
> >>>>>>>>>>>>> can't
> >>>>>>>>>>>>>>> provide
> >>>>>>>>>>>>>>>>>>> stable
> >>>>>>>>>>>>>>>>>>>>> build capacity, can we
> >>>    upgrade to a
> >>>>>>>     higher priced
> >>>>>>>>>>>>>>      plan with
> >>>>>>>>>>>>>>> larger
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>>>> stable build capacity?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> BTW, another factor that
> >>>    contributes to
> >>>>>>> the
> >>>>>>>>>>>>>> productivity problem
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> our build is slow - we run
> >>>    full build
> >>>>>>>     for every PR
> >>>>>>>>>>> and a
> >>>>>>>>>>>>>>>>> successful
> >>>>>>>>>>>>>>>>>>> full
> >>>>>>>>>>>>>>>>>>>>> build takes ~5h. We
> >>>    definitely have
> >>>>>>>     more options to
> >>>>>>>>>>>>>>      solve it,
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>> instance,
> >>>>>>>>>>>>>>>>>>>>> modularize the build graphs
> >>>    and reuse
> >>>>>>>     artifacts
> >>>>>>>>> from
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> previous
> >>>>>>>>>>>>>>>>>>> build.
> >>>>>>>>>>>>>>>>>>>>> But I think that can be a big
> >>>    effort
> >>>>>>>     which is much
> >>>>>>>>>>>>>> harder to
> >>>>>>>>>>>>>>>>>> accomplish
> >>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>> a short period of time and
> >>>    may deserve
> >>>>>>>     its own
> >>>>>>>>>>> separate
> >>>>>>>>>>>>>>>>> discussion.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>> https://travis-ci.org/apache/flink/pull_requests
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      --
> >>>>>>>>>>>>>>      Best Regards
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      Jeff Zhang
> >>>>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> >>
>
>
