Re: [VOTE] Migrate to sponsored Travis account

Aljoscha Krettek Thu, 04 Jul 2019 02:44:51 -0700

+1

Aljoscha


> On 4. Jul 2019, at 11:09, Stephan Ewen <se...@apache.org> wrote:
> 
> +1 to move to a private Travis account.
> 
> I can confirm that Ververica will sponsor a Travis CI plan that is
> equivalent or a bit higher than the previous ASF quota (10 concurrent build
> queues)
> 
> Best,
> Stephan
> 
> On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler <ches...@apache.org> wrote:
> 
>> I've raised a JIRA
>> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to inquire
>> whether it would be possible to switch to a different Travis account,
>> and if so what steps would need to be taken.
>> We need a proper confirmation from INFRA since we are not in full
>> control of the flink repository (for example, we cannot access the
>> settings page).
>> 
>> If this is indeed possible, Ververica is willing to sponsor a Travis
>> account for the Flink project.
>> This would provide us with more resources than we need.
>> 
>> Since this makes the project more reliant on resources provided by
>> external companies I would like to vote on this.
>> 
>> Please vote on this proposal, as follows:
>> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
>> provided that INFRA approves
>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>> account
>> 
>> The vote will be open for at least 24h, and until we have confirmation
>> from INFRA. The voting period may be shorter than the usual 3 days since
>> our current setup is effectively not working.
>> 
>> On 04/07/2019 06:51, Bowen Li wrote:
>>> Re: > Are they using their own Travis CI pool, or did they switch to an
>>> entirely different CI service?
>>> 
>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>>> currently moving away from ASF's Travis to their own in-house metal
>>> machines at [1] with custom CI application at [2]. They've seen
>>> significant improvement w.r.t both much higher performance and
>>> basically no resource waiting time, "night-and-day" difference quoting
>>> Wes.
>>> 
>>> Re: > If we can just switch to our own Travis pool, just for our
>>> project, then this might be something we can do fairly quickly?
>>> 
>>> I believe so, according to [3] and [4]
>>> 
>>> 
>>> [1] https://ci.ursalabs.org/ <https://ci.ursalabs.org/#/>
>>> [2] https://github.com/ursa-labs/ursabot
>>> [3]
>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>> 
>>> 
>>> 
>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <ches...@apache.org> wrote:
>>> 
>>>    Are they using their own Travis CI pool, or did they switch to an
>>>    entirely different CI service?
>>> 
>>>    If we can just switch to our own Travis pool, just for our
>>>    project, then this might be something we can do fairly quickly?
>>> 
>>>    On 03/07/2019 05:55, Bowen Li wrote:
>>>> I responded in the INFRA ticket [1] that I believe they are using the wrong
>>>> metric against Flink, and that total build time is a completely different
>>>> thing from guaranteed build capacity.
>>>> 
>>>> My response:
>>>> 
>>>> "As mentioned above, since I started paying attention to Flink's build
>>>> queue several weeks ago, I (in Seattle) have seen no build kick off for
>>>> Flink during PST daytime on weekdays. Our teammates in China and Europe
>>>> have reported similar observations. So we need to evaluate where the
>>>> large total build time came from. If 1) your number and 2) our
>>>> observations from three locations that cover pretty much a full day are
>>>> all true, I **guess** one reason is that the extra build time most likely
>>>> came from weekends, when other Apache projects may be idle and Flink
>>>> drains its congested queue.
>>>> 
>>>> Please be aware that we're not complaining about the lack of resources
>>>> in general; I'm complaining about the lack of **stable, dedicated**
>>>> resources. An example of the latter: currently, even if no build is
>>>> in Flink's queue and I submit a request that becomes the queue head in the
>>>> PST morning, my build won't start for 6-8+ hours. That is an absurd amount
>>>> of waiting time.
>>>> 
>>>> That is to say, if ASF INFRA decides to adopt a quota system and grants
>>>> Flink five DEDICATED servers that run all the time only for Flink, that
>>>> would be PERFECT and would totally solve our problem now.
>>>> 
>>>> I feel what's missing in ASF INFRA's Travis resource pool is some level
>>>> of build capacity SLAs and certainty."
>>>> 
>>>> 
>>>> Again, I believe these two problems differ in nature: long build time
>>>> vs. lack of dedicated build resources. That is to say, shortening build
>>>> time may or may not relieve the situation. I'm slightly negative on
>>>> disabling IT cases for PRs, because the downside is that we risk bugs in
>>>> PRs that UTs don't catch, which may cost a lot more to fix and may slow
>>>> down or even block others, but I am open to others' opinions on it.
>>>> 
>>>> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a feasible
>>>> way to solve our problem, since INFRA's pool is fully shared and they have
>>>> no control over, or finer insight into, resource allocation to a specific
>>>> Apache project. As mentioned in [1], Apache Arrow is moving away from the
>>>> ASF INFRA Travis pool (they are actually surprised Flink hasn't planned to
>>>> do so). I know that Spark is on its own build infra. If we all agree on
>>>> funding our own build infra, I'd be glad to help investigate potential
>>>> options after releasing 1.9, since I'm super busy with 1.9 now.
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/INFRA-18533
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>> 
>>>>> As a short-term stopgap, since we can assume this issue to become much
>>>>> worse in the following days/weeks, we could disable IT cases in PRs and
>>>>> only run them on master.
>>>>> 
>>>>> On 02/07/2019 12:03, Chesnay Schepler wrote:
>>>>>> People really have to stop thinking that just because something works
>>>>>> for us it is also a good solution.
>>>>>> Also, please remember that our builds run for 2h from start to finish,
>>>>>> and not the 14 _minutes_ it takes for Zeppelin.
>>>>>> We are dealing with an entirely different scale here, both in terms of
>>>>>> build times and number of builds.
>>>>>> 
>>>>>> In this very thread people have been complaining about long queue
>>>>>> times for their builds. Surprise, other Apache projects have been
>>>>>> suffering the very same thing due to us not controlling our build
>>>>>> times. While switching services (be it Jenkins, CircleCI or whatever)
>>>>>> will possibly work for us (and these options are actually attractive,
>>>>>> like CircleCI's proper support for build artifacts), it will also
>>>>>> likely result in us negatively affecting other projects in significant
>>>>>> ways.
>>>>>> 
>>>>>> Sure, the Jenkins setup has a good user experience for us, at the cost
>>>>>> of blocking Jenkins workers for a _lot_ of time. Right now we have 25
>>>>>> PRs in our queue; that's possibly 50h we'd consume of Jenkins
>>>>>> resources, and the European contributors haven't even really started yet.
>>>>>> 
>>>>>> FYI, the latest INFRA response from INFRA-18533:
>>>>>> 
>>>>>> "Our rough metrics shows that Flink used over 5800 hours of build time
>>>>>> last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
>>>>>> MONTH. EIGHT. nonstop.
>>>>>> When we discovered this last night, we discussed it some and are going
>>>>>> to tune down Flink to allow only five executors maximum. We cannot
>>>>>> allow Flink to consume so much of a Foundation shared resource."
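
The figures quoted in this part of the thread can be checked with simple arithmetic. The following is a minimal sketch, assuming a 30-day month (the INFRA comment does not state the month length) and taking the roughly 2h per build and 25 queued PRs from the message above; the function names are illustrative only and do not come from any tool mentioned in the thread.

```python
HOURS_PER_DAY = 24


def equivalent_servers(build_hours: float, days_in_month: int = 30) -> float:
    """Number of machines that would have to run 24/7 to supply build_hours."""
    return build_hours / (HOURS_PER_DAY * days_in_month)


def queue_hours(queued_builds: int, hours_per_build: float) -> float:
    """Total executor time needed to work through a queue of builds."""
    return queued_builds * hours_per_build


if __name__ == "__main__":
    # 5800 build hours in one month, as reported in INFRA-18533.
    print(f"{equivalent_servers(5800):.2f} servers running nonstop")
    # 25 queued PRs at roughly 2h each, as estimated in the message above.
    print(f"{queue_hours(25, 2):.0f} hours of executor time")
```

Under the 30-day assumption, 5800 hours works out to slightly more than eight machines running around the clock, which is consistent with the "EIGHT servers" statement, and 25 builds at 2h each give the 50h mentioned above.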
>>>>>> 
>>>>>> So yes, we either
>>>>>> a) have to heavily reduce our CI usage or
>>>>>> b) fund our own, either maintaining it ourselves or donating to Apache.
>>>>>> 
>>>>>> On 02/07/2019 05:11, Bowen Li wrote:
>>>>>>> Looking at the git history of the Jenkins script, its core part
>>>>>>> was finished in March 2017 (with only two minor updates in 2017/2018),
>>>>>>> so it's been running for over two years now, and it feels like the
>>>>>>> Zeppelin community has been quite happy with it. @Jeff Zhang
>>>>>>> <zjf...@gmail.com> can you share your insights and user
>>>>>>> experience with the Jenkins+Travis approach?
>>>>>>> 
>>>>>>> Things like:
>>>>>>> 
>>>>>>> - has the approach completely solved the resource capacity problem
>>>>>>> for the Zeppelin community? is the Zeppelin community happy with the
>>>>>>> result?
>>>>>>> - is the whole configuration chain stable (e.g. uptime) enough?
>>>>>>> - how often do you need to maintain the Jenkins infra? how many
>>>>>>> people are usually involved in maintenance and bug-fixes?
>>>>>>> 
>>>>>>> The downside of this approach seems to me to be mostly the
>>>>>>> maintenance: maintaining the script and the Jenkins infra.
>>>>>>> 
>>>>>>> ** Having Our Own Travis-CI.com Account **
>>>>>>> 
>>>>>>> Another alternative I've been thinking of is to have our own
>>>>>>> travis-ci.com account with paid dedicated resources. Note that
>>>>>>> travis-ci.org is the free version and travis-ci.com is the commercial
>>>>>>> version. We currently use a shared resource pool managed by the ASF INFRA
>>>>>>> team on travis-ci.org, but we have no control over it - we can't see how
>>>>>>> it's configured, how many resources are available, how resources are
>>>>>>> allocated among Apache projects, etc.
>>>>>>> The nice things about having an account on travis-ci.com are:
>>>>>>> 
>>>>>>> - relatively low cost with a much better resource guarantee than what
>>>>>>> we currently have [1]: $249/month with 5 dedicated concurrency,
>>>>>>> $489/month with 10 concurrency
>>>>>>> - low maintenance work compared to using Jenkins
>>>>>>> - (potentially) no migration cost according to Travis's doc [2]
>>>>>>> (pending verification)
>>>>>>> - full control over the build capacity/configuration compared to
>>>>>>> using ASF INFRA's pool
>>>>>>> 
>>>>>>> I'd be surprised if we, as such a vibrant community, cannot find and
>>>>>>> fund $249*12=$2988 a year in exchange for a much better developer
>>>>>>> experience and much higher productivity.
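
The yearly figure above can be reproduced, and extended to the 10-concurrency plan, with a few lines. This is only a sketch: the two monthly prices are the ones quoted in the message, and the dictionary and function names are made up for illustration.

```python
# Monthly prices in USD as quoted in the message above, keyed by the number
# of concurrent jobs included in the plan.
MONTHLY_PRICE_USD = {5: 249, 10: 489}


def annual_cost(concurrency: int) -> int:
    """Yearly cost in USD of the plan with the given concurrency."""
    return MONTHLY_PRICE_USD[concurrency] * 12


def monthly_cost_per_slot(concurrency: int) -> float:
    """Monthly cost in USD of a single concurrent job slot."""
    return MONTHLY_PRICE_USD[concurrency] / concurrency


if __name__ == "__main__":
    for slots in sorted(MONTHLY_PRICE_USD):
        print(slots, annual_cost(slots), round(monthly_cost_per_slot(slots), 2))
```

The 5-concurrency plan comes to $2988 a year, matching the number above, and the 10-concurrency plan to $5868 a year.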
>>>>>>> 
>>>>>>> [1] https://travis-ci.com/plans
>>>>>>> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>>>>>> 
>>>>>>> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>> 
>>>>>>>     So yes, the Jenkins job keeps polling the state from Travis until it
>>>>>>>     finishes.
>>>>>>> 
>>>>>>>     Not sure I'm comfortable with the idea of using Jenkins workers
>>>>>>>     just to idle for several hours.
>>>>>>> 
>>>>>>>     On 29/06/2019 14:56, Jeff Zhang wrote:
>>>>>>>> Here's what the Zeppelin community did: we made a Python script to
>>>>>>>> check the build status of a pull request.
>>>>>>>> Here's the script:
>>>>>>>> https://github.com/apache/zeppelin/blob/master/travis_check.py
>>>>>>>> 
>>>>>>>> And this is the script we use in the Jenkins build job.
>>>>>>>> 
>>>>>>>> if [ -f "travis_check.py" ]; then
>>>>>>>>   git log -n 1
>>>>>>>>   STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" |
>>>>>>>>     sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>>>>>>>>   AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>   PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>>>>>>>>   #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>>>>>>>>   #if [ -z $COMMIT ]; then
>>>>>>>>   #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>   #    grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>>>   #    sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>>>   #    grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>   #fi
>>>>>>>> 
>>>>>>>>   # get commit hash from PR
>>>>>>>>   COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>     grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' |
>>>>>>>>     sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' |
>>>>>>>>     grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>>>>>>>>   sleep 30 # wait a moment for Travis to start the build
>>>>>>>>   RET_CODE=0
>>>>>>>>   python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>   # try with the repository name when travis-ci is not available in the account
>>>>>>>>   if [ $RET_CODE -eq 2 ]; then
>>>>>>>>     RET_CODE=0
>>>>>>>>     AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR |
>>>>>>>>       grep '"full_name":' | grep -v "apache/zeppelin" |
>>>>>>>>       sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>>>>>>>>     python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
>>>>>>>>   fi
>>>>>>>> 
>>>>>>>>   # fail when no build information can be found in Travis
>>>>>>>>   if [ $RET_CODE -eq 2 ]; then
>>>>>>>>     set +x
>>>>>>>>     echo "-----------------------------------------------------"
>>>>>>>>     echo "Looks like travis-ci is not configured for your fork."
>>>>>>>>     echo "Please set up by switching on 'zeppelin' repository at https://travis-ci.org/profile and travis-ci."
>>>>>>>>     echo "And then make sure 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>>>>>>>>     echo ""
>>>>>>>>     echo "To trigger CI after setup, you will need to amend your last commit with"
>>>>>>>>     echo "git commit --amend"
>>>>>>>>     echo "git push your-remote HEAD --force"
>>>>>>>>     echo ""
>>>>>>>>     echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>>>>>>>>   fi
>>>>>>>> 
>>>>>>>>   exit $RET_CODE
>>>>>>>> else
>>>>>>>>   set +x
>>>>>>>>   echo "travis_check.py does not exist"
>>>>>>>>   exit 1
>>>>>>>> fi
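
The shell script above delegates the actual waiting to `travis_check.py`. As a rough illustration of that polling pattern (not the real Zeppelin script, whose contents are not reproduced in this thread), here is a minimal sketch. `fetch_state` is a hypothetical callable standing in for whatever queries Travis. Only exit code 2, "no build information found", is taken from the shell script above; 0 for passed and 1 for failed are assumed conventions.

```python
import time
from typing import Callable, Optional

# Exit code 2 is what the shell script above checks for; 0 and 1 are assumed.
PASSED, FAILED, NOT_FOUND = 0, 1, 2


def wait_for_build(
    fetch_state: Callable[[], Optional[str]],
    poll_interval_s: float = 60.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll fetch_state until the build reaches a final state.

    fetch_state is a hypothetical callable returning "passed", "failed",
    any other string while the build is queued or running, or None when
    no build exists for the commit.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state is None:
            return NOT_FOUND
        if state == "passed":
            return PASSED
        if state == "failed":
            return FAILED
        # Still queued or running: the calling worker just idles here.
        sleep(poll_interval_s)
    return FAILED  # treat running out of polls as a failure


if __name__ == "__main__":
    # Simulated build that is running for two polls and then passes.
    states = iter(["running", "running", "passed"])
    print(wait_for_build(lambda: next(states), poll_interval_s=0.01))
```

The `sleep` call inside the loop is the part that keeps a Jenkins worker occupied for the whole duration of the Travis build, which is the cost raised in the replies above.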
>>>>>>>> 
>>>>>>>> On Sat, Jun 29, 2019 at 3:17 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Does this imply that a Jenkins job is active as long as the Travis build
>>>>>>>>> runs?
>>>>>>>>> 
>>>>>>>>> On 26/06/2019 21:28, Bowen Li wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> @Dawid, I think the "long test running" issue, as I mentioned in the
>>>>>>>>>> first email and as you also said, belongs to "a big effort which is
>>>>>>>>>> much harder to accomplish in a short period of time and may deserve
>>>>>>>>>> its own separate discussion". Thus I didn't include it in what we can
>>>>>>>>>> do in the foreseeable short term.
>>>>>>>>>> 
>>>>>>>>>> Besides, I don't think that's the ultimate reason for the lack of build
>>>>>>>>>> resources. Even if the build is shortened to something like 2h, the
>>>>>>>>>> problem I described, that no build machine works for 6 or more hours in
>>>>>>>>>> PST daytime, will still happen, because no machine from ASF INFRA's
>>>>>>>>>> pool is allocated to Flink. I have paid close attention to the build
>>>>>>>>>> queue over the past few weekdays, and it's a pretty clear pattern now.
>>>>>>>>>> 
>>>>>>>>>> **The ultimate root cause** is that we don't have any **dedicated**
>>>>>>>>>> build resources that we can stably rely on. I'm actually ok with waiting
>>>>>>>>>> a long time if there are build requests running; it means at least we
>>>>>>>>>> are making progress. But I'm not ok with no build resources. I think
>>>>>>>>>> what we should aim for in the short term is to always have at least a
>>>>>>>>>> central pool (can be 3 or 5) of machines dedicated to building Flink at
>>>>>>>>>> any time, or maybe to use users' resources.
>>>>>>>>>> 
>>>>>>>>>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is
>>>>>>>>>> using a Jenkins job to automatically build on users' Travis accounts and
>>>>>>>>>> link the result back to the GitHub PR. I guess the Jenkins job would
>>>>>>>>>> fetch the latest upstream master and build the PR against it. Jeff has
>>>>>>>>>> filed tickets to learn about and get access to the Jenkins infra. It'll
>>>>>>>>>> be better to fully understand it first before judging this approach.
>>>>>>>>>> 
>>>>>>>>>> I also heard good things about CircleCI, and ASF INFRA seems to have a
>>>>>>>>>> pool of build capacity there too. It can be an alternative to consider.
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakow...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Sorry to jump in late, but I think Bowen missed the most important point
>>>>>>>>>>> from Chesnay's previous message in the summary. The ultimate reason for
>>>>>>>>>>> all the problems is that the tests already take close to 2 hours to run.
>>>>>>>>>>> I fully support this claim: "Unless people start caring about test times
>>>>>>>>>>> before adding them, this issue cannot be solved"
>>>>>>>>>>> 
>>>>>>>>>>> This is also another reason why using users' Travis accounts won't help.
>>>>>>>>>>> Every few weeks we reach the user's time limit for a single profile.
>>>>>>>>>>> This makes the user's builds simply fail, until we either properly
>>>>>>>>>>> decrease the time the tests take (which I am not sure we ever did) or
>>>>>>>>>>> postpone the problem by splitting into more profiles. (Note that the ASF
>>>>>>>>>>> Travis account has higher time limits)
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> 
>>>>>>>>>>> Dawid
>>>>>>>>>>> 
>>>>>>>>>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>>>>>>>>> Do we know if using "the best" available hardware would improve the build
>>>>>>>>>>>> times?
>>>>>>>>>>>> Imagine we ran the build on machines with plenty of main memory, to
>>>>>>>>>>>> mount everything on a ramdisk, plus the latest CPU architecture.
>>>>>>>>>>>> 
>>>>>>>>>>>> Throwing hardware at the problem could help reduce the time of an
>>>>>>>>>>>> individual build, and using our own infrastructure would remove our
>>>>>>>>>>>> dependency on Apache's Travis account (with the obvious downside of having
>>>>>>>>>>>> to maintain the infrastructure).
>>>>>>>>>>>> We could use an open source Travis alternative, to have a similar
>>>>>>>>>>>> experience and make the migration easy.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ches...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> From what I gathered, there's no special sauce that the Zeppelin
>>>>>>>>>>>>> project uses which actually integrates a user's Travis account into the
>>>>>>>>>>>>> PR. They just disabled Travis for PRs. And that's kind of it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
>>>>>>>>>>>>> resources, but there are downsides:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The discoverability of the Travis check takes a nose-dive. Either we
>>>>>>>>>>>>> require every contributor to always, on every commit, also post a Travis
>>>>>>>>>>>>> build, or we have the reviewer sift through the contributor's account to
>>>>>>>>>>>>> find it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This is rather cumbersome. Additionally, it's also not equivalent to
>>>>>>>>>>>>> having a PR build.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> A normal branch build takes a branch as is and tests it. A PR build
>>>>>>>>>>>>> merges the branch into master, and then runs it. (Fun fact: This is why
>>>>>>>>>>>>> a PR with merge conflicts is not being run on Travis.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And ultimately, everyone can already make use of this approach anyway.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>>>>>>>>>>>>> leverage users' Travis accounts.
>>>>>>>>>>>>>> In this way, we can have almost unlimited concurrent build jobs, and
>>>>>>>>>>>>>> developers can restart builds by themselves (currently only committers
>>>>>>>>>>>>>> can restart a PR's build).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> But I'm still not very clear on how to integrate a user's Travis build
>>>>>>>>>>>>>> into the Flink pull request's build automatically. Can you explain in
>>>>>>>>>>>>>> more detail?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Another question: does Travis only build branches for user accounts?
>>>>>>>>>>>>>> My concern is that builds for PRs rebase the user's commits against the
>>>>>>>>>>>>>> current master branch.
>>>>>>>>>>>>>> This helps us find problems before merging. Builds for branches
>>>>>>>>>>>>>> will miss the impact of new commits in master.
>>>>>>>>>>>>>> How does Zeppelin solve this problem?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks again for sharing the idea.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      Hi Folks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      Zeppelin met this kind of issue before. We solved it by delegating
>>>>>>>>>>>>>>      each contributor's PR build to their own Travis account (everyone
>>>>>>>>>>>>>>      can have 5 free slots for Travis builds).
>>>>>>>>>>>>>>      The Apache account's Travis build is only triggered when a PR is
>>>>>>>>>>>>>>      merged.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> (Forgot to cc George)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for bringing this up. We have actually discussed this, and I
>>>>>>>>>>>>>>>> think Till and George have already spent some time investigating it.
>>>>>>>>>>>>>>>> I have cc'ed both of them, and maybe they can share their findings.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for bringing this up. We have also suffered from the long build
>>>>>>>>>>>>>>>>> time. I agree that we should focus on solving the build capacity
>>>>>>>>>>>>>>>>> problem in this thread.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> My observation is that only one build is running, and all the others
>>>>>>>>>>>>>>>>> (other PRs, master) are pending.
>>>>>>>>>>>>>>>>> The pricing plan [1] of Travis shows it can support concurrent build
>>>>>>>>>>>>>>>>> jobs. But I don't know which plan we are using; it might be the free
>>>>>>>>>>>>>>>>> plan for open source.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I cc-ed Chesnay who may have some
>>>    experience on
>>>>>>>     Travis.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> [1]: https://travis-ci.com/plans
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Steven,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think you may not have read what I wrote. The discussion is about
>>>>>>>>>>>>>>>>>> "unstable build **capacity**", in other words "unstable / lack of
>>>>>>>>>>>>>>>>>> build resources", not "unstable build".
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Long and sometimes unstable builds are definitely a pain point.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I suspect the build failure here in flink-connector-kafka is not
>>>>>>>>>>>>>>>>>>> related to my change, but there is no easy way to re-run the build
>>>>>>>>>>>>>>>>>>> in the Travis UI. A Google search showed a trick: closing and
>>>>>>>>>>>>>>>>>>> reopening the PR will trigger a rebuild, but that could add noise
>>>>>>>>>>>>>>>>>>> to the PR activity.
>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> travis-ci for my personal repo often failed after 4+ hours for
>>>>>>>>>>>>>>>>>>> exceeding the time limit:
>>>>>>>>>>>>>>>>>>> "The job exceeded the maximum time limit for jobs, and has been
>>>>>>>>>>>>>>>>>>> terminated."
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> https://travis-ci.org/apache/flink/builds/549681530 This build
>>>>>>>>>>>>>>>>>>>> request has been sitting at the **HEAD of the queue** since I first
>>>>>>>>>>>>>>>>>>>> saw it at 10:30am PST (not sure how long it had been there before
>>>>>>>>>>>>>>>>>>>> 10:30am). It's 4:12pm PST now and it hasn't started yet.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I've been experiencing the pain resulting from the lack of stable
>>>>>>>>>>>>>>>>>>>>> build capacity on Travis for Flink PRs [1]. Specifically, I have
>>>>>>>>>>>>>>>>>>>>> often noticed that no build in the queue makes any progress for
>>>>>>>>>>>>>>>>>>>>> hours, and then suddenly 5 or 6 builds kick off all together after
>>>>>>>>>>>>>>>>>>>>> the long pause. I'm in the PST (UTC-08) time zone, and I've seen
>>>>>>>>>>>>>>>>>>>>> that the pause can be as long as 6 hours, from 9am to 3pm PST
>>>>>>>>>>>>>>>>>>>>> (let alone the time needed to drain the queue afterwards).
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I think this has greatly impacted our productivity. I've found that
>>>>>>>>>>>>>>>>>>>>> PRs submitted in the early morning PST won't finish their
>>>>>>>>>>>>>>>>>>>>> build until late at night on the same day.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> So my questions are:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> - Has anyone else experienced the same problem or made similar
>>>>>>>>>>>>>>>>>>>>> observations on TravisCI? (I suspect it has to do with the time
>>>>>>>>>>>>>>>>>>>>> zone)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> - What pricing plan of TravisCI is Flink currently using? Is it
>>>>>>>>>>>>>>>>>>>>> the free plan for open source projects? What is the guaranteed
>>>>>>>>>>>>>>>>>>>>> build capacity of the current plan?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> - If the current pricing plan (either free or paid) can't provide
>>>>>>>>>>>>>>>>>>>>> stable build capacity, can we upgrade to a higher priced plan with
>>>>>>>>>>>>>>>>>>>>> larger and more stable build capacity?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> BTW, another factor that contributes to the productivity problem
>>>>>>>>>>>>>>>>>>>>> is that our build is slow - we run a full build for every PR, and a
>>>>>>>>>>>>>>>>>>>>> successful full build takes ~5h. We definitely have more options to
>>>>>>>>>>>>>>>>>>>>> solve it, for instance, modularizing the build graph and reusing
>>>>>>>>>>>>>>>>>>>>> artifacts from the previous build. But I think that can be a big
>>>>>>>>>>>>>>>>>>>>> effort which is much harder to accomplish in a short period of time
>>>>>>>>>>>>>>>>>>>>> and may deserve its own separate discussion.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> [1] https://travis-ci.org/apache/flink/pull_requests
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      --
>>>>>>>>>>>>>>      Best Regards
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      Jeff Zhang
>>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>>
