> ."
> fi
>
> exit $RET_CODE
> else
> set +x
> echo "travis_check.py does not exist"
> exit 1
> fi
>
> Chesnay Schepler <ches...@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>
>> Does this imply that a Jenkins job is active as long as the Travis build
>> runs?
>>
>> On 26/06/2019 21:28, Bowen Li wrote:
>>> Hi,
>>>
>>> @Dawid, I think the "long test running" issue that I mentioned in the first
>>> email, as you also said, belongs to "a big effort which is much harder to
>>> accomplish in a short period of time and may deserve its own separate
>>> discussion". Thus I didn't include it in what we can do in the foreseeable
>>> short term.
>>>
>>> Besides, I don't think that's the ultimate reason for the lack of build
>>> resources. Even if the build is shortened to something like 2h, the
>>> problem I described, where no build machine works for 6 or more hours in
>>> PST daytime, will still happen, because no machine from ASF INFRA's
>>> pool is allocated to Flink. As I have paid close attention to the build
>>> queue over the past few weekdays, it's a pretty clear pattern now.
>>>
>>> **The ultimate root cause** for that is that we don't have any **dedicated**
>>> build resources that we can stably rely on. I'm actually OK with waiting a
>>> long time if there are build requests running; it means at least we are
>>> making progress. But I'm not OK with having no build resources. A better
>>> place I think we should aim for in the short term is to always have at least
>>> a central pool (can be 3 or 5) of machines dedicated to building Flink at any
>>> time, or maybe use users' resources.
>>>
>>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is
>>> using a Jenkins job to automatically build on users' Travis accounts and
>>> link the result back to the GitHub PR. I guess the Jenkins job would fetch
>>> the latest upstream master and build the PR against it. Jeff has filed
>>> tickets to learn about and get access to the Jenkins infra. It'll be better
>>> to fully understand it first before judging this approach.
>>>
>>> I also heard good things about CircleCI, and ASF INFRA seems to have a pool
>>> of build capacity there too. It can be an alternative to consider.
>>>
>>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakow...@apache.org>
>>> wrote:
>>>
>>>> Sorry to jump in late, but I think Bowen missed the most important point
>>>> from Chesnay's previous message in the summary. The ultimate reason for
>>>> all the problems is that the tests already take close to 2 hours to run.
>>>> I fully support this claim: "Unless people start caring about test times
>>>> before adding them, this issue cannot be solved"
>>>>
>>>> This is also another reason why using users' Travis accounts won't help.
>>>> Every few weeks we reach the user's time limit for a single profile.
>>>> This makes the user's builds simply fail, until we either properly
>>>> decrease the time the tests take (which I am not sure we ever did) or
>>>> postpone the problem by splitting into more profiles. (Note that the ASF
>>>> Travis account has higher time limits.)
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>>> Do we know if using "the best" available hardware would improve the build
>>>>> times?
>>>>> Imagine we ran the build on machines with plenty of main memory, to
>>>>> mount everything on a ramdisk, plus the latest CPU architecture?
>>>>>
>>>>> Throwing hardware at the problem could help reduce the time of an
>>>>> individual build, and using our own infrastructure would remove our
>>>>> dependency on Apache's Travis account (with the obvious downside of having
>>>>> to maintain the infrastructure).
>>>>> We could use an open source Travis alternative, to have a similar
>>>>> experience and make the migration easy.
>>>>>
>>>>>
>>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ches...@apache.org>
>>>>> wrote:
>>>>>> From what I gathered, there's no special sauce that the Zeppelin
>>>>>> project uses which actually integrates a user's Travis account into the PR.
>>>>>> They just disabled Travis for PRs. And that's kind of it.
>>>>>>
>>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
>>>>>> resources, but there are downsides:
>>>>>>
>>>>>> The discoverability of the Travis check takes a nose-dive. Either we
>>>>>> require every contributor to always, on every commit, also post a Travis
>>>>>> build, or we have the reviewer sift through the contributor's account to
>>>>>> find it.
>>>>>>
>>>>>> This is rather cumbersome. Additionally, it's also not equivalent to
>>>>>> having a PR build.
>>>>>>
>>>>>> A normal branch build takes a branch as is and tests it. A PR build
>>>>>> merges the branch into master, and then runs it. (Fun fact: This is why
>>>>>> a PR with merge conflicts is not being run on Travis.)
>>>>>>
>>>>>> And ultimately, everyone can already make use of this approach anyway.
>>>>>>
>>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>>>>>> leverage users' Travis accounts.
>>>>>>> In this way, we can have almost unlimited concurrent build jobs, and
>>>>>>> developers can restart builds by themselves (currently only committers
>>>>>>> can restart a PR's build).
>>>>>>>
>>>>>>> But I'm still not very clear on how to integrate a user's Travis build
>>>>>>> into the Flink pull request's build automatically. Can you explain in
>>>>>>> more detail?
>>>>>>>
>>>>>>> Another question: does Travis only build branches for user accounts?
>>>>>>> My concern is that builds for PRs rebase the user's commits against
>>>>>>> the current master branch, which helps us find problems before merge.
>>>>>>> Builds for branches will lose the impact of new commits in master.
>>>>>>> How does Zeppelin solve this problem?
>>>>>>>
>>>>>>> Thanks again for sharing the idea.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jark
>>>>>>>
>>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Folks,
>>>>>>>
>>>>>>> Zeppelin met this kind of issue before; we solved it by delegating
>>>>>>> each contributor's PR build to their own Travis account (everyone can
>>>>>>> have 5 free slots for Travis builds).
>>>>>>> The Apache account's Travis build is only triggered when a PR is merged.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Kurt Young <ykt...@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>>>>>
>>>>>>> > (Forgot to cc George)
>>>>>>> >
>>>>>>> > Best,
>>>>>>> > Kurt
>>>>>>> >
>>>>>>> >
>>>>>>> > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > > Hi Bowen,
>>>>>>> > >
>>>>>>> > > Thanks for bringing this up. We have actually discussed this, and I
>>>>>>> > > think Till and George have already spent some time investigating it.
>>>>>>> > > I have cc'ed both of them, and maybe they can share their findings.
>>>>>>> > >
>>>>>>> > > Best,
>>>>>>> > > Kurt
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
>>>>>>> > >
>>>>>>> > >> Hi Bowen,
>>>>>>> > >>
>>>>>>> > >> Thanks for bringing this up. We have also suffered from the long
>>>>>>> > >> build time.
>>>>>>> > >> I agree that we should focus on solving the build capacity problem
>>>>>>> > >> in this thread.
>>>>>>> > >>
>>>>>>> > >> My observation is that only one build is running, and all the others
>>>>>>> > >> (other PRs, master) are pending.
>>>>>>> > >> The pricing plan [1] of Travis shows it can support concurrent build
>>>>>>> > >> jobs. But I don't know which plan we are using; it might be the free
>>>>>>> > >> plan for open source.
>>>>>>> > >>
>>>>>>> > >> I cc'ed Chesnay, who may have some experience with Travis.
>>>>>>> > >>
>>>>>>> > >> Regards,
>>>>>>> > >> Jark
>>>>>>> > >>
>>>>>>> > >> [1]: https://travis-ci.com/plans
>>>>>>> > >>
>>>>>>> > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>> > >>
>>>>>>> > >> > Hi Steven,
>>>>>>> > >> >
>>>>>>> > >> > I think you may not have read what I wrote. The discussion is about
>>>>>>> > >> > "unstable build **capacity**", in other words "unstable / lack of
>>>>>>> > >> > build resources", not "unstable build".
>>>>>>> > >> >
>>>>>>> > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>> > >> >
>>>>>>> > >> > > A long and sometimes unstable build is definitely a pain point.
>>>>>>> > >> > >
>>>>>>> > >> > > I suspect the build failure here in flink-connector-kafka is not
>>>>>>> > >> > > related to my change, but there is no easy way to re-run the build
>>>>>>> > >> > > in the Travis UI. A Google search showed a trick: closing and
>>>>>>> > >> > > reopening the PR will trigger a rebuild. But that could add noise
>>>>>>> > >> > > to the PR activity.
>>>>>>> > >> > >
>>>>>>> > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>>>>>> > >> > >
>>>>>>> > >> > > travis-ci for my personal repo often failed after 4+ hours for
>>>>>>> > >> > > exceeding the time limit:
>>>>>>> > >> > > "The job exceeded the maximum time limit for jobs, and has been
>>>>>>> > >> > > terminated."
>>>>>>> > >> > >
>>>>>>> > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>> > >> > >
>>>>>>> > >> > > > https://travis-ci.org/apache/flink/builds/549681530 This build
>>>>>>> > >> > > > request has been sitting at the **HEAD of the queue** since I first
>>>>>>> > >> > > > saw it at 10:30am PST (not sure how long it had been there before
>>>>>>> > >> > > > 10:30am). It's 4:12pm PST now and it hasn't started yet.
>>>>>>> > >> > > >
>>>>>>> > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>> > >> > > >
>>>>>>> > >> > > > > Hi devs,
>>>>>>> > >> > > > >
>>>>>>> > >> > > > > I've been experiencing the pain resulting from the lack of stable
>>>>>>> > >> > > > > build capacity on Travis for Flink PRs [1]. Specifically, I have
>>>>>>> > >> > > > > often noticed that no build in the queue makes any progress for
>>>>>>> > >> > > > > hours, and then suddenly 5 or 6 builds kick off all together after
>>>>>>> > >> > > > > the long pause. I'm in the PST (UTC-08) time zone, and I've seen
>>>>>>> > >> > > > > that the pause can be as long as 6 hours, from 9am to 3pm PST
>>>>>>> > >> > > > > (let alone the time needed to drain the queue afterwards).
>>>>>>> > >> > > > >
>>>>>>> > >> > > > > I think this has greatly impacted our productivity. In my
>>>>>>> > >> > > > > experience, PRs submitted in the early morning in the PST time
>>>>>>> > >> > > > > zone won't finish their build until late at night the same day.
>>>>>>> > >> > > > >
>>>>>>> > >> > > > > So my questions are:
>>>>>>> > >> > > > >
>>>>>>> > >> > > > > - Has anyone else experienced the same problem or made a similar
>>>>>>> > >> > > > > observation on TravisCI? (I suspect it has something to do with
>>>>>>> > >> > > > > the time zone.)
>>>>>>> > >> > > > >
>>>>>>> > >> > > > > - What pricing plan of TravisCI is Flink currently using? Is it
>>>>>>> > >> > > > > the free plan for open source projects? What is the guaranteed
>>>>>>> > >> > > > > build capacity of the current plan?
>>>>>>> > >> > > > >
>>>>>>> > >> > > > > - If the current pricing plan (either free or paid) can't provide
>>>>>>> > >> > > > > stable build capacity, can we upgrade to a higher-priced plan with
>>>>>>> > >> > > > > larger and more stable build capacity?
>>>>>>> > >> > > > >
>>>>>>> > >> > > > > BTW, another factor that contributes to the productivity problem
>>>>>>> > >> > > > > is that our build is slow: we run a full build for every PR, and a
>>>>>>> > >> > > > > successful full build takes ~5h. We definitely have more options to
>>>>>>> > >> > > > > solve it, for instance, modularizing the build graph and reusing
>>>>>>> > >> > > > > artifacts from the previous build. But I think that can be a big
>>>>>>> > >> > > > > effort which is much harder to accomplish in a short period of time
>>>>>>> > >> > > > > and may deserve its own separate discussion.
>>>>>>> > >> > > > >
>>>>>>> > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>>>>>>> > >> > > > >
>>>>>>> > >> > > > >
>>>>>>> > >> > > >
>>>>>>> > >> > >
>>>>>>> > >> >
>>>>>>> > >>
>>>>>>> > >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>>>
>>