>>>> > ."
>>>> > fi
>>>> >
>>>> > exit $RET_CODE
>>>> > else
>>>> > set +x
>>>> > echo "travis_check.py does not exist"
>>>> > exit 1
>>>> > fi
>>>> >
>>>> > Chesnay Schepler <ches...@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:
>>>> >
>>>> >> Does this imply that a Jenkins job is active as long as the Travis build
>>>> >> runs?
>>>> >>
>>>> >> On 26/06/2019 21:28, Bowen Li wrote:
>>>> >>> Hi,
>>>> >>>
>>>> >>> @Dawid, I think the "long test running" as I mentioned in the first
>>>> >>> email, also as you guys said, belongs to "a big effort which is much
>>>> >>> harder to accomplish in a short period of time and may deserve its own
>>>> >>> separate discussion". Thus I didn't include it in what we can do in a
>>>> >>> foreseeable short term.
>>>> >>>
>>>> >>> Besides, I don't think that's the ultimate reason for the lack of build
>>>> >>> resources. Even if the build is shortened to something like 2h, the
>>>> >>> problem I described, where no build machine does any work for 6 or more
>>>> >>> hours during PST daytime, will still happen, because no machine from ASF
>>>> >>> INFRA's pool is allocated to Flink. I have paid close attention to the
>>>> >>> build queue over the past few weekdays, and the pattern is pretty clear
>>>> >>> now.
>>>> >>>
>>>> >>> **The ultimate root cause** for that is that we don't have any
>>>> >>> **dedicated** build resources that we can stably rely on. I'm actually
>>>> >>> ok with waiting a long time if build requests are running; it means we
>>>> >>> are at least making progress. But I'm not ok with having no build
>>>> >>> resources. I think what we should aim for in the short term is to always
>>>> >>> have at least a central pool (can be 3 or 5) of machines dedicated to
>>>> >>> building Flink at any time, or maybe to use users' resources.
>>>> >>>
>>>> >>> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is
>>>> >>> using a Jenkins job to automatically build on users' Travis accounts and
>>>> >>> link the result back to the GitHub PR. I guess the Jenkins job would
>>>> >>> fetch the latest upstream master and build the PR against it. Jeff has
>>>> >>> filed tickets to learn about and get access to the Jenkins infra. It'll
>>>> >>> be better to fully understand it first before judging this approach.
>>>> >>>
>>>> >>> I also heard good things about CircleCI, and ASF INFRA seems to have a
>>>> >>> pool of build capacity there too. It could be an alternative to consider.
>>>> >>>
>>>> >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz
>>>> >>> <dwysakow...@apache.org> wrote:
>>>> >>>
>>>> >>>> Sorry to jump in late, but I think Bowen missed the most important point
>>>> >>>> from Chesnay's previous message in the summary. The ultimate reason for
>>>> >>>> all the problems is that the tests take close to 2 hours to run already.
>>>> >>>> I fully support this claim: "Unless people start caring about test times
>>>> >>>> before adding them, this issue cannot be solved"
>>>> >>>>
>>>> >>>> This is also another reason why using user's Travis account won't help.
>>>> >>>> Every few weeks we reach the user's time limit for a single profile.
>>>> >>>> This makes the user's builds simply fail, until we either properly
>>>> >>>> decrease the time the tests take (which I am not sure we ever did) or
>>>> >>>> postpone the problem by splitting into more profiles. (Note that the ASF
>>>> >>>> Travis account has higher time limits)
>>>> >>>>
>>>> >>>> Best,
>>>> >>>>
>>>> >>>> Dawid
>>>> >>>>
>>>> >>>> On 26/06/2019 09:36, Robert Metzger wrote:
>>>> >>>>> Do we know if using "the best" available hardware would improve the
>>>> >>>>> build times?
>>>> >>>>> Imagine we would run the build on machines with plenty of main memory to
>>>> >>>>> mount everything to ramdisk + the latest CPU architecture?
>>>> >>>>>
>>>> >>>>> Throwing hardware at the problem could help reduce the time of an
>>>> >>>>> individual build, and using our own infrastructure would remove our
>>>> >>>>> dependency on Apache's Travis account (with the obvious downside of
>>>> >>>>> having to maintain the infrastructure).
>>>> >>>>> We could use an open source Travis alternative, to have a similar
>>>> >>>>> experience and make the migration easy.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler
>>>> >>>>> <ches...@apache.org> wrote:
>>>> >>>>>> From what I gathered, there's no special sauce that the Zeppelin
>>>> >>>>>> project uses which actually integrates a user's Travis account into the
>>>> >>>>>> PR. They just disabled Travis for PRs. And that's kind of it.
>>>> >>>>>>
>>>> >>>>>> Naturally we can do this (duh) and save the ASF a fair amount of
>>>> >>>>>> resources, but there are downsides:
>>>> >>>>>>
>>>> >>>>>> The discoverability of the Travis check takes a nose-dive. Either we
>>>> >>>>>> require every contributor to always, on every commit, also post a
>>>> >>>>>> Travis build, or we have the reviewer sift through the contributor's
>>>> >>>>>> account to find it.
>>>> >>>>>>
>>>> >>>>>> This is rather cumbersome. Additionally, it's also not equivalent to
>>>> >>>>>> having a PR build.
>>>> >>>>>>
>>>> >>>>>> A normal branch build takes a branch as is and tests it. A PR build
>>>> >>>>>> merges the branch into master, and then runs it. (Fun fact: This is why
>>>> >>>>>> a PR with merge conflicts is not being run on Travis.)
>>>> >>>>>>
>>>> >>>>>> And ultimately, everyone can already make use of this approach anyway.
>>>> >>>>>>
>>>> >>>>>> On 25/06/2019 08:02, Jark Wu wrote:
>>>> >>>>>>> Hi Jeff,
>>>> >>>>>>>
>>>> >>>>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
>>>> >>>>>>> leverage users' Travis accounts.
>>>> >>>>>>> In this way, we can have almost unlimited concurrent build jobs and
>>>> >>>>>>> developers can restart builds by themselves (currently only committers
>>>> >>>>>>> can restart a PR's build).
>>>> >>>>>>>
>>>> >>>>>>> But it's still not very clear to me how to integrate a user's Travis
>>>> >>>>>>> build into the Flink pull request's build automatically. Can you
>>>> >>>>>>> explain in more detail?
>>>> >>>>>>>
>>>> >>>>>>> Another question: does Travis only build branches for user accounts?
>>>> >>>>>>> Builds for PRs rebase the user's commits against the current master
>>>> >>>>>>> branch, which helps us find problems before merge. My concern is that
>>>> >>>>>>> builds for branches will miss the impact of new commits in master.
>>>> >>>>>>> How does Zeppelin solve this problem?
>>>> >>>>>>>
>>>> >>>>>>> Thanks again for sharing the idea.
>>>> >>>>>>>
>>>> >>>>>>> Regards,
>>>> >>>>>>> Jark
>>>> >>>>>>>
>>>> >>>>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjf...@gmail.com> wrote:
>>>> >>>>>>>
>>>> >>>>>>> Hi Folks,
>>>> >>>>>>>
>>>> >>>>>>> Zeppelin met this kind of issue before. We solved it by delegating
>>>> >>>>>>> each contributor's PR build to their own Travis account (everyone
>>>> >>>>>>> can have 5 free slots for Travis builds).
>>>> >>>>>>> The Apache account's Travis build is only triggered when a PR is
>>>> >>>>>>> merged.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> Kurt Young <ykt...@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:
>>>> >>>>>>>
>>>> >>>>>>> > (Forgot to cc George)
>>>> >>>>>>> >
>>>> >>>>>>> > Best,
>>>> >>>>>>> > Kurt
>>>> >>>>>>> >
>>>> >>>>>>> >
>>>> >>>>>>> > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:
>>>> >>>>>>> >
>>>> >>>>>>> > > Hi Bowen,
>>>> >>>>>>> > >
>>>> >>>>>>> > > Thanks for bringing this up. We have actually discussed this, and I
>>>> >>>>>>> > > think Till and George have already spent some time investigating
>>>> >>>>>>> > > it. I have cc'ed both of them, and maybe they can share their
>>>> >>>>>>> > > findings.
>>>> >>>>>>> > >
>>>> >>>>>>> > > Best,
>>>> >>>>>>> > > Kurt
>>>> >>>>>>> > >
>>>> >>>>>>> > >
>>>> >>>>>>> > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
>>>> >>>>>>> > >
>>>> >>>>>>> > >> Hi Bowen,
>>>> >>>>>>> > >>
>>>> >>>>>>> > >> Thanks for bringing this up. We have also suffered from the long
>>>> >>>>>>> > >> build time. I agree that we should focus on solving the build
>>>> >>>>>>> > >> capacity problem in this thread.
>>>> >>>>>>> > >>
>>>> >>>>>>> > >> My observation is that only one build is running and all the others
>>>> >>>>>>> > >> (other PRs, master) are pending.
>>>> >>>>>>> > >> The pricing plan [1] of Travis shows it can support concurrent build
>>>> >>>>>>> > >> jobs. But I don't know which plan we are using; it might be the free
>>>> >>>>>>> > >> plan for open source.
>>>> >>>>>>> > >>
>>>> >>>>>>> > >> I cc'ed Chesnay, who may have some experience with Travis.
>>>> >>>>>>> > >>
>>>> >>>>>>> > >> Regards,
>>>> >>>>>>> > >> Jark
>>>> >>>>>>> > >>
>>>> >>>>>>> > >> [1]: https://travis-ci.com/plans
>>>> >>>>>>> > >>
>>>> >>>>>>> > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenl...@gmail.com> wrote:
>>>> >>>>>>> > >>
>>>> >>>>>>> > >> > Hi Steven,
>>>> >>>>>>> > >> >
>>>> >>>>>>> > >> > I think you may not have read what I wrote. The discussion is about
>>>> >>>>>>> > >> > "unstable build **capacity**", in other words "unstable / lack of
>>>> >>>>>>> > >> > build resources", not "unstable build".
>>>> >>>>>>> > >> >
>>>> >>>>>>> > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz...@gmail.com>
>>>> >>>>>>> > >> > wrote:
>>>> >>>>>>> > >> >
>>>> >>>>>>> > >> > > A long and sometimes unstable build is definitely a pain point.
>>>> >>>>>>> > >> > >
>>>> >>>>>>> > >> > > I suspect the build failure here in flink-connector-kafka is not
>>>> >>>>>>> > >> > > related to my change, but there is no easy way to re-run the build
>>>> >>>>>>> > >> > > in the Travis UI. A Google search showed a trick: closing and
>>>> >>>>>>> > >> > > reopening the PR will trigger a rebuild, but that could add noise
>>>> >>>>>>> > >> > > to the PR activity.
>>>> >>>>>>> > >> > > https://travis-ci.org/apache/flink/jobs/545555519
>>>> >>>>>>> > >> > >
>>>> >>>>>>> > >> > > travis-ci for my personal repo often failed by exceeding the time
>>>> >>>>>>> > >> > > limit after 4+ hours:
>>>> >>>>>>> > >> > > "The job exceeded the maximum time limit for jobs, and has been
>>>> >>>>>>> > >> > > terminated."
>>>> >>>>>>> > >> > >
>>>> >>>>>>> > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenl...@gmail.com>
>>>> >>>>>>> > >> > > wrote:
>>>> >>>>>>> > >> > >
>>>> >>>>>>> > >> > > > https://travis-ci.org/apache/flink/builds/549681530 This build
>>>> >>>>>>> > >> > > > request has been sitting at **HEAD of the queue** since I first
>>>> >>>>>>> > >> > > > saw it at PST 10:30am (not sure how long it's been there before
>>>> >>>>>>> > >> > > > 10:30am). It's PST 4:12pm now and it hasn't started yet.
>>>> >>>>>>> > >> > > >
>>>> >>>>>>> > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenl...@gmail.com>
>>>> >>>>>>> > >> > > > wrote:
>>>> >>>>>>> > >> > > >
>>>> >>>>>>> > >> > > > > Hi devs,
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > > I've been experiencing the pain resulting from the lack of stable
>>>> >>>>>>> > >> > > > > build capacity on Travis for Flink PRs [1]. Specifically, I have
>>>> >>>>>>> > >> > > > > often noticed that no build in the queue makes any progress for
>>>> >>>>>>> > >> > > > > hours, and then suddenly 5 or 6 builds kick off all together
>>>> >>>>>>> > >> > > > > after the long pause. I'm in the PST (UTC-08) time zone, and I've
>>>> >>>>>>> > >> > > > > seen that the pause can be as long as 6 hours, from PST 9am to
>>>> >>>>>>> > >> > > > > 3pm (let alone the time needed to drain the queue afterwards).
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > > I think this has greatly impacted our productivity. In my
>>>> >>>>>>> > >> > > > > experience, PRs submitted in the early morning in the PST time
>>>> >>>>>>> > >> > > > > zone won't finish their build until late at night the same day.
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > > So my questions are:
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > > - Has anyone else experienced the same problem or made a similar
>>>> >>>>>>> > >> > > > >   observation on TravisCI? (I suspect it has something to do with
>>>> >>>>>>> > >> > > > >   time zones)
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > > - What pricing plan of TravisCI is Flink currently using? Is it
>>>> >>>>>>> > >> > > > >   the free plan for open source projects? What is the guaranteed
>>>> >>>>>>> > >> > > > >   build capacity of the current plan?
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > > - If the current pricing plan (either free or paid) can't provide
>>>> >>>>>>> > >> > > > >   stable build capacity, can we upgrade to a higher priced plan
>>>> >>>>>>> > >> > > > >   with larger and more stable build capacity?
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > > BTW, another factor that contributes to the productivity problem
>>>> >>>>>>> > >> > > > > is that our build is slow - we run a full build for every PR and
>>>> >>>>>>> > >> > > > > a successful full build takes ~5h. We definitely have more
>>>> >>>>>>> > >> > > > > options to solve it, for instance, modularizing the build graph
>>>> >>>>>>> > >> > > > > and reusing artifacts from the previous build. But I think that
>>>> >>>>>>> > >> > > > > can be a big effort which is much harder to accomplish in a short
>>>> >>>>>>> > >> > > > > period of time and may deserve its own separate discussion.
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > > >
>>>> >>>>>>> > >> > > >
>>>> >>>>>>> > >> > >
>>>> >>>>>>> > >> >
>>>> >>>>>>> > >>
>>>> >>>>>>> > >
>>>> >>>>>>> >
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> Best Regards
>>>> >>>>>>>
>>>> >>>>>>> Jeff Zhang
>>>> >>>>>>>
>>>> >>
>>>>
>>>
>>