Re: [DISCUSS] Add ARM CI build to Flink (information-only)

Xiyuan Wang Mon, 02 Sep 2019 19:41:09 -0700

The ARM CI is now triggered only by GitHub comment. This means a PR will
not start the ARM test unless a `check_arm` comment is added, as I did in
the PR[1].
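
To illustrate how a comment-only trigger decides whether to start the job, here is a minimal Python sketch. The pattern is an assumption: it copies the `recheck_arm_build` regex from the trigger config quoted further down in this thread and swaps in the `check_arm` keyword. The function name `triggers_arm_ci` is hypothetical, and the actual OpenLab pipeline definition may differ.

```python
import re

# Assumed pattern: same shape as the `recheck_arm_build` regex quoted in this
# thread, with the keyword replaced by `check_arm`. Not taken from the real config.
CHECK_ARM = re.compile(r"(?i)^\s*check_arm\s*$")


def triggers_arm_ci(comment: str) -> bool:
    """Return True if a PR comment matches the assumed trigger pattern."""
    return CHECK_ARM.match(comment) is not None


if __name__ == "__main__":
    for comment in ["check_arm", "  Check_ARM  ", "please check_arm", "check_arm now"]:
        print(repr(comment), triggers_arm_ci(comment))
```

Under this assumed pattern the comment has to consist of the keyword alone (case-insensitive, surrounding whitespace allowed), so an ordinary review comment that merely mentions `check_arm` would not start the job.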


A POC for the Flink nightly end-to-end test job has been created as well[2].
I'll keep improving it.

Any feedback or questions?


[1]: https://github.com/apache/flink/pull/9416
     https://github.com/apache/flink/pull/9416#issuecomment-527268203
[2]: https://github.com/theopenlab/openlab-zuul-jobs/pull/631


Thanks

Xiyuan Wang <wangxiyuan1...@gmail.com> wrote on Mon, Aug 26, 2019 at 7:41 PM:

> Until ARM CI is ready, I can turn off the CI test for each PR and let it
> be triggered only by a PR comment. That is quite easy for OpenLab to do.
>
> OpenLab has many job pipelines[1]. Right now I use the `check` pipeline in
> https://github.com/apache/flink/pull/9416. Its job trigger contains
> github_action and github_comment[2]. I can create a new pipeline for Flink
> whose trigger contains only github_comment, like:
>
> trigger:
>   github:
>     - event: pull_request
>       action: comment
>       comment: (?i)^\s*recheck_arm_build\s*$
>
> That way the ARM job will not run for every PR. It will run only for
> PRs that have a `recheck_arm_build` comment.
>
> Then once ARM CI is ready, I can add it back.
>
>
> Nightly tests can be added as well, of course. OpenLab has a kind of job
> called a `periodic job`. We can use it for Flink's nightly tests.
> If any error occurs, the report can be sent to bui...@flink.apache.org as
> well.
>
> [1]:
> https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/pipelines.yaml
> [2]:
> https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/pipelines.yaml#L10-L19
>
> Stephan Ewen <se...@apache.org> wrote on Mon, Aug 26, 2019 at 6:13 PM:
>
>> Adding CI builds for ARM only makes sense when we actually take them into
>> account as "blocking a merge"; otherwise there is no point in having them.
>> So we would need to be prepared to do that.
>>
>> The cases where something runs on UNIX/x64 but fails on ARM are few
>> and so far seem to have been related to libraries or some magic that tries
>> to do system-dependent actions outside Java.
>>
>> One worthwhile discussion could be whether to run the ARM CI builds as
>> part of the nightly tests, not on every commit.
>> There are a lot of nightly tests, for example for different Java / Scala /
>> Hadoop versions.
>>
>> On Mon, Aug 26, 2019 at 10:46 AM Xiyuan Wang <wangxiyuan1...@gmail.com>
>> wrote:
>>
>> > Sorry, maybe my wording was misleading.
>> >
>> > We are just starting to add ARM support, so the CI is non-voting at this
>> > moment to avoid blocking normal Flink development.
>> >
>> > But once the ARM CI works well and is stable enough, we should mark it as
>> > voting. That means that in the future, if the ARM test fails in a PR, the
>> > PR cannot be merged. The test log may tell developers what the error
>> > is. If a developer needs to debug the details on an ARM VM, OpenLab can
>> > provide one.
>> >
>> > Adding ARM CI can make sure Flink supports ARM natively.
>> >
>> > I left a workflow in the PR; I'd like to repeat it here:
>> >
>> >    1. Add the basic build script to ensure the CI system and build job
>> >    work as expected. The job should be marked as non-voting first, which
>> >    means a CI test failure won't block a Flink PR from being merged.
>> >    2. Add the test script to run unit/integration tests. At this step the
>> >    -fn (--fail-never) parameter will be added to mvn test. It will run the
>> >    full set of test cases in Flink, so that we can find which tests fail
>> >    on ARM.
>> >    3. Fix the test failures one by one.
>> >    4. Once all the tests pass, remove the -fn parameter and keep
>> >    watching the CI's status for some days. If bugs come up then, fix
>> >    them as we usually do for travis-ci.
>> >    5. Once the CI is stable enough, remove the non-voting tag, so that
>> >    the ARM CI will be the same as travis-ci: one of the gates for Flink
>> >    PRs.
>> >    6. Finally, the Flink community can announce and release a Flink ARM
>> >    version.
>> >
>> >
>> > Chesnay Schepler <ches...@apache.org> wrote on Mon, Aug 26, 2019 at 2:25 PM:
>> >
>> >> I'm sorry, but if these issues are only fixed later anyway I see no
>> >> reason to run these tests on each PR. We're just adding noise to each
>> >> PR that everyone will just ignore.
>> >>
>> >> I'm curious as to the benefit of having this directly in Flink; why
>> >> aren't the ARM builds run outside of the Flink project, and fixes for
>> >> it provided?
>> >>
>> >> It seems to me like nothing about these arm builds is actually handled
>> >> by the Flink project.
>> >>
>> >> On 26/08/2019 03:43, Xiyuan Wang wrote:
>> >> > Thanks to Stephan for bringing up this topic.
>> >> >
>> >> > The package build jobs work well now. I have a simple online demo which
>> >> > is built and run on an ARM VM. Feel free to give it a try[1].
>> >> >
>> >> > As the first step for ARM support, maybe it's good to add them now.
>> >> >
>> >> > As for the next step, the test part is still broken. It relates to
>> >> > some points we found:
>> >> >
>> >> > 1. Some unit tests fail[2] because of Java code issues. This kind of
>> >> > failure can be fixed easily.
>> >> > 2. Some tests fail because they depend on third-party libraries[3],
>> >> > including frocksdb, MapR Client and Netty. They don't have ARM
>> >> > releases.
>> >> >      a. Frocksdb: I'm testing it locally now with `make check_some`
>> >> > and `make jtest`, similar to its travis job. 3 tests fail under
>> >> > `make check_some`. Please see the ticket for more details. Once the
>> >> > tests pass, frocksdb can release an ARM package.
>> >> >      b. MapR Client: this belongs to the MapR company. At this moment,
>> >> > maybe we should skip MapR support for Flink on ARM.
>> >> >      c. Netty: Netty actually runs well on our ARM machine. We will ask
>> >> > the Netty community to release ARM support. If they do not want to,
>> >> > OpenLab will host a Maven repository for some common libraries on ARM.
>> >> >
>> >> >
>> >> > Regarding Chesnay's concern:
>> >> >
>> >> > Firstly, the OpenLab team will keep maintaining and fixing the ARM CI.
>> >> > That means that once a build or test fails, we'll fix it at once.
>> >> > Secondly, OpenLab can provide ARM VMs to everyone for reproducing and
>> >> > testing. You just need to create a Test Request issue in OpenLab[4].
>> >> > Then we'll create ARM VMs for you; you can log in and do whatever you
>> >> > want.
>> >> >
>> >> > Does that make sense?
>> >> >
>> >> > [1]: http://114.115.168.52:8081/#/overview
>> >> > [2]: https://issues.apache.org/jira/browse/FLINK-13449
>> >> >      https://issues.apache.org/jira/browse/FLINK-13450
>> >> > [3]: https://issues.apache.org/jira/browse/FLINK-13598
>> >> > [4]: https://github.com/theopenlab/openlab/issues/new/choose
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Chesnay Schepler <ches...@apache.org> wrote on Sat, Aug 24, 2019 at 12:10 AM:
>> >> >
>> >> >> I'm wondering what we are supposed to do if the build fails?
>> >> >> We aren't providing any guides on setting up an ARM dev environment,
>> >> >> so reproducing it locally isn't possible.
>> >> >>
>> >> >> On 23/08/2019 17:55, Stephan Ewen wrote:
>> >> >>> Hi all!
>> >> >>>
>> >> >>> As part of the Flink on ARM effort, there is a pull request that
>> >> >>> triggers a build on OpenLabs CI for each push and runs tests on ARM
>> >> >>> machines.
>> >> >>>
>> >> >>> Currently that build is roughly equivalent to what the "core" and
>> >> >>> "tests" profiles do on Travis.
>> >> >>> The result will be posted to the PR comments, similar to the Flink
>> >> >>> Bot's Travis build result.
>> >> >>> The build currently passes :-) so Flink seems to be okay on ARM.
>> >> >>>
>> >> >>> My suggestion would be to try and add this and gather some
>> >> >>> experience with it.
>> >> >>> The Travis build results should be our "ground truth" and the ARM
>> >> >>> CI (openlabs CI) would be "informational only" at the beginning, but
>> >> >>> helping us understand when we break ARM support.
>> >> >>>
>> >> >>> You can see this in the PR that adds the openlabs CI config:
>> >> >>> https://github.com/apache/flink/pull/9416
>> >> >>>
>> >> >>> Any objections?
>> >> >>>
>> >> >>> Best,
>> >> >>> Stephan
>> >> >>>
>> >> >>
>> >>
>> >>
>>
>
