Adding CI builds for ARM only makes sense when we actually take them into
account as "blocking a merge"; otherwise there is no point in having them.
So we would need to be prepared to do that.

The cases where something runs on UNIX/x64 but fails on ARM are few, and so
far they seem to have been related to libraries or some magic that tries to
do system-dependent actions outside Java.
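
To illustrate that category of failure, here is a minimal Java sketch of how
code that picks a platform-specific native library might inspect the CPU
architecture first. The class name ArchCheck and the method cpuFamily are
hypothetical names for illustration only, not anything from the Flink code
base; the only real API used is the standard System.getProperty("os.arch").

```java
import java.util.Locale;

public final class ArchCheck {

    /**
     * Maps a raw os.arch value to a coarse CPU family name.
     * The family names returned here are an assumption of this sketch.
     */
    static String cpuFamily(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            return "x86_64";
        }
        if (arch.equals("aarch64") || arch.equals("arm64")) {
            return "aarch64";
        }
        if (arch.startsWith("arm")) {
            return "arm32";
        }
        return "other";
    }

    public static void main(String[] args) {
        // os.arch is a standard JVM system property.
        String family = cpuFamily(System.getProperty("os.arch"));
        System.out.println("CPU family: " + family);
    }
}
```

Code that only ships an x86_64 native binary would fall into the "other" or
"aarch64" branch on an ARM machine, which is roughly where the failures seen
so far seem to come from.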

One worthwhile discussion could be whether to run the ARM CI builds as part
of the nightly tests, not on every commit.
There are a lot of nightly tests, for example for different Java / Scala /
Hadoop versions.

On Mon, Aug 26, 2019 at 10:46 AM Xiyuan Wang <wangxiyuan1...@gmail.com>
wrote:

> Sorry, maybe my wording was misleading.
>
> We are just starting to add ARM support, so the CI is non-voting at this
> moment to avoid blocking normal Flink development.
>
> But once the ARM CI works well and is stable enough, we should mark it as
> voting. That means that in the future, if the ARM test fails in a PR, the
> PR cannot be merged. The test log should tell developers what error
> occurred. If a developer needs to debug the details on an ARM VM, OpenLab
> can provide one.
>
> Adding ARM CI makes sure that Flink supports ARM natively.
>
> I left a workflow in the PR; I'd like to repeat it here:
>
>    1. Add the basic build script to ensure that the CI system and build job
>    work as expected. The job should be marked as non-voting first, which
>    means a CI test failure won't block a Flink PR from being merged.
>    2. Add the test script to run the unit/integration tests. At this step
>    the -fn (--fail-never) parameter will be added to mvn test. It will run
>    the full set of test cases in Flink, so that we can find which tests
>    fail on ARM.
>    3. Fix the test failures one by one.
>    4. Once all the tests pass, remove the -fn parameter and keep watching
>    the CI's status for some days. If bugs come up then, fix them as we
>    usually do for travis-ci.
>    5. Once the CI is stable enough, remove the non-voting tag, so that
>    the ARM CI will be the same as travis-ci, as one of the gates for Flink
>    PRs.
>    6. Finally, the Flink community can announce and release a Flink ARM
>    version.
>
>
> Chesnay Schepler <ches...@apache.org> wrote on Mon, Aug 26, 2019 at 2:25 PM:
>
>> I'm sorry, but if these issues are only fixed later anyway I see no
>> reason to run these tests on each PR. We're just adding noise to each PR
>> that everyone will just ignore.
>>
>> I'm curious as to the benefit of having this directly in Flink; why
>> aren't the ARM builds run outside of the Flink project, and fixes for it
>> provided?
>>
>> It seems to me like nothing about these arm builds is actually handled
>> by the Flink project.
>>
>> On 26/08/2019 03:43, Xiyuan Wang wrote:
>> > Thanks to Stephan for bringing up this topic.
>> >
>> > The package build jobs work well now. I have a simple online demo which is
>> > built and run on an ARM VM. Feel free to give it a try[1].
>> >
>> > As the first step for ARM support, maybe it's good to add them now.
>> >
>> > As for the next step, the test part is still broken. This is due to a few
>> > issues we found:
>> >
>> > 1. Some unit tests fail[1] because of Java code issues. This kind of failure
>> > can be fixed easily.
>> > 2. Some tests fail because they depend on third-party libraries[2]. These
>> > include frocksdb, MapR Client and Netty, which don't have ARM releases.
>> >      a. Frocksdb: I'm testing it locally now with `make check_some` and
>> > `make jtest`, similar to its travis job. Three tests fail under
>> > `make check_some`. Please see the ticket for more details. Once the tests
>> > pass, frocksdb can release an ARM package.
>> >      b. MapR Client: This belongs to the MapR company. At this moment, maybe
>> > we should skip MapR support for Flink on ARM.
>> >      c. Netty: Actually, Netty runs well on our ARM machine. We will ask the
>> > Netty community to release ARM support. If they do not want to, OpenLab
>> > will host a Maven repository for some common libraries on ARM.
>> >
>> >
>> > Regarding Chesnay's concern:
>> >
>> > Firstly, the OpenLab team will keep maintaining and fixing the ARM CI. That
>> > means that once a build or test fails, we'll fix it at once.
>> > Secondly, OpenLab can provide ARM VMs to everyone for reproducing and
>> > testing. You just need to create a Test Request issue in openlab[3]. Then
>> > we'll create ARM VMs for you, and you can log in and do whatever you need.
>> >
>> > Does it make sense?
>> >
>> > [1]: http://114.115.168.52:8081/#/overview
>> > [1]: https://issues.apache.org/jira/browse/FLINK-13449
>> >        https://issues.apache.org/jira/browse/FLINK-13450
>> > [2]: https://issues.apache.org/jira/browse/FLINK-13598
>> > [3]: https://github.com/theopenlab/openlab/issues/new/choose
>> >
>> >
>> >
>> >
>> > Chesnay Schepler <ches...@apache.org> wrote on Sat, Aug 24, 2019 at 12:10 AM:
>> >
>> >> I'm wondering what we are supposed to do if the build fails?
>> >> We aren't providing any guides on setting up an ARM dev environment, so
>> >> reproducing it locally isn't possible.
>> >>
>> >> On 23/08/2019 17:55, Stephan Ewen wrote:
>> >>> Hi all!
>> >>>
>> >>> As part of the Flink on ARM effort, there is a pull request that
>> >>> triggers a build on OpenLabs CI for each push and runs tests on ARM
>> >>> machines.
>> >>>
>> >>> Currently that build is roughly equivalent to what the "core" and
>> >>> "tests" profiles do on Travis.
>> >>> The result will be posted to the PR comments, similar to the Flink
>> >>> Bot's Travis build result.
>> >>> The build currently passes :-) so Flink seems to be okay on ARM.
>> >>>
>> >>> My suggestion would be to try and add this and gather some experience
>> >>> with it.
>> >>> The Travis build results should be our "ground truth" and the ARM CI
>> >>> (openlabs CI) would be "informational only" at the beginning, but
>> >>> helping us understand when we break ARM support.
>> >>>
>> >>> You can see this in the PR that adds the openlabs CI config:
>> >>> https://github.com/apache/flink/pull/9416
>> >>>
>> >>> Any objections?
>> >>>
>> >>> Best,
>> >>> Stephan
>> >>>
>> >>
>>
>>
