Re: Ask for ARM CI for spark

2019-07-17 Thread Tianhua huang
Hi all,

We ran all of the unit tests for Spark on an arm64 platform, and after some
effort four tests still FAILED, see
https://logs.openlabtesting.org/logs/4/4/ae5ebaddd6ba6eba5a525b2bf757043ebbe78432/check/spark-build-arm64/9ecccad/job-output.txt.gz

Two of them failed with 'Can't find 1 executors before 10000 milliseconds
elapsed' (see below). When we increased the timeout, the tests passed, so we
wonder if the timeout can be increased. I also have another question about
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285:
why is the comparison not >=? According to the comment of the function, it
should be >=.

- test driver discovery under local-cluster mode *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 1 executors before 10000 milliseconds elapsed
  at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
  at org.apache.spark.SparkContextSuite.$anonfun$new$78(SparkContextSuite.scala:753)
  at org.apache.spark.SparkContextSuite.$anonfun$new$78$adapted(SparkContextSuite.scala:741)
  at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
  at org.apache.spark.SparkContextSuite.$anonfun$new$77(SparkContextSuite.scala:741)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)

- test gpu driver resource files and discovery under local-cluster mode *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 1 executors before 10000 milliseconds elapsed
  at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
  at org.apache.spark.SparkContextSuite.$anonfun$new$80(SparkContextSuite.scala:781)
  at org.apache.spark.SparkContextSuite.$anonfun$new$80$adapted(SparkContextSuite.scala:761)
  at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
  at org.apache.spark.SparkContextSuite.$anonfun$new$79(SparkContextSuite.scala:761)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)

The other two failed with '2143289344 equaled 2143289344'. This is because
the value of floatToRawIntBits(0.0f/0.0f) on the aarch64 platform is
2143289344, which equals floatToRawIntBits(Float.NaN). About this I sent an
email to jdk-dev and raised the topic with the Scala community, see
https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
and https://github.com/scala/bug/issues/11632. I thought it was an issue in
the JDK or Scala, but after discussion it turns out to be platform-dependent,
so the following asserts seem inappropriate:
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
and
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733

 - SPARK-26021: NaN and -0.0 in grouping expressions *** FAILED ***
   2143289344 equaled 2143289344 (DataFrameAggregateSuite.scala:732)
 - NaN and -0.0 in window partition keys *** FAILED ***
   2143289344 equaled 2143289344 (DataFrameWindowFunctionsSuite.scala:704)
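Since the NaN bit-pattern behavior is central here, a small standalone Java
sketch (illustrative only, not Spark code) of the difference between the two
conversions:

```java
public class NaNBits {
    public static void main(String[] args) {
        float nan = 0.0f / 0.0f; // hardware-generated quiet NaN

        // floatToIntBits collapses every NaN to the canonical
        // pattern 0x7fc00000, i.e. 2143289344 on all platforms.
        int canonical = Float.floatToIntBits(nan);

        // floatToRawIntBits preserves the exact bit pattern the
        // hardware produced, which may differ between architectures
        // (e.g. in the sign bit of the default quiet NaN).
        int raw = Float.floatToRawIntBits(nan);

        System.out.println("canonical = " + canonical);
        System.out.println("raw       = " + raw);
        System.out.println("raw == canonical: " + (raw == canonical));
    }
}
```

Float.floatToIntBits is documented to return the canonical pattern 0x7fc00000
(2143289344) for every NaN, while floatToRawIntBits returns whatever bits the
hardware produced, which is exactly where x86-64 and aarch64 can disagree.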

We are waiting for your suggestions on fixing the failed tests,
thank you very much.


On Wed, Jul 10, 2019 at 10:07 AM Tianhua huang 
wrote:

> Hi all,
>
> I am glad to tell you that there is new progress in building and testing
> Spark on an aarch64 server: the tests are now running, see the build/test
> log at
> https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/job-output.txt.gz
> and the aarch64 instance info at
> https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/zuul-info/zuul-info.ubuntu-xenial-arm64.txt
> In order to enable the tests, I made some modifications. The major one was
> building a local leveldbjni package: I forked the fusesource/leveldbjni and
> chirino/leveldb repos and modified them so that the local package builds,
> see https://github.com/huangtianhua/leveldbjni/pull/1 and
> https://github.com/huangtianhua/leveldbjni/pull/2, and then used it in
> Spark; the details are in https://github.com/theopenlab/spark/pull/1
>
> Not all of the tests succeed yet; I will try to fix them, and any
> suggestion is welcome. Thank you all.
>
> On Mon, Jul 1, 2019 at 5:25 PM Tianhua huang 
> wrote:
>
>> We are focused on ARM cloud instances, and now I use an ARM instance from
>> the vexxhost cloud to run the build job mentioned above; the specificat

Re: IPv6 support

2019-07-17 Thread Steve Loughran
Fairly neglected hadoop patch, FWIW;
https://issues.apache.org/jira/browse/HADOOP-11890

FB have been running HDFS &c on IPv6 for a while, but their codebase has
diverged; getting the stuff into trunk is going to take effort. At least
the JDK has moved on and should handle it better.

On Wed, Jul 17, 2019 at 6:42 AM Pavithra R  wrote:

> I came across some issues which were fixed for IPv6 support,
>
> but I can't find any documentation claiming that Spark supports IPv6
> completely.
>
> Hadoop has a separate JIRA to track IPv6 support. Is there any such task
> in Spark too?
>
> I would like to know if there is any task planned for IPv6 support in
> Spark.
>
> Pavithra R


Re: Ask for ARM CI for spark

2019-07-17 Thread Sean Owen
On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang  wrote:
> Two of them failed with 'Can't find 1 executors before 10000 milliseconds
> elapsed' (see below). When we increased the timeout, the tests passed, so we
> wonder if the timeout can be increased. I also have another question about
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285:
> why is the comparison not >=? According to the comment of the function, it
> should be >=.
>

I think it's ">" because the driver is also an executor, but not 100%
sure. In any event it passes in general.
These errors typically mean "I didn't start successfully" for some
other reason that may be in the logs.
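To make the ">" vs ">=" question concrete, here is a simplified,
hypothetical Java sketch of that kind of wait loop (the names and the 10 ms
poll interval are illustrative, not Spark's actual implementation), assuming
the reported count includes the driver as suggested above:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

public class WaitUntil {
    // Polls countSupplier until it exceeds numExecutors or the deadline
    // passes. The strict ">" models the check discussed above: if the
    // reported list also contains the driver, "numExecutors executors up"
    // means the total count must be greater than numExecutors.
    static boolean waitUntilUp(IntSupplier countSupplier,
                               int numExecutors,
                               long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (countSupplier.getAsInt() > numExecutors) {
                return true;
            }
            try {
                TimeUnit.MILLISECONDS.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated cluster reporting driver + 1 executor = 2 entries.
        System.out.println(waitUntilUp(() -> 2, 1, 1000));  // true
        // Driver alone (1 entry) never satisfies "1 executor up".
        System.out.println(waitUntilUp(() -> 1, 1, 200));   // false
    }
}
```

Under this reading, ">=" would report success as soon as the driver alone
registered, which explains why the strict comparison could be intentional.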

> The other two failed with '2143289344 equaled 2143289344'. This is because
> the value of floatToRawIntBits(0.0f/0.0f) on the aarch64 platform is
> 2143289344, which equals floatToRawIntBits(Float.NaN). About this I sent an
> email to jdk-dev and raised the topic with the Scala community, see
> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> and https://github.com/scala/bug/issues/11632. I thought it was an issue in
> the JDK or Scala, but after discussion it turns out to be platform-dependent,
> so the following asserts seem inappropriate:
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
> and
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733

These tests could special-case execution on ARM, like you'll see some
tests handle big-endian architectures.
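As an illustration only (not existing Spark code), such a special case on the
JVM could key off the os.arch system property:

```java
public class ArchCheck {
    // Detects an ARM 64-bit JVM via the os.arch system property.
    // Typical values are "aarch64" on ARM64 and "amd64" or "x86_64"
    // on x86-64 JVMs.
    static boolean isAarch64() {
        return System.getProperty("os.arch", "")
                     .toLowerCase().contains("aarch64");
    }

    public static void main(String[] args) {
        int raw = Float.floatToRawIntBits(0.0f / 0.0f);
        if (isAarch64()) {
            // On aarch64 the hardware quiet NaN was observed to match
            // the canonical bits, so the assertion would be skipped or
            // adjusted here.
            System.out.println("aarch64: raw NaN bits = " + raw);
        } else {
            System.out.println("non-ARM: raw NaN bits = " + raw);
        }
    }
}
```

A test could then branch on isAarch64() the same way some suites already
branch on byte order for big-endian machines.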

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Ask for ARM CI for spark

2019-07-17 Thread Tianhua huang
Thanks for your reply.

About the first problem, we didn't find any other cause in the logs, just
the timeout while waiting for the executors to come up. After increasing the
timeout from 10000 ms to 30000 (or even 20000) ms at
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
and
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
the tests passed, and more than one executor came up. I am not sure whether
this is related to the flavor of our aarch64 instance; currently the flavor
is 8C8G. Maybe we will try a bigger flavor later. If anyone has other
suggestions, please contact me, thank you.

About the second problem, I proposed a pull request to apache/spark:
https://github.com/apache/spark/pull/25186. If you have time, would you
please help review it? Thank you very much.

On Wed, Jul 17, 2019 at 8:37 PM Sean Owen  wrote:

> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang 
> wrote:
> > Two of them failed with 'Can't find 1 executors before 10000 milliseconds
> > elapsed' (see below). When we increased the timeout, the tests passed, so
> > we wonder if the timeout can be increased. I also have another question
> > about
> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285:
> > why is the comparison not >=? According to the comment of the function,
> > it should be >=.
>
> I think it's ">" because the driver is also an executor, but not 100%
> sure. In any event it passes in general.
> These errors typically mean "I didn't start successfully" for some
> other reason that may be in the logs.
>
> > The other two failed with '2143289344 equaled 2143289344'. This is
> > because the value of floatToRawIntBits(0.0f/0.0f) on the aarch64
> > platform is 2143289344, which equals floatToRawIntBits(Float.NaN). About
> > this I sent an email to jdk-dev and raised the topic with the Scala
> > community, see
> > https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> > and https://github.com/scala/bug/issues/11632. I thought it was an issue
> > in the JDK or Scala, but after discussion it turns out to be
> > platform-dependent, so the following asserts seem inappropriate:
> > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
> > and
> > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
>
> These tests could special-case execution on ARM, like you'll see some
> tests handle big-endian architectures.