Re: Ask for ARM CI for spark
Hi all,

We ran all of the Spark unit tests on an arm64 platform, and after some effort four tests still FAIL, see
https://logs.openlabtesting.org/logs/4/4/ae5ebaddd6ba6eba5a525b2bf757043ebbe78432/check/spark-build-arm64/9ecccad/job-output.txt.gz

Two of them fail with "Can't find 1 executors before 10000 milliseconds elapsed" (see below). After we increased the timeout the tests passed, so we wonder whether we can increase the timeout? And here I have another question about
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285
why is the comparison ">" and not ">="? Judging by the comment on the function, it should be ">=", shouldn't it?

- test driver discovery under local-cluster mode *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 1 executors before 10000 milliseconds elapsed
  at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
  at org.apache.spark.SparkContextSuite.$anonfun$new$78(SparkContextSuite.scala:753)
  at org.apache.spark.SparkContextSuite.$anonfun$new$78$adapted(SparkContextSuite.scala:741)
  at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
  at org.apache.spark.SparkContextSuite.$anonfun$new$77(SparkContextSuite.scala:741)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)

- test gpu driver resource files and discovery under local-cluster mode *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 1 executors before 10000 milliseconds elapsed
  at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:293)
  at org.apache.spark.SparkContextSuite.$anonfun$new$80(SparkContextSuite.scala:781)
  at org.apache.spark.SparkContextSuite.$anonfun$new$80$adapted(SparkContextSuite.scala:761)
  at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:161)
  at org.apache.spark.SparkContextSuite.$anonfun$new$79(SparkContextSuite.scala:761)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)

The other two fail with "2143289344 equaled 2143289344". This is because the value of floatToRawIntBits(0.0f/0.0f) on the aarch64 platform is 2143289344, which equals floatToRawIntBits(Float.NaN).
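For illustration, a minimal self-contained Scala sketch of the difference (the x86-64 value shown is our assumption, based on the default quiet-NaN bit pattern produced by SSE division; the aarch64 value is the one observed above):

    import java.lang.Float.{floatToIntBits, floatToRawIntBits}

    object NaNBitsDemo {
      def main(args: Array[String]): Unit = {
        // Use vals so the division is performed at run time on the target
        // CPU; a literal 0.0f / 0.0f could be constant-folded by the compiler.
        val zero = 0.0f
        val hwNaN = zero / zero

        println(floatToRawIntBits(hwNaN))     // aarch64: 2143289344 (0x7fc00000)
                                              // x86-64 (assumed): -4194304 (0xffc00000)
        println(floatToRawIntBits(Float.NaN)) // 2143289344 on both platforms
        println(floatToIntBits(hwNaN))        // 2143289344 everywhere: floatToIntBits
                                              // collapses every NaN to the canonical one
      }
    }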
About this I sent an email to jdk-dev and raised the topic with the Scala community, see
https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
and https://github.com/scala/bug/issues/11632. I thought it was a JDK or Scala issue, but after the discussion it appears to be platform-related, so the following asserts seem inappropriate:
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
and
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733

- SPARK-26021: NaN and -0.0 in grouping expressions *** FAILED ***
  2143289344 equaled 2143289344 (DataFrameAggregateSuite.scala:732)
- NaN and -0.0 in window partition keys *** FAILED ***
  2143289344 equaled 2143289344 (DataFrameWindowFunctionsSuite.scala:704)

As for fixing the failed tests, we are waiting for your suggestions, thank you very much.

On Wed, Jul 10, 2019 at 10:07 AM Tianhua huang wrote:
> Hi all,
>
> I am glad to tell you there is new progress on building/testing Spark on an
> aarch64 server: the tests are running, see the build/test log
> https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/job-output.txt.gz
> and the aarch64 instance info at
> https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/zuul-info/zuul-info.ubuntu-xenial-arm64.txt
> In order to enable the tests I made some modifications. The major one was to
> build a local leveldbjni package: I forked the fusesource/leveldbjni and
> chirino/leveldb repos and modified them so that the local package builds, see
> https://github.com/huangtianhua/leveldbjni/pull/1 and
> https://github.com/huangtianhua/leveldbjni/pull/2, and then used it in
> Spark; you can find the details in https://github.com/theopenlab/spark/pull/1
>
> The tests are not all successful yet. I will try to fix them, and any
> suggestion is welcome. Thank you all.
>
> On Mon, Jul 1, 2019 at 5:25 PM Tianhua huang wrote:
>
>> We are focused on ARM cloud instances, and now I use an ARM instance
>> from the vexxhost cloud to run the build job mentioned above; the
>> specificat
Re: IPv6 support
Fairly neglected Hadoop patch, FWIW: https://issues.apache.org/jira/browse/HADOOP-11890

FB have been running HDFS &c on IPv6 for a while, but their codebase has diverged; getting that work into trunk is going to take effort. At least the JDK has moved on and should be better.

On Wed, Jul 17, 2019 at 6:42 AM Pavithra R wrote:
> I came across some issues which were fixed for IPv6 support,
> but I can't find any documentation claiming that Spark supports IPv6
> completely.
>
> Hadoop has a separate JIRA to work on IPv6 support. Is there any
> such task in Spark too? I would like to know whether there is any task
> planned for IPv6 support in Spark.
>
> Pavithra R
Re: Ask for ARM CI for spark
On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang wrote:
> Two of them fail with "Can't find 1 executors before 10000 milliseconds
> elapsed" (see below). After we increased the timeout the tests passed, so
> we wonder whether we can increase the timeout? And here I have another
> question about
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285
> why is the comparison ">" and not ">="? Judging by the comment on the
> function, it should be ">=", shouldn't it?

I think it's ">" because the driver is also an executor, but I'm not 100% sure. In any event it passes in general. These errors typically mean "I didn't start successfully" for some other reason that may be in the logs.

> The other two fail with "2143289344 equaled 2143289344". This is because
> the value of floatToRawIntBits(0.0f/0.0f) on the aarch64 platform is
> 2143289344, which equals floatToRawIntBits(Float.NaN). About this I sent
> an email to jdk-dev and raised the topic with the Scala community, see
> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> and https://github.com/scala/bug/issues/11632. I thought it was a JDK or
> Scala issue, but after the discussion it appears to be platform-related,
> so the following asserts seem inappropriate:
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
> and
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733

These tests could special-case execution on ARM, as you'll see some tests handle big-endian architectures.
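For reference, the wait loop in TestUtils is roughly the following (a paraphrased sketch, not the exact Spark source):

    import java.util.concurrent.{TimeUnit, TimeoutException}
    import org.apache.spark.SparkContext

    def waitUntilExecutorsUp(sc: SparkContext, numExecutors: Int, timeout: Long): Unit = {
      val deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeout)
      while (System.nanoTime() < deadline) {
        // ">" rather than ">=" would make sense if getExecutorInfos also
        // returns an entry for the driver: numExecutors real executors
        // then show up as numExecutors + 1 entries.
        if (sc.statusTracker.getExecutorInfos.length > numExecutors) {
          return
        }
        Thread.sleep(10)
      }
      throw new TimeoutException(
        s"Can't find $numExecutors executors before $timeout milliseconds elapsed")
    }

And a hedged sketch of what an ARM special case could look like (the os.arch check is our assumption about how to detect the platform; the imports assume ScalaTest 3.1+; assume() cancels rather than fails a test):

    import java.lang.Float.floatToRawIntBits
    import org.scalatest.funsuite.AnyFunSuite

    class NaNRawBitsSuite extends AnyFunSuite {
      private val isAarch64 =
        System.getProperty("os.arch").toLowerCase.contains("aarch64")

      test("0.0f/0.0f has non-canonical raw bits") {
        // Cancel on aarch64, where the hardware NaN shares its bit
        // pattern with Float.NaN and the assertion below cannot hold.
        assume(!isAarch64, "hardware NaN equals Float.NaN raw bits on aarch64")
        val zero = 0.0f
        assert(floatToRawIntBits(zero / zero) != floatToRawIntBits(Float.NaN))
      }
    }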
Re: Ask for ARM CI for spark
Thanks for your reply.

About the first problem: we didn't find any other reason in the logs, just the timeout while waiting for the executors to come up. After increasing the timeout from 10000 ms to 30000 (even 20000) ms at
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
the tests passed, and more than one executor came up. We are not sure whether this is related to the flavor of our aarch64 instance; the current flavor is 8C8G. Maybe we will try a bigger flavor later. If anyone has other suggestions, please contact me, thank you.

About the second problem, I proposed a pull request to apache/spark: https://github.com/apache/spark/pull/25186. If you have time, would you please help review it? Thank you very much.

On Wed, Jul 17, 2019 at 8:37 PM Sean Owen wrote:
> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang wrote:
> > Two of them fail with "Can't find 1 executors before 10000 milliseconds
> > elapsed" (see below). After we increased the timeout the tests passed,
> > so we wonder whether we can increase the timeout? And here I have
> > another question about
> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285
> > why is the comparison ">" and not ">="? Judging by the comment on the
> > function, it should be ">=", shouldn't it?
>
> I think it's ">" because the driver is also an executor, but I'm not
> 100% sure. In any event it passes in general. These errors typically
> mean "I didn't start successfully" for some other reason that may be
> in the logs.
>
> > The other two fail with "2143289344 equaled 2143289344". This is
> > because the value of floatToRawIntBits(0.0f/0.0f) on the aarch64
> > platform is 2143289344, which equals floatToRawIntBits(Float.NaN).
> > About this I sent an email to jdk-dev and raised the topic with the
> > Scala community, see
> > https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> > and https://github.com/scala/bug/issues/11632. I thought it was a JDK
> > or Scala issue, but after the discussion it appears to be
> > platform-related, so the following asserts seem inappropriate:
> > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
> > and
> > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
>
> These tests could special-case execution on ARM, as you'll see some
> tests handle big-endian architectures.
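For what it's worth, one platform-independent way to exercise a non-canonical NaN in such asserts, without depending on what the hardware divide produces, might be to construct one explicitly. This is only a sketch, not necessarily what the pull request does; 0x7fc00001 is an assumed quiet-NaN payload:

    import java.lang.Float.{floatToRawIntBits, intBitsToFloat}

    // A quiet NaN whose payload differs from the canonical Float.NaN
    // (raw bits 0x7fc00000). Quiet NaNs survive copying, so the raw
    // bits should be preserved on common hardware.
    val altNaN = intBitsToFloat(0x7fc00001)

    assert(altNaN.isNaN)
    assert(floatToRawIntBits(altNaN) != floatToRawIntBits(Float.NaN))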