Interesting. I don't think log(3) is special; it's just that differences in how it's implemented, and in floating-point behavior on aarch64 vs x86_64 or in the JVM, manifest at certain values like this. It's still a little surprising! BTW Wolfram Alpha suggests that the correct value of log(3) is more like ...810969..., right between the two results. java.lang.Math doesn't guarantee bit-for-bit identical IEEE floating-point results across platforms, but java.lang.StrictMath is supposed to, at the potential cost of speed, and it gives ...81096, in agreement with aarch64.
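If anyone wants to reproduce this locally, here's a minimal snippet comparing the two (paste into any Scala REPL; the comments are my understanding of why they can differ):

val x = 3.0
// Math.log may be compiled to a platform/JIT intrinsic, so the result can
// differ by an ulp or so across architectures.
val fast = java.lang.Math.log(x)
// StrictMath.log uses the fdlibm algorithms and must return the same bits
// on every platform.
val strict = java.lang.StrictMath.log(x)
println(f"Math.log(3.0)       = $fast%.17f")
println(f"StrictMath.log(3.0) = $strict%.17f")
println(s"equal? ${fast == strict}")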
@Yuming Wang are the results in float8.sql from PostgreSQL directly? It would be interesting if it also returns the same less accurate result, which might suggest this has more to do with the underlying OS math libraries. You noted that these tests sometimes gave platform-dependent differences in the last digit, so I'm wondering whether the test value directly reflects PostgreSQL or just what we happen to return now.

One option is to use StrictMath in special cases like computing atanh. That gives a value that agrees with aarch64.

I also note that 0.5 * (math.log(1 + x) - math.log(1 - x)) gives the more accurate answer too, and makes the result agree with, say, Wolfram Alpha for atanh(0.5). (Actually, if we do that, better still is 0.5 * (math.log1p(x) - math.log1p(-x)) for best accuracy near 0.)

Commons Math also has implementations of sinh, cosh, and atanh that we could call. It claims to be possibly more accurate and faster. I haven't tested its results here.

FWIW the "log1p" version appears, from some informal testing, to be the most accurate (in agreement with Wolfram), and using StrictMath doesn't matter. If we change anything, I'd use that version. The only issue is if this causes the result to disagree with PostgreSQL, but then again it's more correct, and maybe the DB is wrong.

The rest may be a test-vs-PostgreSQL issue; see https://issues.apache.org/jira/browse/SPARK-28316

On Fri, Jul 26, 2019 at 2:32 AM Tianhua huang <huangtianhua...@gmail.com> wrote:
>
> Hi, all
>
> Sorry to disturb again, but several SQL tests failed on the arm64 instance:
>
> pgSQL/float8.sql *** FAILED ***
> Expected "0.549306144334054[9]", but got "0.549306144334054[8]"
> Result did not match for query #56
> SELECT atanh(double('0.5')) (SQLQueryTestSuite.scala:362)
>
> pgSQL/numeric.sql *** FAILED ***
> Expected "2 2247902679199174[72 224790267919917955.1326161858
> 4 7405685069595001 7405685069594999.0773399947
> 5 5068226527.321263 5068226527.3212726541
> 6 281839893606.99365 281839893606.9937234336
> 7 1716699575118595840 1716699575118597095.4233081991
> 8 167361463828.0749 167361463828.0749132007
> 9 107511333880051856] 107511333880052007....",
> but got "2 2247902679199174[40 224790267919917955.1326161858
> 4 7405685069595001 7405685069594999.0773399947
> 5 5068226527.321263 5068226527.3212726541
> 6 281839893606.99365 281839893606.9937234336
> 7 1716699575118595580 1716699575118597095.4233081991
> 8 167361463828.0749 167361463828.0749132007
> 9 107511333880051872] 107511333880052007...."
> Result did not match for query #496
> SELECT t1.id1, t1.result, t2.expected
> FROM num_result t1, num_exp_power_10_ln t2
> WHERE t1.id1 = t2.id
> AND t1.result != t2.expected (SQLQueryTestSuite.scala:362)
>
> The first test failed because the value of math.log(3.0) is different on aarch64:
>
> # on x86_64:
>
> scala> val a = 0.5
> a: Double = 0.5
>
> scala> a * math.log((1.0 + a) / (1.0 - a))
> res1: Double = 0.5493061443340549
>
> scala> math.log((1.0 + a) / (1.0 - a))
> res2: Double = 1.0986122886681098
>
> # on aarch64:
>
> scala> val a = 0.5
> a: Double = 0.5
>
> scala> a * math.log((1.0 + a) / (1.0 - a))
> res20: Double = 0.5493061443340548
>
> scala> math.log((1.0 + a) / (1.0 - a))
> res21: Double = 1.0986122886681096
>
> I also tried several other numbers like math.log(4.0) and math.log(5.0) and they are the same; I don't know why math.log(3.0) is so special? But the result is indeed different on aarch64. If you are interested, please try it.
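(For the first failure above: this is roughly what the log1p formulation I suggested would look like. Just a sketch; the atanh helper here is mine for illustration, not anything Spark defines today:)

// atanh(x) = 0.5 * log((1 + x) / (1 - x))
//          = 0.5 * (log1p(x) - log1p(-x))
// The log1p form avoids cancellation for x near 0 and, per the informal
// testing above, agreed with Wolfram Alpha for atanh(0.5).
def atanh(x: Double): Double =
  0.5 * (math.log1p(x) - math.log1p(-x))

println(atanh(0.5))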
> The second test failed because some values of pow(10, x) are different on aarch64. Following the SQL tests in Spark, I ran similar tests on aarch64 and x86_64; take '-83028485' as an example:
>
> # on x86_64:
>
> scala> import java.lang.Math._
> import java.lang.Math._
>
> scala> var a = -83028485
> a: Int = -83028485
>
> scala> abs(a)
> res4: Int = 83028485
>
> scala> math.log(abs(a))
> res5: Double = 18.234694299654787
>
> scala> pow(10, math.log(abs(a)))
> res6: Double = 1.71669957511859584E18
>
> # on aarch64:
>
> scala> var a = -83028485
> a: Int = -83028485
>
> scala> abs(a)
> res38: Int = 83028485
>
> scala> math.log(abs(a))
> res39: Double = 18.234694299654787
>
> scala> pow(10, math.log(abs(a)))
> res40: Double = 1.71669957511859558E18
>
> I sent an email to jdk-dev, hoping someone can help, and also filed this in JIRA: https://issues.apache.org/jira/browse/SPARK-28519. If you are interested, welcome to join and discuss, thank you very much.
>
> On Thu, Jul 18, 2019 at 11:12 AM Tianhua huang <huangtianhua...@gmail.com> wrote:
>>
>> Thanks for your reply.
>>
>> About the first problem, we didn't find any other cause in the logs, just a timeout waiting for the executor to come up, and after increasing the timeout from 10000 ms to 30000 (or even 20000) ms in
>> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
>> and
>> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
>> the tests passed, and more than one executor came up. Not sure whether it's related to the flavor of our aarch64 instance? The instance flavor is currently 8C8G. Maybe we will try a bigger flavor later. If anyone has other suggestions, please contact me, thank you.
>>
>> About the second problem, I proposed a pull request to apache/spark: https://github.com/apache/spark/pull/25186. If you have time, would you please help review it, thank you very much.
>>
>> On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <huangtianhua...@gmail.com> wrote:
>>> > Two failed and the reason is 'Can't find 1 executors before 10000 milliseconds elapsed', see below; then we tried increasing the timeout and the tests passed, so I wonder if we can increase the timeout? And here I have another question about
>>> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285,
>>> > why is it not >=? Per the comment on the function, shouldn't it be >=?
>>>
>>> I think it's ">" because the driver is also an executor, but I'm not 100% sure. In any event it passes in general.
>>> These errors typically mean "I didn't start successfully" for some other reason that may be in the logs.
>>>
>>> > The other two failed and the reason is '2143289344 equaled 2143289344'; this is because the value of floatToRawIntBits(0.0f/0.0f) on the aarch64 platform is 2143289344, which equals floatToRawIntBits(Float.NaN). About this I sent an email to jdk-dev and opened topics in the Scala community:
>>> > https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
>>> > and https://github.com/scala/bug/issues/11632. I thought it was something about the JDK or Scala, but after discussion it appears to be platform-related, so it seems the following asserts are not appropriate?
>>> > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
>>> > and
>>> > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
>>>
>>> These tests could special-case execution on ARM, like you'll see some tests handle big-endian architectures.
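(A footnote on those NaN assertions, as a sketch of a portable alternative: floatToRawIntBits preserves whatever NaN bit pattern the hardware produced for 0.0f/0.0f, which is what varies between x86_64 and aarch64, while floatToIntBits canonicalizes every NaN to 0x7fc00000:)

val computed = 0.0f / 0.0f  // hardware-produced NaN; raw bit pattern varies by platform
println(java.lang.Float.floatToRawIntBits(computed))  // platform-dependent
println(java.lang.Float.floatToRawIntBits(Float.NaN)) // 2143289344 (0x7fc00000) by definition
println(java.lang.Float.floatToIntBits(computed))     // always 2143289344: NaNs are canonicalized
assert(computed.isNaN)                                // portable way to assert "result is NaN"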