Github user advancedxy commented on the pull request:
https://github.com/apache/spark/pull/4783#issuecomment-78435831
@shivaram The reason ExternalSorter fails is that it doesn't spill to disk
for these two failing tests. (If we increase the input from `0 until 100000`
to `0 until 200000`, it does spill to disk and the tests pass.) However, the
input type for the sorter is Iterator[(Int, Int)], and the old SizeEstimator
gives 32 bytes for (Int, Int) while the new SizeEstimator gives the same 32
bytes (assuming a 64-bit JVM with UseCompressedOops on). So it's very weird
to see different results.
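
To make the spilling point concrete, here is a minimal sketch of the pattern ExternalSorter follows: buffer records in memory, track an estimated size, and spill past a threshold. The class name, threshold, and per-record size below are illustrative stand-ins, not Spark's actual code; the point is only that a larger input forces spills, which is why doubling the input range makes the tests pass.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillSketch {
    static final long THRESHOLD = 1024;   // hypothetical in-memory limit, in bytes
    static final long RECORD_SIZE = 32;   // what a SizeEstimator might report per record

    final List<int[]> buffer = new ArrayList<>();
    long estimatedBytes = 0;
    int spillCount = 0;

    void insert(int key, int value) {
        buffer.add(new int[]{key, value});
        estimatedBytes += RECORD_SIZE;
        // Over-estimating sizes just spills earlier; under-estimating risks OOM.
        if (estimatedBytes > THRESHOLD) {
            spill();
        }
    }

    void spill() {
        // Stand-in for writing a sorted run to disk and freeing the buffer.
        buffer.clear();
        estimatedBytes = 0;
        spillCount++;
    }

    public static void main(String[] args) {
        SpillSketch sorter = new SpillSketch();
        for (int i = 0; i < 100; i++) sorter.insert(i, i);
        // A larger input range produces more spills; a small one may produce none,
        // leaving the spill code path untested.
        System.out.println("spills: " + sorter.spillCount);
    }
}
```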
@mateiz seems to be busy. @jerryshao, could you take a look at the failed
tests, since you wrote some of them?
Also, I believe there is another bug in the current SizeEstimator: Scala
specializes Tuples for Int, Long, Float, and Double, so the size of (Int, Int)
should be 24 bytes rather than 32 bytes. I verified this with the method
introduced in this article:
http://www.javaworld.com/article/2077496/testing-debugging/java-tip-130--do-you-know-your-data-size-.html.
I will take a look at the size problem.
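
For reference, a minimal sketch of the layout arithmetic behind the 24-byte figure, assuming a 64-bit HotSpot JVM with compressed oops (12-byte object header, 8-byte object alignment, 4-byte compressed references). The class and constants here are illustrative assumptions, not Spark's SizeEstimator:

```java
public class TupleSizeSketch {
    static final int HEADER = 12;  // 8-byte mark word + 4-byte compressed klass pointer
    static final int ALIGN = 8;    // objects are padded to an 8-byte boundary

    static long align(long size) {
        // Round up to the next multiple of ALIGN.
        return (size + ALIGN - 1) / ALIGN * ALIGN;
    }

    public static void main(String[] args) {
        // Specialized Tuple2$mcII$sp holds two unboxed int fields (4 bytes each):
        // 12 + 4 + 4 = 20, padded to 24 bytes.
        long specialized = align(HEADER + 4 + 4);

        // A non-specialized Tuple2 would hold two compressed references (4 bytes
        // each) plus two boxed java.lang.Integer objects (12 + 4 -> 16 bytes each),
        // so the total footprint is considerably larger than 24 bytes.
        long boxedInt = align(HEADER + 4);
        long generic = align(HEADER + 4 + 4) + 2 * boxedInt;

        System.out.println("specialized (Int, Int): " + specialized + " bytes");
        System.out.println("generic with boxes:     " + generic + " bytes");
    }
}
```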