I looked into this, because it sure sounds like a similar issue from a few years ago that was fixed in https://issues.apache.org/jira/browse/SPARK-5984
The change in that JIRA actually looks almost identical to the change mentioned in the JDK bug: http://hg.openjdk.java.net/jdk/jdk/rev/3a6d47df8239 Reading the paper http://drops.dagstuhl.de/opus/volltexte/2018/9467/pdf/LIPIcs-ESA-2018-4.pdf in section 5 a little more, I think they are saying that there were two ways to fix the original problem: a) fix the invariant, or b) increase some data structure size. Java did the latter, it seems, and now they've shown it's actually still busted. However Python and the original paper did the former, and we seem to have copied that fix from http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/ My understanding is that this still works, and is what Java *now* implements. The only difference I can see in implementation is an extra check for a negative array index before dereferencing array[n]. We can add that for full consistency with the JVM change, I suppose. I don't think it's relevant to the problem reported in the paper, but could be an issue otherwise. The JVM implementation suggests it thinks this needs to be guarded. I did hack together a crude translation of the paper's bug reproduction at http://igm.univ-mlv.fr/~pivoteau/Timsort/Test.java by copying some Spark test code. It does need a huge amount of memory to run (>32g.. ended up at 44g) so not so feasible to put in the test suite. Running it on Spark master nets a JVM crash: # Problematic frame: # J 10195 C2 org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(Ljava/lang/Object;IILjava/util/Comparator;)I (198 bytes) @ 0x00007ff64dd9a262 [0x00007ff64dd99f20+0x342] Thats... not good, but I can't tell if it's really due to this same issue or something else going on in the off-heap code. Making the tiny change I mentioned above doesn't do anything. On Fri, Aug 31, 2018 at 2:37 AM Reynold Xin <r...@databricks.com> wrote: > “As a byproduct of our study, we uncover a bug in the Java implementation > that can cause the sorting method to fail during the execution.” > > http://drops.dagstuhl.de/opus/volltexte/2018/9467/ > > This might impact Spark since we took the Java based TimSort > implementation. I have seen in the wild TimSort failing in the past. Maybe > this is the cause. >