A few updates on this thread:

We still have one blocking issue, the repartition correctness bug:
https://github.com/apache/spark/pull/22112
It's close to merging.

There are a few PRs to fix Scala 2.12 issues. I think they will keep coming
up, and we don't need to block Spark 2.4 on them.

All other features/issues mentioned in this thread are either finished or
retargeted to the next release. Hopefully we can cut the branch this week.

Thanks to everyone for your contributions! Please reply to this email if you
think something should be done before Spark 2.4.

Thanks,
Wenchen

On Tue, Aug 14, 2018 at 12:57 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:

> I'm working on the fix for SPARK-23243
> <https://issues.apache.org/jira/browse/SPARK-23243> and should be able
> to push another commit in 1~2 days. More detailed discussions can go to
> the PR. Thanks for pushing this issue forward! I really appreciate
> everyone's efforts in submitting PRs and actively taking part in the
> discussions!
>
> 2018-08-13 22:50 GMT+08:00 Tom Graves <tgraves...@yahoo.com.invalid>:
>
>> I agree with Imran, we need to fix SPARK-23243
>> <https://issues.apache.org/jira/browse/SPARK-23243> and any correctness
>> issues for that matter.
>>
>> Tom
>>
>> On Wednesday, August 8, 2018, 9:06:43 AM CDT, Imran Rashid
>> <iras...@cloudera.com.INVALID> wrote:
>>
>> On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>> SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>:
>> Shuffle+Repartition on an RDD could lead to incorrect answers
>> It turns out to be a very complicated issue, and there is no consensus
>> yet about what the right fix is. Likely to miss it in Spark 2.4 because
>> it's a long-standing issue, not a regression.
>>
>> This is a really serious data loss bug. Yes, it's very complex, but we
>> absolutely have to fix this. I really think it should be in 2.4.
>> Has work on it stopped?