I agree. I think we just need to go through all of them and assess each one individually. If it's really a correctness issue, we should hold 3.0 for it.

On the 2.4 release, I didn't see an explanation on https://issues.apache.org/jira/browse/SPARK-26154 of why it can't be backported. I think at the very least we need that in each JIRA comment.

SPARK-29701 looks more like a compatibility issue with Postgres than a purely wrong answer to me. If Spark has been consistent about that, it feels like it can wait for 3.0, but it would be good to get others' input, since I'm not an expert on the SQL standard or on what the other SQL engines do in this case.

Tom

On Monday, January 20, 2020, 12:07:54 AM CST, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

Hi, All.

According to our policy, "Correctness and data loss issues should be considered Blockers".
- http://spark.apache.org/contributing.html

Since we are close to the branch-3.0 cut, I want to ask your opinions on the following correctness and data loss issues.

SPARK-30218 Columns used in inequality conditions for joins not resolved correctly in case of common lineage
SPARK-29701 Different answers when empty input given in GROUPING SETS
SPARK-29699 Different answers in nested aggregates with window functions
SPARK-29419 Seq.toDS / spark.createDataset(Seq) is not thread-safe
SPARK-28125 dataframes created by randomSplit have overlapping rows
SPARK-28067 Incorrect results in decimal aggregation with whole-stage code gen enabled
SPARK-28024 Incorrect numeric values when out of range
SPARK-27784 Alias ID reuse can break correctness when substituting foldable expressions
SPARK-27619 MapType should be prohibited in hash expressions
SPARK-27298 Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment
SPARK-27282 Spark incorrect results when using UNION with GROUP BY clause
SPARK-27213 Unexpected results when filter is used after distinct
SPARK-26836 Columns get switched in Spark SQL using Avro backed Hive table if schema evolves
SPARK-25150 Joining DataFrames derived from the same source yields confusing/incorrect results
SPARK-21774 The rule PromoteStrings cast string to a wrong data type
SPARK-19248 Regex_replace works in 1.6 but not in 2.0

Some of them are targeted at 3.0.0, but the others are not. Although we will work on them until 3.0.0, I'm not sure we can reach a state with no known correctness or data loss issues.

What do you think about the above issues?

Bests,
Dongjoon.