Hi all,

Let's start a new thread to discuss the ongoing features for the Spark 3.0
preview release.

Below is the feature list for the Spark 3.0 preview release, collected from
previous discussions on the dev list.

   - Follow-up of the shuffle+repartition correctness issue: support rolling
   back shuffle stages (https://issues.apache.org/jira/browse/SPARK-25341)
   - Upgrade the built-in Hive to 2.3.5 for hadoop-3.2 (
   https://issues.apache.org/jira/browse/SPARK-23710)
   - JDK 11 support (https://issues.apache.org/jira/browse/SPARK-28684)
   - Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
   - DataSourceV2 features
      - Enable file source v2 writers (
      https://issues.apache.org/jira/browse/SPARK-27589)
      - CREATE TABLE USING with DataSourceV2
      - New pushdown API for DataSourceV2
      - Support DELETE/UPDATE/MERGE operations in DataSourceV2 (
      https://issues.apache.org/jira/browse/SPARK-28303)
   - Correctness issue: stream-stream left outer join gives inconsistent
   output (https://issues.apache.org/jira/browse/SPARK-26154)
   - Revisiting Python / pandas UDF (
   https://issues.apache.org/jira/browse/SPARK-28264)
   - Spark Graph (https://issues.apache.org/jira/browse/SPARK-25994)

Features that are nice to have:

   - Use remote storage for persisting shuffle data (
   https://issues.apache.org/jira/browse/SPARK-25299)
   - Spark + Hadoop + Parquet + Avro compatibility problems (
   https://issues.apache.org/jira/browse/SPARK-25588)
   - Introduce a new option to the Kafka source: specify timestamps for the
   start and end offsets (https://issues.apache.org/jira/browse/SPARK-26848)
   - Delete files after processing in structured streaming (
   https://issues.apache.org/jira/browse/SPARK-20568)

I am proposing to cut the branch on October 15th. If your features are
targeting the 3.0 preview release, please prioritize the work and finish it
before that date. Note that Oct. 15th is not the code freeze for Spark 3.0;
the community will continue working on features for the upcoming Spark 3.0
release even if they are not included in the preview release. The goal of
the preview release is to collect feedback from the community on the new
3.0 features and behavior changes.

Thanks!
