Thank you for sharing your opinions, Jacky, Maxim, Holden, Jungtaek, Yi, Tom, Gabor, Felix.
I also want to include both `New Features` and `Improvements` together, per the above discussion. As of today, the item status looks like the following. In short, I explicitly removed K8s GA and DSv2 Stabilization from the ON-TRACK list, given the concerns raised. For those items, we can try to build a consensus for Apache Spark 3.2 (June 2021) or later.

ON-TRACK
1. Support Scala 2.13 (SPARK-25075)
2. Use Apache Hadoop 3.2 by default for better cloud support (SPARK-32058)
3. Stage Level Scheduling (SPARK-27495)
4. Support filter pushdown more (CSV already shipped via SPARK-30323 in 3.0)
   - Support filter pushdown to JSON (SPARK-30648 in 3.1)
   - Support filter pushdown to Avro (SPARK-XXX in 3.1)
   - Support nested attributes in filters pushed down to JSON
5. Support JDBC Kerberos w/ keytab (SPARK-12312)

NICE TO HAVE OR DEFERRED TO APACHE SPARK 3.2
1. Declaring Kubernetes Scheduler GA
   - Should we also consider the shuffle service refactoring to support pluggable storage engines as targeting the 3.1 release? (Holden)
   - I think pluggable storage in shuffle is essential for K8s GA. (Felix)
   - Use remote storage for persisting shuffle data (SPARK-25299)
2. DSv2 Stabilization (the following and more)
   - SPARK-31357 Catalog API for view metadata
   - SPARK-31694 Add SupportsPartitions Catalog APIs on DataSourceV2

As we know, we work willingly and voluntarily. If something lands on the `master` branch before the feature freeze (November), it will of course be a part of Apache Spark 3.1.

Thanks,
Dongjoon.

On Sun, Jul 5, 2020 at 12:21 PM Felix Cheung <felixcheun...@hotmail.com> wrote:

> I think pluggable storage in shuffle is essential for k8s GA
>
> ------------------------------
> *From:* Holden Karau <hol...@pigscanfly.ca>
> *Sent:* Monday, June 29, 2020 9:33 AM
> *To:* Maxim Gekk
> *Cc:* Dongjoon Hyun; dev
> *Subject:* Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)
>
> Should we also consider the shuffle service refactoring to support
> pluggable storage engines as targeting the 3.1 release?
>
> On Mon, Jun 29, 2020 at 9:31 AM Maxim Gekk <maxim.g...@databricks.com> wrote:
>
>> Hi Dongjoon,
>>
>> I would add:
>> - Filter pushdown to JSON (https://github.com/apache/spark/pull/27366)
>> - Filter pushdown to other datasources like Avro
>> - Support nested attributes of filters pushed down to JSON
>>
>> Maxim Gekk
>> Software Engineer
>> Databricks, Inc.
>>
>> On Mon, Jun 29, 2020 at 7:07 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>>> Hi, All.
>>>
>>> After a short celebration of Apache Spark 3.0, I'd like to ask for the
>>> community's opinion on Apache Spark 3.1 feature expectations.
>>>
>>> First of all, Apache Spark 3.1 is scheduled for December 2020.
>>> - https://spark.apache.org/versioning-policy.html
>>>
>>> I'm expecting the following items:
>>>
>>> 1. Support Scala 2.13
>>> 2. Use Apache Hadoop 3.2 by default for better cloud support
>>> 3. Declaring Kubernetes Scheduler GA
>>>    From my perspective, the last main missing piece was dynamic allocation:
>>>    - Dynamic allocation with shuffle tracking already shipped in 3.0.
>>>    - Dynamic allocation with worker decommission/data migration is
>>>      targeting 3.1. (Thanks, Holden)
>>> 4. DSv2 Stabilization
>>>
>>> I'm aware of some more features currently on the way, but I'd love to
>>> hear opinions from the main developers and, moreover, from the main
>>> users who need those features.
>>>
>>> Thank you in advance. Any comments are welcome.
>>>
>>> Bests,
>>> Dongjoon.
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
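P.S. For readers following item 4 above, a rough sketch of what JSON filter pushdown is expected to look like from the user side. This is only an illustration against the in-progress 3.1 work: the config key is the one discussed under SPARK-30648 and may change before release, the schema, path, and `spark` session are placeholders, and the pushdown itself is transparent to the query.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-pushdown-sketch").getOrCreate()
import spark.implicits._

// Flag from the SPARK-30648 discussion (assumed name; verify against the release notes).
spark.conf.set("spark.sql.jsonFilterPushdown.enabled", true)

val df = spark.read
  .schema("id LONG, name STRING")          // hypothetical schema
  .json("/path/to/records.json")           // hypothetical input path
  .filter($"id" > 100)                     // candidate filter for pushdown into the JSON parser

// When pushdown applies, the physical plan's JSON scan node should report
// the filter (e.g. under PushedFilters) instead of filtering after a full scan.
df.explain()
```

The user-facing query is unchanged either way; the flag only controls whether rows failing the filter can be skipped during JSON parsing rather than materialized first.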