I think pluggable storage in shuffle is essential for k8s GA.

________________________________
From: Holden Karau <hol...@pigscanfly.ca>
Sent: Monday, June 29, 2020 9:33 AM
To: Maxim Gekk
Cc: Dongjoon Hyun; dev
Subject: Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)
Should we also consider the shuffle service refactoring to support pluggable storage engines as targeting the 3.1 release?

On Mon, Jun 29, 2020 at 9:31 AM Maxim Gekk <maxim.g...@databricks.com> wrote:

Hi Dongjoon,

I would add:
- Filters pushdown to JSON (https://github.com/apache/spark/pull/27366)
- Filters pushdown to other datasources like Avro
- Support nested attributes in filters pushed down to JSON

Maxim Gekk
Software Engineer
Databricks, Inc.

On Mon, Jun 29, 2020 at 7:07 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

Hi, All.

After a short celebration of Apache Spark 3.0, I'd like to ask for the community's opinion on Apache Spark 3.1 feature expectations.

First of all, Apache Spark 3.1 is scheduled for December 2020.
- https://spark.apache.org/versioning-policy.html

I'm expecting the following items:

1. Support Scala 2.13
2. Use Apache Hadoop 3.2 by default for better cloud support
3. Declare the Kubernetes scheduler GA
   From my perspective, the last main missing piece was dynamic allocation:
   - Dynamic allocation with shuffle tracking already shipped in 3.0.
   - Dynamic allocation with worker decommission/data migration is targeting 3.1. (Thanks, Holden)
4. DSv2 Stabilization

I'm aware of some more features that are currently on the way, but I'd love to hear opinions from the main developers and, even more, from the main users who need those features.

Thank you in advance. Any comments are welcome.

Bests,
Dongjoon.

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
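
For the JSON filters pushdown item mentioned above, a minimal sketch of how the feature might surface to users, assuming the spark.sql.json.filterPushdown.enabled flag proposed in the linked PR (its final name could differ); the schema, input path, and predicate are illustrative only, not part of the proposal:

    import org.apache.spark.sql.SparkSession

    object JsonFilterPushdownSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("json-filter-pushdown-sketch")
          .master("local[*]")
          // Flag proposed in https://github.com/apache/spark/pull/27366
          // (assumed name; check the merged version of the PR).
          .config("spark.sql.json.filterPushdown.enabled", "true")
          .getOrCreate()

        // Illustrative schema, path, and predicate. With pushdown enabled,
        // rows failing the predicate can be skipped while parsing the JSON
        // instead of being filtered out afterwards.
        val events = spark.read
          .schema("id LONG, event STRING")
          .json("/tmp/events.json") // hypothetical input path
          .filter("event = 'click'")

        // Inspect the plan to see whether the filter reaches the scan.
        events.explain(true)

        spark.stop()
      }
    }

The Avro item would presumably follow the same user-facing pattern, just with a different reader and its own configuration flag.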