I think pluggable storage in shuffle is essential for k8s GA.

________________________________
From: Holden Karau <hol...@pigscanfly.ca>
Sent: Monday, June 29, 2020 9:33 AM
To: Maxim Gekk
Cc: Dongjoon Hyun; dev
Subject: Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)
Should we also consider the shuffle service refactoring to support pluggable storage engines as targeting the 3.1 release?

On Mon, Jun 29, 2020 at 9:31 AM Maxim Gekk <maxim.g...@databricks.com> wrote:

Hi Dongjoon,

I would add:
- Filters pushdown to JSON (https://github.com/apache/spark/pull/27366)
- Filters pushdown to other datasources like Avro
- Support nested attributes in filters pushed down to JSON

Maxim Gekk
Software Engineer
Databricks, Inc.

On Mon, Jun 29, 2020 at 7:07 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

Hi, All.

After a short celebration of Apache Spark 3.0, I'd like to ask for the community's opinion on Apache Spark 3.1 feature expectations.

First of all, Apache Spark 3.1 is scheduled for December 2020.
- https://spark.apache.org/versioning-policy.html

I'm expecting the following items:

1. Support Scala 2.13
2. Use Apache Hadoop 3.2 by default for better cloud support
3. Declare the Kubernetes scheduler GA
   From my perspective, the last main missing piece was dynamic allocation:
   - Dynamic allocation with shuffle tracking already shipped in 3.0.
   - Dynamic allocation with worker decommission/data migration is targeting 3.1. (Thanks, Holden)
4. DSv2 Stabilization

I'm aware of some more features that are currently on the way, but I'd love to hear opinions from the main developers and, even more, from the main users who need those features.

Thank you in advance. Any comments are welcome.

Bests,
Dongjoon.

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
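
For the JSON filters pushdown item mentioned above, a minimal sketch of how the feature might surface to users, assuming the spark.sql.json.filterPushdown.enabled flag proposed in the linked PR (its final name could differ); the schema, input path, and predicate are illustrative only, not part of the proposal:

    import org.apache.spark.sql.SparkSession

    object JsonFilterPushdownSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("json-filter-pushdown-sketch")
          .master("local[*]")
          // Flag proposed in https://github.com/apache/spark/pull/27366
          // (assumed name; check the merged version of the PR).
          .config("spark.sql.json.filterPushdown.enabled", "true")
          .getOrCreate()

        // Illustrative schema, path, and predicate. With pushdown enabled,
        // rows failing the predicate can be skipped while parsing the JSON
        // instead of being filtered out afterwards.
        val events = spark.read
          .schema("id LONG, event STRING")
          .json("/tmp/events.json") // hypothetical input path
          .filter("event = 'click'")

        // Inspect the plan to see whether the filter reaches the scan.
        events.explain(true)

        spark.stop()
      }
    }

The Avro item would presumably follow the same user-facing pattern, just with a different reader and its own configuration flag.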