Thanks, Dongjoon, for the discussion. I would like to add Gengliang's work:
SPARK-34246: New type coercion syntax rules in ANSI mode
I think it is worth describing in the next release notes, too.
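To make the behavior change concrete, here is an illustrative SQL fragment (not from the original thread) showing the kind of difference ANSI mode makes; the exact coercion rules are the subject of SPARK-34246:

```sql
-- Enable ANSI mode (defaults to false in Spark 3.x):
SET spark.sql.ansi.enabled=true;

-- Under ANSI rules, an invalid string-to-int cast raises a runtime error
-- instead of silently returning NULL as it does in the legacy mode:
SELECT CAST('not_a_number' AS INT);
```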
Bests,
Takeshi

On Sat, Feb 27, 2021 at 11:41 AM Yi Wu <yi...@databricks.com> wrote:

> +1 to continue the incomplete push-based shuffle.
>
> --
> Yi
>
> On Fri, Feb 26, 2021 at 1:26 AM Mridul Muralidharan <mri...@gmail.com>
> wrote:
>
>> Nit: Java 17 -> should be available by Sept 2021 :-)
>> Adoption would also depend on some of our nontrivial dependencies
>> supporting it - it might be a stretch to get it in for Apache Spark 3.2?
>>
>> Features:
>> Push-based shuffle and disaggregated shuffle should also be in 3.2.
>>
>> Regards,
>> Mridul
>>
>> On Thu, Feb 25, 2021 at 10:57 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, All.
>>>
>>> Since we have been preparing Apache Spark 3.2.0 in the master branch
>>> since December 2020, March seems to be a good time to share our thoughts
>>> and aspirations for Apache Spark 3.2.
>>>
>>> Judging by the progress of the Apache Spark 3.1 release, Apache Spark
>>> 3.2 seems likely to be the last minor release of this year. Given the
>>> timeframe, we might consider the following. (This is a small set. Please
>>> add your thoughts to this limited list.)
>>>
>>> # Languages
>>>
>>> - Scala 2.13 Support: This was expected for 3.1 via SPARK-25075 but
>>> slipped. Currently, we are trying to use Scala 2.13.5 via SPARK-34505
>>> and investigating the publishing issue. Thank you for your contributions
>>> and feedback on this.
>>>
>>> - Java 17 LTS Support: Java 17 LTS will arrive in September 2021. Like
>>> Java 11, we need lots of support from our dependencies. Let's see.
>>>
>>> - Python 3.6 Deprecation(?): Python 3.6 community support ends on
>>> 2021-12-23. So, the deprecation is not required yet, but we had better
>>> prepare for it because we don't have an ETA for Apache Spark 3.3 in 2022.
>>>
>>> - SparkR CRAN publishing: As we know, it's discontinued so far. Resuming
>>> it depends on the success of the Apache SparkR 3.1.1 CRAN publishing. If
>>> that succeeds, we can keep publishing.
>>> Otherwise, I believe we had better drop it from the release work item
>>> list officially.
>>>
>>> # Dependencies
>>>
>>> - Apache Hadoop 3.3.2: Hadoop 3.2.0 became the default Hadoop profile
>>> in Apache Spark 3.1. Currently, the Spark master branch lives on Hadoop
>>> 3.2.2's shaded clients via SPARK-33212. So far, there is one ongoing
>>> report in a YARN environment. We hope it will be fixed soon within the
>>> Spark 3.2 timeframe so that we can move toward Hadoop 3.3.2.
>>>
>>> - Apache Hive 2.3.9: Spark 3.0 started to use Hive 2.3.7 by default
>>> instead of the old Hive 1.2 fork. Spark 3.1 removed the hive-1.2 profile
>>> completely via SPARK-32981 and replaced the generated hive-service-rpc
>>> code with the official dependency via SPARK-32981. We are steadily
>>> improving this area and will consume Hive 2.3.9 when available.
>>>
>>> - K8s Client 4.13.2: During the K8s GA activity, Spark 3.1 upgraded the
>>> K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2 in
>>> order to support K8s model 1.19.
>>>
>>> - Kafka Client 2.8: To bring in the client fixes, Spark 3.1 is using
>>> Kafka Client 2.6. For Spark 3.2, SPARK-33913 upgraded to Kafka 2.7 with
>>> Scala 2.12.13, but it was reverted later due to a Scala 2.12.13 issue.
>>> Since KAFKA-12357 fixed the Scala requirement two days ago, Spark 3.2
>>> will hopefully go with Kafka Client 2.8.
>>>
>>> # Some Features
>>>
>>> - Data Source v2: Spark 3.2 will deliver a much richer DSv2 with Apache
>>> Iceberg integration. Especially, we hope the ongoing function catalog
>>> SPIP and upcoming storage partitioned join SPIP can be delivered as a
>>> part of Spark 3.2 and become an additional foundation.
>>>
>>> - Columnar Encryption: As of today, the Apache Spark master branch
>>> supports columnar encryption via Apache ORC 1.6, and it's documented via
>>> SPARK-34036. Also, the upcoming Apache Parquet 1.12 has a similar
>>> capability. Hopefully, Apache Spark 3.2 is going to be the first release
>>> to have this feature officially.
>>> Any feedback is welcome.
>>>
>>> - Improved ZStandard Support: Spark 3.2 will bring more benefits to
>>> ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer pool
>>> support for all IO operations, 2) SPARK-33978 makes the ORC data source
>>> support ZSTD compression, 3) SPARK-34503 sets ZSTD as the default codec
>>> for event log compression, and 4) SPARK-34479 aims to support ZSTD in
>>> the Avro data source. Also, the upcoming Parquet 1.12 supports ZSTD (and
>>> supports the JNI buffer pool), too. I'm expecting more benefits.
>>>
>>> - Structured Streaming with RocksDB backend: According to the latest
>>> update, it looks active enough for merging to the master branch in Spark
>>> 3.2.
>>>
>>> Please share your thoughts and let's build a better Apache Spark 3.2
>>> together.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>

-- 
---
Takeshi Yamamuro
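As a footnote to the event log compression point (SPARK-34503) discussed in the thread, here is an illustrative spark-defaults.conf fragment (not part of the original messages) showing how a user could opt in to ZSTD-compressed event logs explicitly today, rather than waiting for the default to change; the property names below exist in Spark 3.x:

```
# spark-defaults.conf (illustrative): compress event logs with ZSTD.
# SPARK-34503 proposes making zstd the default codec, so setting it
# explicitly would become unnecessary once that lands.
spark.eventLog.enabled             true
spark.eventLog.compress            true
spark.eventLog.compression.codec   zstd
```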