Thanks, Hyukjin. The expected target branch cut date of Spark 3.2 is *July 1st* on https://spark.apache.org/versioning-policy.html. However, I notice that there are still multiple important projects in progress now:
[Core]
- SPIP: Support push-based shuffle to improve shuffle efficiency <https://issues.apache.org/jira/browse/SPARK-30602>

[SQL]
- Support ANSI SQL INTERVAL types <https://issues.apache.org/jira/browse/SPARK-27790>
- Support Timestamp without time zone data type <https://issues.apache.org/jira/browse/SPARK-35662>
- Aggregate (Min/Max/Count) push down for Parquet <https://issues.apache.org/jira/browse/SPARK-34952>

[Streaming]
- EventTime based sessionization (session window) <https://issues.apache.org/jira/browse/SPARK-10816>
- Add RocksDB StateStore as external module <https://issues.apache.org/jira/browse/SPARK-34198>

I wonder whether we should postpone the branch cut date.
cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian Li, Liang-Chi Hsieh, who work on the projects above.

On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> +1, thanks.
>
> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltn...@gmail.com> wrote:
>
>> Hi,
>>
>> As the expected release date is close, I would like to volunteer as the
>> release manager for Apache Spark 3.2.0.
>>
>> Thanks,
>> Gengliang
>>
>> On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> An update: we found a mistake that we picked the Spark 3.2 release date
>>> based on the scheduled release date of 3.1. However, 3.1 was delayed and
>>> released on March 2. In order to have a full 6 months development for 3.2,
>>> the target release date for 3.2 should be September 2.
>>>
>>> I'm updating the release dates in
>>> https://github.com/apache/spark-website/pull/331
>>>
>>> Thanks,
>>> Wenchen
>>>
>>> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>
>>>> Thank you, Xiao, Wenchen and Hyukjin.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>
>>>>> Just for an update, I will send a discussion email about my idea late
>>>>> this week or early next week.
>>>>>
>>>>> On Thu, Mar 11, 2021 at 7:00 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>
>>>>>> There are many projects going on right now, such as new DS v2 APIs,
>>>>>> ANSI interval types, join improvement, disaggregated shuffle, etc. I
>>>>>> don't think it's realistic to do the branch cut in April.
>>>>>>
>>>>>> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut
>>>>>> the branch 3 months earlier. We should make the release process faster
>>>>>> and cut the branch around June probably.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>>>
>>>>>>> Below are some nice-to-have features we can work on in Spark 3.2:
>>>>>>> Lateral Join support <https://issues.apache.org/jira/browse/SPARK-28379>,
>>>>>>> interval data type, timestamp without time zone, un-nesting arbitrary
>>>>>>> queries, the returned metrics of DSV2, and error message
>>>>>>> standardization.
>>>>>>> Spark 3.2 will be another exciting release I believe!
>>>>>>>
>>>>>>> Go Spark!
>>>>>>>
>>>>>>> Xiao
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 10, 2021 at 12:25 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, Xiao.
>>>>>>>>
>>>>>>>> This thread started 13 days ago. Since you asked the community
>>>>>>>> about major features or timelines at that time, could you share your
>>>>>>>> roadmap or expectations if you have something in your mind?
>>>>>>>>
>>>>>>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep
>>>>>>>> it open. It might take 1-2 weeks to collect from the community all the
>>>>>>>> features we plan to build and ship in 3.2 since we just finished the
>>>>>>>> 3.1 voting.
>>>>>>>> > TBH, cutting the branch this April does not look good to me. That
>>>>>>>> means, we only have one month left for feature development of Spark
>>>>>>>> 3.2. Do we have enough features in the current master branch?
>>>>>>>> If not, are we able to finish major features we collected here?
>>>>>>>> Do they have a timeline or project plan?
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 3, 2021 at 2:58 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, John.
>>>>>>>>>
>>>>>>>>> This thread aims to share your expectations and goals (and maybe
>>>>>>>>> work progress) to Apache Spark 3.2 because we are making this
>>>>>>>>> together. :)
>>>>>>>>>
>>>>>>>>> Bests,
>>>>>>>>> Dongjoon.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 3, 2021 at 1:59 PM John Zhuge <jzh...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dongjoon,
>>>>>>>>>>
>>>>>>>>>> Is it possible to get ViewCatalog in? The community already had
>>>>>>>>>> fairly detailed discussions.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, All.
>>>>>>>>>>>
>>>>>>>>>>> Since we have been preparing Apache Spark 3.2.0 in master branch
>>>>>>>>>>> since December 2020, March seems to be a good time to share our
>>>>>>>>>>> thoughts and aspirations on Apache Spark 3.2.
>>>>>>>>>>>
>>>>>>>>>>> According to the progress on Apache Spark 3.1 release, Apache
>>>>>>>>>>> Spark 3.2 seems to be the last minor release of this year. Given the
>>>>>>>>>>> timeframe, we might consider the following. (This is a small set.
>>>>>>>>>>> Please add your thoughts to this limited list.)
>>>>>>>>>>>
>>>>>>>>>>> # Languages
>>>>>>>>>>>
>>>>>>>>>>> - Scala 2.13 Support: This was expected on 3.1 via SPARK-25075
>>>>>>>>>>> but slipped out. Currently, we are trying to use Scala 2.13.5 via
>>>>>>>>>>> SPARK-34505 and investigating the publishing issue. Thank you for
>>>>>>>>>>> your contributions and feedback on this.
>>>>>>>>>>>
>>>>>>>>>>> - Java 17 LTS Support: Java 17 LTS will arrive in September
>>>>>>>>>>> 2021. Like Java 11, we need lots of support from our dependencies.
>>>>>>>>>>> Let's see.
>>>>>>>>>>>
>>>>>>>>>>> - Python 3.6 Deprecation(?): Python 3.6 community support ends
>>>>>>>>>>> on 2021-12-23. So, the deprecation is not required yet, but we had
>>>>>>>>>>> better prepare for it because we don't have an ETA of Apache Spark
>>>>>>>>>>> 3.3 in 2022.
>>>>>>>>>>>
>>>>>>>>>>> - SparkR CRAN publishing: As we know, it's discontinued so far.
>>>>>>>>>>> Resuming it depends on the success of Apache SparkR 3.1.1 CRAN
>>>>>>>>>>> publishing. If that succeeds, we can keep publishing. Otherwise, I
>>>>>>>>>>> believe we had better drop it from the releasing work item list
>>>>>>>>>>> officially.
>>>>>>>>>>>
>>>>>>>>>>> # Dependencies
>>>>>>>>>>>
>>>>>>>>>>> - Apache Hadoop 3.3.2: Hadoop 3.2.0 becomes the default Hadoop
>>>>>>>>>>> profile in Apache Spark 3.1. Currently, Spark master branch lives
>>>>>>>>>>> on Hadoop 3.2.2's shaded clients via SPARK-33212. So far, there is
>>>>>>>>>>> one on-going report at YARN environment. We hope it will be fixed
>>>>>>>>>>> soon at Spark 3.2 timeframe and we can move toward Hadoop 3.3.2.
>>>>>>>>>>>
>>>>>>>>>>> - Apache Hive 2.3.9: Spark 3.0 starts to use Hive 2.3.7 by
>>>>>>>>>>> default instead of old Hive 1.2 fork. Spark 3.1 removed hive-1.2
>>>>>>>>>>> profile completely via SPARK-32981 and replaced the generated
>>>>>>>>>>> hive-service-rpc code with the official dependency via SPARK-32981.
>>>>>>>>>>> We are steadily improving this area and will consume Hive 2.3.9 if
>>>>>>>>>>> available.
>>>>>>>>>>>
>>>>>>>>>>> - K8s Client 4.13.2: During K8s GA activity, Spark 3.1 upgrades
>>>>>>>>>>> K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2 in
>>>>>>>>>>> order to support K8s model 1.19.
>>>>>>>>>>>
>>>>>>>>>>> - Kafka Client 2.8: To bring the client fixes, Spark 3.1 is
>>>>>>>>>>> using Kafka Client 2.6. For Spark 3.2, SPARK-33913 upgraded to
>>>>>>>>>>> Kafka 2.7 with Scala 2.12.13, but it was reverted later due to
>>>>>>>>>>> Scala 2.12.13 issue. Since KAFKA-12357 fixed the Scala requirement
>>>>>>>>>>> two days ago, Spark 3.2 will go with Kafka Client 2.8 hopefully.
>>>>>>>>>>>
>>>>>>>>>>> # Some Features
>>>>>>>>>>>
>>>>>>>>>>> - Data Source v2: Spark 3.2 will deliver much richer DSv2 with
>>>>>>>>>>> Apache Iceberg integration. Especially, we hope the on-going
>>>>>>>>>>> function catalog SPIP and up-coming storage partitioned join SPIP
>>>>>>>>>>> can be delivered as a part of Spark 3.2 and become an additional
>>>>>>>>>>> foundation.
>>>>>>>>>>>
>>>>>>>>>>> - Columnar Encryption: As of today, Apache Spark master branch
>>>>>>>>>>> supports columnar encryption via Apache ORC 1.6 and it's documented
>>>>>>>>>>> via SPARK-34036. Also, upcoming Apache Parquet 1.12 has a similar
>>>>>>>>>>> capability. Hopefully, Apache Spark 3.2 is going to be the first
>>>>>>>>>>> release to have this feature officially. Any feedback is welcome.
>>>>>>>>>>>
>>>>>>>>>>> - Improved ZStandard Support: Spark 3.2 will bring more benefits
>>>>>>>>>>> for ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer
>>>>>>>>>>> pool support for all IO operations, 2) SPARK-33978 makes ORC
>>>>>>>>>>> datasource support ZSTD compression, 3) SPARK-34503 sets ZSTD as
>>>>>>>>>>> the default codec for event log compression, 4) SPARK-34479 aims to
>>>>>>>>>>> support ZSTD at Avro data source. Also, the upcoming Parquet 1.12
>>>>>>>>>>> supports ZSTD (and supports JNI buffer pool), too. I'm expecting
>>>>>>>>>>> more benefits.
>>>>>>>>>>>
>>>>>>>>>>> - Structured Streaming with RocksDB backend: According to the
>>>>>>>>>>> latest update, it looks active enough for merging to master branch
>>>>>>>>>>> in Spark 3.2.
>>>>>>>>>>>
>>>>>>>>>>> Please share your thoughts and let's build better Apache Spark
>>>>>>>>>>> 3.2 together.
>>>>>>>>>>>
>>>>>>>>>>> Bests,
>>>>>>>>>>> Dongjoon.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> John Zhuge
>>>>>>>>>>
>>>>>>>>>