> > Apache Spark 3.1.0 should be compared with Apache Spark 2.1.0.
I think we made a change in release cadence since Spark 2.3. See the commit:
https://github.com/apache/spark-website/commit/88990968962e5cc47db8bc2c11a50742d2438daa

Thus, Spark 3.1 might just follow the release cadence of Spark 2.3/2.4, if we
do not want to change the cadence. How about moving the code freeze of Spark
3.1 to *Early Dec 2020* and the RC1 date to *Early Jan 2021*?

Thanks,

Xiao

Dongjoon Hyun <dongjoon.h...@gmail.com> wrote on Sun, Oct 4, 2020 at 12:44 PM:

> For Xiao's comment, I want to point out that Apache Spark 3.1.0 is
> different from 2.3 or 2.4.
>
> Apache Spark 3.1.0 should be compared with Apache Spark 2.1.0.
>
> - Apache Spark 2.0.0 was released on July 26, 2016.
> - Apache Spark 2.1.0 was released on December 28, 2016.
>
> Bests,
> Dongjoon.
>
>
> On Sun, Oct 4, 2020 at 10:53 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Thank you all.
>>
>> BTW, Xiao and Mridul, I'm wondering what date you have in your mind
>> specifically.
>>
>> Usually, the `Christmas and New Year season` doesn't give us much
>> additional time.
>>
>> If you think so, could you make a PR for the Apache Spark website
>> according to your expectation?
>>
>> https://spark.apache.org/versioning-policy.html
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Sun, Oct 4, 2020 at 7:18 AM Mridul Muralidharan <mri...@gmail.com>
>> wrote:
>>
>>> +1 on pushing the branch cut for increased dev time to match previous
>>> releases.
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Sat, Oct 3, 2020 at 10:22 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>
>>>> Thank you for your updates.
>>>>
>>>> Spark 3.0 was released on Jun 18, 2020. If Nov 1st is the target date
>>>> of the 3.1 branch cut, the feature development time window is less
>>>> than 5 months. This is shorter than what we had for the Spark 2.3 and
>>>> 2.4 releases.
>>>>
>>>> Below are three highly desirable pieces of feature work I am watching.
>>>> Hopefully, we can finish them before the branch cut.
>>>>
>>>> - Support push-based shuffle to improve shuffle efficiency:
>>>>   https://issues.apache.org/jira/browse/SPARK-30602
>>>> - Unify create table syntax (a sketch follows below):
>>>>   https://issues.apache.org/jira/browse/SPARK-31257
>>>> - Bloom filter join:
>>>>   https://issues.apache.org/jira/browse/SPARK-32268
>>>>
>>>> Thanks,
>>>>
>>>> Xiao
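For readers following SPARK-31257 above: the sketch below shows the two CREATE TABLE forms that Spark SQL currently accepts through separate parser code paths and that the ticket proposes to unify. It is a minimal illustration only, not part of the unification work itself; the table names are made up, and `enableHiveSupport()` plus `local[*]` are just assumptions to keep the demo self-contained on a Hive-enabled Spark 3.x build.

```scala
import org.apache.spark.sql.SparkSession

object CreateTableSyntaxSketch {
  def main(args: Array[String]): Unit = {
    // local[*] and enableHiveSupport() are only for a self-contained demo;
    // the latter assumes a Spark build with Hive support.
    val spark = SparkSession.builder()
      .appName("create-table-syntax-sketch")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // Native DataSource form: CREATE TABLE ... USING <format>
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events_native (id BIGINT, ts TIMESTAMP)
        |USING parquet""".stripMargin)

    // Hive-compatible form: CREATE TABLE ... STORED AS <format>
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events_hive (id BIGINT, ts TIMESTAMP)
        |STORED AS PARQUET""".stripMargin)

    spark.sql("SHOW TABLES").show(truncate = false)
    spark.stop()
  }
}
```

Both statements end up as Parquet-backed tables, but today they are handled by different rules with slightly different defaults, which is the kind of divergence the unification work is meant to remove.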
>>>> Hyukjin Kwon <gurwls...@gmail.com> wrote on Sat, Oct 3, 2020 at 5:41 PM:
>>>>
>>>>> Nice summary. Thanks Dongjoon. One minor correction -> I believe we
>>>>> dropped R 3.5 and below at branch 2.4 as well.
>>>>>
>>>>> On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun, <dongjoon.h...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, All.
>>>>>>
>>>>>> As of today, the master branch (Apache Spark 3.1.0) has resolved
>>>>>> 852+ JIRA issues, and 606+ of them are 3.1.0-only patches.
>>>>>> According to the 3.1.0 release window, branch-3.1 will be
>>>>>> created on November 1st and will enter the QA period.
>>>>>>
>>>>>> Here are some notable updates I've been monitoring.
>>>>>>
>>>>>> *Language*
>>>>>> 01. SPARK-25075 Support Scala 2.13
>>>>>>     - Since SPARK-32926, the Scala 2.13 build test has become
>>>>>>       a part of the GitHub Actions jobs.
>>>>>>     - After SPARK-33044, the Scala 2.13 test will be a part of
>>>>>>       the Jenkins jobs.
>>>>>> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
>>>>>> 03. SPARK-32082 Project Zen: Improving Python usability
>>>>>>     - 7 of 16 issues are resolved.
>>>>>> 04. SPARK-32073 Drop R < 3.5 support
>>>>>>     - This is done for Spark 3.0.1 and 3.1.0.
>>>>>>
>>>>>> *Dependency*
>>>>>> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>>>>>>     - This changes the default distribution for better cloud support.
>>>>>> 06. SPARK-32981 Remove hive-1.2 distribution
>>>>>> 07. SPARK-20202 Remove references to org.spark-project.hive
>>>>>>     - This will remove Hive 1.2.1 from the source code.
>>>>>> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>>>>>>
>>>>>> *Core*
>>>>>> 09. SPARK-27495 Support Stage level resource conf and scheduling
>>>>>>     - 11 of 15 issues are resolved (a usage sketch follows at the
>>>>>>       end of this thread).
>>>>>> 10. SPARK-25299 Use remote storage for persisting shuffle data
>>>>>>     - 8 of 14 issues are resolved.
>>>>>>
>>>>>> *Resource Manager*
>>>>>> 11. SPARK-33005 Kubernetes GA preparation
>>>>>>     - It is on the way, and we are waiting for more feedback.
>>>>>>
>>>>>> *SQL*
>>>>>> 12. SPARK-30648/SPARK-32346 Support filters pushdown to JSON/Avro
>>>>>> 13. SPARK-32948/SPARK-32958 Add Json expression optimizer
>>>>>> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>>>>>>     - 11 of 17 issues are resolved.
>>>>>> 15. SPARK-27589 DSv2 was mostly completed in 3.0 and added more
>>>>>>     features in 3.1, but we still miss:
>>>>>>     - All built-in DataSource v2 write paths are disabled,
>>>>>>       and the v1 write path is used instead.
>>>>>>     - Support partition pruning with subqueries
>>>>>>     - Support bucketing
>>>>>>
>>>>>> We still have one month before the feature freeze
>>>>>> and the start of QA. If you are working on 3.1,
>>>>>> please consider the timeline and share your schedule
>>>>>> with the Apache Spark community. For the other stuff,
>>>>>> we can put it into the 3.2 release scheduled for June 2021.
>>>>>>
>>>>>> Last but not least, I want to emphasize (7) once again.
>>>>>> We need to remove the forked, unofficial Hive eventually.
>>>>>> Please let us know your reasons if you need to build
>>>>>> Apache Spark 3.1 from source code for Hive 1.2.
>>>>>>
>>>>>> https://github.com/apache/spark/pull/29936
>>>>>>
>>>>>> As I wrote in the above PR description, for old releases,
>>>>>> Apache Spark 2.4 (LTS) and 3.0 (~2021.12) will provide
>>>>>> Hive 1.2-based distributions.
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
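To make item 09 (SPARK-27495, stage-level scheduling) a little more concrete, here is a minimal sketch of the ResourceProfile API as it appears in the 3.1 line. It assumes a cluster manager that supports stage-level scheduling with dynamic allocation (YARN or Kubernetes); the resource amounts and the GPU discovery script path are illustrative assumptions only, not part of the thread above.

```scala
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}
import org.apache.spark.sql.SparkSession

object StageLevelSchedulingSketch {
  def main(args: Array[String]): Unit = {
    // Meant to be run via spark-submit on YARN or Kubernetes with
    // dynamic allocation enabled; local mode does not support this.
    val spark = SparkSession.builder()
      .appName("stage-level-scheduling-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Executor requirements for the GPU stage (values are illustrative).
    val execReqs = new ExecutorResourceRequests()
      .cores(4)
      .memory("8g")
      .resource("gpu", 1L, discoveryScript = "/opt/spark/scripts/getGpusResources.sh")

    // Task requirements: 1 CPU and 1 GPU per task.
    val taskReqs = new TaskResourceRequests()
      .cpus(1)
      .resource("gpu", 1.0)

    val gpuProfile = new ResourceProfileBuilder()
      .require(execReqs)
      .require(taskReqs)
      .build()

    // Only the stages derived from this RDD ask for the GPU executors.
    val scored = sc.parallelize(1 to 1000, 8)
      .withResources(gpuProfile)
      .map(x => x * 2) // stand-in for the GPU-heavy work
      .collect()

    println(scored.take(5).mkString(", "))
    spark.stop()
  }
}
```

The point of the API is that only the stages computed from the RDD carrying the profile request the larger GPU executors, so an ETL-then-train pipeline does not have to hold GPUs for the whole job.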