Just to share the current status: most of the known issues have been resolved. Let me know if there are any more. The one remaining item is a performance regression in TPC-DS, which is still being investigated. Once the cause is identified (and fixed, if it needs a fix), I will cut another RC right away. I roughly expect that to be next Monday.
Thanks guys.

On Wed, Jan 27, 2021 at 5:26 AM, Terry Kim <yumin...@gmail.com> wrote:

> Hi,
>
> Please check if the following regression should be included:
> https://github.com/apache/spark/pull/31352
>
> Thanks,
> Terry
>
> On Tue, Jan 26, 2021 at 7:54 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> If we're OK waiting for it, I'd like to get
>> https://github.com/apache/spark/pull/31298 in as well (it's not a
>> regression but it is a bug fix).
>>
>> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> It looks like a cool one but it's a pretty big one and affects the plans
>>> considerably ... maybe it's best to avoid adding it into 3.1.1, in
>>> particular during the RC period, if this isn't a clear regression that
>>> affects many users.
>>>
>>> On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth <peter.t...@gmail.com> wrote:
>>>
>>>> Hey,
>>>>
>>>> Sorry for chiming in a bit late, but I would like to suggest my PR
>>>> (https://github.com/apache/spark/pull/28885) for review and inclusion
>>>> into 3.1.1.
>>>>
>>>> Currently, invalid reuse reference nodes appear in many queries,
>>>> causing performance issues and incorrect explain plans. Now that
>>>> https://github.com/apache/spark/pull/31243 got merged, these invalid
>>>> references can be easily found in many of our golden files on master:
>>>> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
>>>> But the issue isn't master (3.2) specific; it has actually been there
>>>> since 3.0, when Dynamic Partition Pruning was added.
>>>> So it is not a regression from 3.0 to 3.1.1, but in some cases (like
>>>> TPCDS q23b) it is causing a performance regression from 2.4 to 3.x.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>> wrote:
>>>>
>>>>> Guys, I plan to make an RC as soon as we have no visible issues. I
>>>>> have merged fixes for a few correctness issues.
>>>>> There are still:
>>>>> - https://github.com/apache/spark/pull/31319 waiting for a review (I
>>>>> will review it soon too).
>>>>> - https://github.com/apache/spark/pull/31336
>>>>> - I know Max's investigating the perf regression one which hopefully
>>>>> will be fixed soon.
>>>>>
>>>>> Are there any more blockers or correctness issues? Please ping me or
>>>>> say so here.
>>>>> I would like to avoid making an RC when there are clearly some issues
>>>>> to be fixed.
>>>>> If you're investigating something suspicious, that's fine too. It's
>>>>> better to make sure we're safe instead of rushing an RC without finishing
>>>>> the investigation.
>>>>>
>>>>> Thanks all.
>>>>>
>>>>>
>>>>> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>
>>>>>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>>>>>> we're almost there.
>>>>>>
>>>>>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan, <cloud0...@gmail.com> wrote:
>>>>>>
>>>>>>> BTW, there is a correctness bug being fixed at
>>>>>>> https://github.com/apache/spark/pull/30788 . It's not a regression,
>>>>>>> but the fix is very simple and it would be better to start the next RC
>>>>>>> after merging that fix.
>>>>>>>
>>>>>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk
>>>>>>> <maxim.g...@databricks.com> wrote:
>>>>>>>
>>>>>>>> Also I am investigating a performance regression in some TPC-DS
>>>>>>>> queries (q88 for instance) that is caused by a recent commit in 3.1,
>>>>>>>> highly likely in the period from 19th November, 2020 to
>>>>>>>> 18th December, 2020.
>>>>>>>>
>>>>>>>> Maxim Gekk
>>>>>>>>
>>>>>>>> Software Engineer
>>>>>>>>
>>>>>>>> Databricks, Inc.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan <cloud0...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> -1 as I just found a regression in 3.1. A self-join query works
>>>>>>>>> well in 3.0 but fails in 3.1.
>>>>>>>>> It's being fixed at
>>>>>>>>> https://github.com/apache/spark/pull/31287
>>>>>>>>>
>>>>>>>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>>>>>>>>> <tgraves...@yahoo.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> built from tarball, verified sha and regular CI and tests all
>>>>>>>>>> pass.
>>>>>>>>>>
>>>>>>>>>> Tom
>>>>>>>>>>
>>>>>>>>>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon
>>>>>>>>>> <gurwls...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>>> version 3.1.1.
>>>>>>>>>>
>>>>>>>>>> The vote is open until January 22nd 4PM PST and passes if a
>>>>>>>>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>>>>
>>>>>>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>
>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>>> http://spark.apache.org/
>>>>>>>>>>
>>>>>>>>>> The tag to be voted on is v3.1.1-rc1 (commit
>>>>>>>>>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>>>>>>>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>>>>>>>>
>>>>>>>>>> The release files, including signatures, digests, etc.
>>>>>>>>>> can be found at:
>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>>>>>>>>
>>>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>>>
>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>>>>>>>>
>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>>>>>>>>
>>>>>>>>>> The list of bug fixes going into 3.1.1 can be found at the
>>>>>>>>>> following URL:
>>>>>>>>>> https://s.apache.org/41kf2
>>>>>>>>>>
>>>>>>>>>> This release is using the release script of the tag v3.1.1-rc1.
>>>>>>>>>>
>>>>>>>>>> FAQ
>>>>>>>>>>
>>>>>>>>>> ===================
>>>>>>>>>> What happened to 3.1.0?
>>>>>>>>>> ===================
>>>>>>>>>>
>>>>>>>>>> There was a technical issue during Apache Spark 3.1.0
>>>>>>>>>> preparation, and it was discussed and decided to skip 3.1.0.
>>>>>>>>>> Please see
>>>>>>>>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>>>>>>>>>> for more details.
>>>>>>>>>>
>>>>>>>>>> =========================
>>>>>>>>>> How can I help test this release?
>>>>>>>>>> =========================
>>>>>>>>>>
>>>>>>>>>> If you are a Spark user, you can help us test this release by
>>>>>>>>>> taking an existing Spark workload and running on this release
>>>>>>>>>> candidate, then reporting any regressions.
>>>>>>>>>>
>>>>>>>>>> If you're working in PySpark you can set up a virtual env and
>>>>>>>>>> install the current RC via
>>>>>>>>>> "pip install https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz"
>>>>>>>>>> and see if anything important breaks.
>>>>>>>>>> In Java/Scala, you can add the staging repository to your
>>>>>>>>>> project's resolvers and test with the RC (make sure to clean up
>>>>>>>>>> the artifact cache before/after so you don't end up building
>>>>>>>>>> with an out-of-date RC going forward).
>>>>>>>>>>
>>>>>>>>>> ===========================================
>>>>>>>>>> What should happen to JIRA tickets still targeting 3.1.1?
>>>>>>>>>> ===========================================
>>>>>>>>>>
>>>>>>>>>> The current list of open tickets targeted at 3.1.1 can be found
>>>>>>>>>> at: https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>>>>> "Target Version/s" = 3.1.1
>>>>>>>>>>
>>>>>>>>>> Committers should look at those and triage. Extremely important
>>>>>>>>>> bug fixes, documentation, and API tweaks that impact
>>>>>>>>>> compatibility should be worked on immediately. Everything else
>>>>>>>>>> please retarget to an appropriate release.
>>>>>>>>>>
>>>>>>>>>> ==================
>>>>>>>>>> But my bug isn't fixed?
>>>>>>>>>> ==================
>>>>>>>>>>
>>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>>>> release unless the bug in question is a regression from the
>>>>>>>>>> previous release. That being said, if there is something which
>>>>>>>>>> is a regression that has not been correctly targeted please ping
>>>>>>>>>> me or a committer to help target the issue.
>>>>>>>>>>
>>>>>>>>>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
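
For anyone testing the RC as the FAQ in the quoted vote mail suggests, a minimal Python sketch of the "verified sha" step might look like the following. It assumes you have already downloaded an artifact (for example the PySpark tarball) and copied the hex digest published next to it, and that the digest is SHA-512. The names `sha512_of` and `matches_published_digest` are hypothetical helpers for illustration, not part of any Spark tooling.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a file's SHA-512 with a published hex digest.

    Assumption: `published_hex` contains only the hex digest. Whitespace
    and letter case are ignored, since digest files are not always
    formatted the same way.
    """
    normalized = "".join(published_hex.split()).lower()
    return sha512_of(path) == normalized
```

Usage would be along the lines of `matches_published_digest("pyspark-3.1.1.tar.gz", digest_text)`, where `digest_text` is the hex string you copied. This only checks integrity; checking the signature against the KEYS file is a separate step.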