Hey,

Sorry for chiming in a bit late, but I would like to suggest my PR (https://github.com/apache/spark/pull/28885) for review and inclusion in 3.1.1.
Currently, invalid reuse reference nodes appear in many queries, causing performance issues and incorrect explain plans. Now that https://github.com/apache/spark/pull/31243 has been merged, these invalid references can easily be found in many of our golden files on master: https://github.com/apache/spark/pull/28885#issuecomment-767530441

But the issue isn't specific to master (3.2); it has been there since 3.0, when Dynamic Partition Pruning was added. So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPCDS q23b) it causes a performance regression from 2.4 to 3.x.

Thanks,
Peter

On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Guys, I plan to make an RC as soon as we have no visible issues. I have
> merged fixes for a few correctness issues. The remaining items look to be:
> - https://github.com/apache/spark/pull/31319 waiting for a review (I will
> review it soon too).
> - https://github.com/apache/spark/pull/31336
> - I know Max is investigating the perf regression, which hopefully will
> be fixed soon.
>
> Are there any more blockers or correctness issues? Please ping me or
> raise them here.
> I would like to avoid making an RC when there are clearly some issues to
> be fixed.
> If you're investigating something suspicious, that's fine too. It's better
> to make sure we're safe instead of rushing an RC without finishing the
> investigation.
>
> Thanks all.
>
>
> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>> we're almost there.
>>
>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan, <cloud0...@gmail.com> wrote:
>>
>>> BTW, there is a correctness bug being fixed at
>>> https://github.com/apache/spark/pull/30788 . It's not a regression, but
>>> the fix is very simple and it would be better to start the next RC after
>>> merging that fix.
>>>
>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk <maxim.g...@databricks.com>
>>> wrote:
>>>
>>>> Also, I am investigating a performance regression in some TPC-DS queries
>>>> (q88 for instance) that is caused by a recent commit in 3.1, most likely
>>>> one made between 19th November, 2020 and 18th December, 2020.
>>>>
>>>> Maxim Gekk
>>>>
>>>> Software Engineer
>>>>
>>>> Databricks, Inc.
>>>>
>>>>
>>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan <cloud0...@gmail.com>
>>>> wrote:
>>>>
>>>>> -1 as I just found a regression in 3.1. A self-join query works well
>>>>> in 3.0 but fails in 3.1. It's being fixed at
>>>>> https://github.com/apache/spark/pull/31287
>>>>>
>>>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>>>>> <tgraves...@yahoo.com.invalid> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Built from tarball, verified sha, and regular CI and tests all pass.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
>>>>>> gurwls...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 3.1.1.
>>>>>>
>>>>>> The vote is open until January 22nd 4PM PST and passes if a majority
>>>>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v3.1.1-rc1 (commit
>>>>>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>>>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>>>>
>>>>>> The release files, including signatures, digests, etc.
>>>>>> can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>>>>
>>>>>> The list of bug fixes going into 3.1.1 can be found at the following
>>>>>> URL:
>>>>>> https://s.apache.org/41kf2
>>>>>>
>>>>>> This release is using the release script of the tag v3.1.1-rc1.
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> ===================
>>>>>> What happened to 3.1.0?
>>>>>> ===================
>>>>>>
>>>>>> There was a technical issue during Apache Spark 3.1.0 preparation,
>>>>>> and after discussion it was decided to skip 3.1.0.
>>>>>> Please see
>>>>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>>>>>> for more details.
>>>>>>
>>>>>> =========================
>>>>>> How can I help test this release?
>>>>>> =========================
>>>>>>
>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>> then reporting any regressions.
>>>>>>
>>>>>> If you're working in PySpark, you can set up a virtual env and install
>>>>>> the current RC via "pip install
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz"
>>>>>> and see if anything important breaks.
>>>>>> In Java/Scala, you can add the staging repository to your
>>>>>> project's resolvers and test
>>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>>> you don't end up building with an out-of-date RC going forward).
>>>>>>
>>>>>> ===========================================
>>>>>> What should happen to JIRA tickets still targeting 3.1.1?
>>>>>> ===========================================
>>>>>>
>>>>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>>> Version/s" = 3.1.1
>>>>>>
>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>> appropriate release.
>>>>>>
>>>>>> ==================
>>>>>> But my bug isn't fixed?
>>>>>> ==================
>>>>>>
>>>>>> In order to make timely releases, we will typically not hold the
>>>>>> release unless the bug in question is a regression from the previous
>>>>>> release. That being said, if there is something which is a regression
>>>>>> that has not been correctly targeted please ping me or a committer to
>>>>>> help target the issue.
>>>>>>