It looks like a cool one, but it's a pretty big change and it affects the plans considerably ... if this isn't a clear regression that affects many users, it's probably best to avoid adding it to 3.1.1, particularly during the RC period.
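[Editor's note: as context for the invalid reuse references Peter describes in the quoted thread below, here is a minimal, illustrative PySpark sketch of how one might scan a query's formatted plan for Reused* nodes. The query and the app name are placeholders; this is not the reproduction from the PR.]

    # Dump a query's formatted plan and print any Reused* entries, which is where
    # invalid reuse references would surface. Placeholder query, not the PR's repro.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("reuse-node-check").getOrCreate()

    query = "SELECT t1.id FROM range(1000) t1 JOIN range(1000) t2 ON t1.id = t2.id"

    # EXPLAIN FORMATTED returns a single-row DataFrame whose 'plan' column holds the text.
    plan_text = spark.sql("EXPLAIN FORMATTED " + query).first()["plan"]

    for line in plan_text.splitlines():
        if "Reused" in line:  # e.g. ReusedExchange / ReusedSubquery nodes
            print(line)

    spark.stop()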
On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth <peter.t...@gmail.com> wrote:

> Hey,
>
> Sorry for chiming in a bit late, but I would like to suggest my PR (https://github.com/apache/spark/pull/28885) for review and inclusion into 3.1.1.
>
> Currently, invalid reuse reference nodes appear in many queries, causing performance issues and incorrect explain plans. Now that https://github.com/apache/spark/pull/31243 has been merged, these invalid references can easily be found in many of our golden files on master: https://github.com/apache/spark/pull/28885#issuecomment-767530441.
> But the issue isn't specific to master (3.2); it has actually been there since 3.0, when Dynamic Partition Pruning was added.
> So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPC-DS q23b) it causes a performance regression from 2.4 to 3.x.
>
> Thanks,
> Peter
>
> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Guys, I plan to make an RC as soon as we have no visible issues. I have merged a few correctness fixes. Here they are:
>> - https://github.com/apache/spark/pull/31319 is waiting for a review (I will do it soon, too).
>> - https://github.com/apache/spark/pull/31336
>> - I know Max is investigating the perf regression, which hopefully will be fixed soon.
>>
>> Are there any more blockers or correctness issues? Please ping me or call them out here.
>> I would like to avoid making an RC when there are clearly some issues to be fixed.
>> If you're investigating something suspicious, that's fine too. It's better to make sure we're safe than to rush an RC without finishing the investigation.
>>
>> Thanks all.
>>
>> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Sure, thanks guys. I'll start another RC after the fixes. Looks like we're almost there.
>>>
>>> On Fri, 22 Jan 2021 at 17:47, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>
>>>> BTW, there is a correctness bug being fixed at https://github.com/apache/spark/pull/30788. It's not a regression, but the fix is very simple, and it would be better to start the next RC after merging that fix.
>>>>
>>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk <maxim.g...@databricks.com> wrote:
>>>>
>>>>> Also, I am investigating a performance regression in some TPC-DS queries (q88, for instance) that is caused by a recent commit in 3.1, highly likely one from the period from November 19, 2020 to December 18, 2020.
>>>>>
>>>>> Maxim Gekk
>>>>> Software Engineer
>>>>> Databricks, Inc.
>>>>>
>>>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>
>>>>>> -1, as I just found a regression in 3.1. A self-join query works well in 3.0 but fails in 3.1. It's being fixed at https://github.com/apache/spark/pull/31287
>>>>>>
>>>>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Built from the tarball, verified the sha, and regular CI and tests all pass.
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark version 3.1.1.
>>>>>>>
>>>>>>> The vote is open until January 22nd, 4 PM PST, and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>
>>>>>>> The tag to be voted on is v3.1.1-rc1 (commit 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>>>>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc., can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>>>>>
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>
>>>>>>> The staging repository for this release can be found at:
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>>>>>
>>>>>>> The list of bug fixes going into 3.1.1 can be found at the following URL:
>>>>>>> https://s.apache.org/41kf2
>>>>>>>
>>>>>>> This release is using the release script of the tag v3.1.1-rc1.
>>>>>>>
>>>>>>> FAQ
>>>>>>>
>>>>>>> ===================
>>>>>>> What happened to 3.1.0?
>>>>>>> ===================
>>>>>>>
>>>>>>> There was a technical issue during Apache Spark 3.1.0 preparation, and it was discussed and decided to skip 3.1.0.
>>>>>>> Please see https://spark.apache.org/news/next-official-release-spark-3.1.1.html for more details.
>>>>>>>
>>>>>>> =========================
>>>>>>> How can I help test this release?
>>>>>>> =========================
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and then reporting any regressions.
>>>>>>>
>>>>>>> If you're working in PySpark, you can set up a virtual env and install the current RC via "pip install https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz" and see if anything important breaks.
>>>>>>> In Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).
>>>>>>>
>>>>>>> ===========================================
>>>>>>> What should happen to JIRA tickets still targeting 3.1.1?
>>>>>>> ===========================================
>>>>>>>
>>>>>>> The current list of open tickets targeted at 3.1.1 can be found at https://issues.apache.org/jira/projects/SPARK by searching for "Target Version/s" = 3.1.1.
>>>>>>>
>>>>>>> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Please retarget everything else to an appropriate release.
>>>>>>>
>>>>>>> ==================
>>>>>>> But my bug isn't fixed?
>>>>>>> ==================
>>>>>>>
>>>>>>> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is a regression that has not been correctly targeted, please ping me or a committer to help target the issue.
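[Editor's note: for anyone following the PySpark testing route described in the vote email above, the sketch below is one way a quick sanity check might look after installing the RC tarball into a virtual env. The app name and the tiny workload are placeholders, not part of the release instructions; the point is to run your own jobs on the RC and compare results.]

    # Rough smoke test, assuming the RC was installed into a fresh virtualenv via
    # pip install .../v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz as described above.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("rc-smoke-test").getOrCreate()
    print(spark.version)  # should report 3.1.1 if the RC is the one on the path

    df = spark.range(1000).withColumn("bucket", F.col("id") % 7)
    counts = df.groupBy("bucket").count().orderBy("bucket").collect()
    assert sum(r["count"] for r in counts) == 1000  # trivial sanity check

    spark.stop()

The Java/Scala route is analogous: point your build's resolvers at the staging repository URL listed in the vote email (orgapachespark-1364) and run your existing test suite against the RC artifacts.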