Hey,

Sorry for chiming in a bit late, but I would like to suggest my PR (
https://github.com/apache/spark/pull/28885) for review and inclusion into
3.1.1.

Currently, invalid reuse reference nodes appear in many queries, causing
performance issues and incorrect explain plans. Now that
https://github.com/apache/spark/pull/31243 has been merged, these invalid
references can easily be found in many of our golden files on master:
https://github.com/apache/spark/pull/28885#issuecomment-767530441.
The issue isn't specific to master (3.2), though; it has been there since
3.0, when Dynamic Partition Pruning was added.
So it is not a regression from 3.0 to 3.1.1, but in some cases (like TPC-DS
q23b) it causes a performance regression from 2.4 to 3.x.

Thanks,
Peter

On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Guys, I plan to make an RC as soon as we have no visible issues. I have
> merged fixes for a few correctness issues. The remaining items look like:
> - https://github.com/apache/spark/pull/31319 is waiting for a review (I
> will review it soon too).
> - https://github.com/apache/spark/pull/31336
> - I know Max is investigating the perf regression, which will hopefully
> be fixed soon.
>
> Are there any more blockers or correctness issues? Please ping me or
> raise them here.
> I would like to avoid making an RC when there are clearly some issues to
> be fixed.
> If you're investigating something suspicious, that's fine too. It's better
> to make sure we're safe instead of rushing an RC without finishing the
> investigation.
>
> Thanks all.
>
>
> On Fri, Jan 22, 2021 at 6:19 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>> we're almost there.
>>
>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan, <cloud0...@gmail.com> wrote:
>>
>>> BTW, there is a correctness bug being fixed at
>>> https://github.com/apache/spark/pull/30788 . It's not a regression, but
>>> the fix is very simple and it would be better to start the next RC after
>>> merging that fix.
>>>
>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk <maxim.g...@databricks.com>
>>> wrote:
>>>
>>>> Also, I am investigating a performance regression in some TPC-DS queries
>>>> (q88 for instance) that is caused by a recent commit in 3.1, most likely
>>>> one made between November 19, 2020 and December 18, 2020.
>>>>
>>>> Maxim Gekk
>>>>
>>>> Software Engineer
>>>>
>>>> Databricks, Inc.
>>>>
>>>>
>>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan <cloud0...@gmail.com>
>>>> wrote:
>>>>
>>>>> -1 as I just found a regression in 3.1. A self-join query works well
>>>>> in 3.0 but fails in 3.1. It's being fixed at
>>>>> https://github.com/apache/spark/pull/31287
>>>>>
>>>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>>>>> <tgraves...@yahoo.com.invalid> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> built from tarball, verified sha and regular CI and tests all pass.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
>>>>>> gurwls...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 3.1.1.
>>>>>>
>>>>>> The vote is open until January 22nd 4PM PST and passes if a majority
>>>>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v3.1.1-rc1 (commit
>>>>>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>>>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>>>>
>>>>>> The list of bug fixes going into 3.1.1 can be found at the following
>>>>>> URL:
>>>>>> https://s.apache.org/41kf2
>>>>>>
>>>>>> This release is using the release script of the tag v3.1.1-rc1.
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> ===================
>>>>>> What happened to 3.1.0?
>>>>>> ===================
>>>>>>
>>>>>> There was a technical issue during Apache Spark 3.1.0 preparation,
>>>>>> and it was discussed and decided to skip 3.1.0.
>>>>>> Please see
>>>>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>>>>>> for more details.
>>>>>>
>>>>>> =========================
>>>>>> How can I help test this release?
>>>>>> =========================
>>>>>>
>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>> an existing Spark workload, running it on this release candidate, and
>>>>>> reporting any regressions.
>>>>>>
>>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>>> the current RC via "pip install
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz
>>>>>> "
>>>>>> and see if anything important breaks.
>>>>>> In Java/Scala, you can add the staging repository to your
>>>>>> project's resolvers and test
>>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>>> you don't end up building with an out-of-date RC going forward).
>>>>>>
>>>>>> ===========================================
>>>>>> What should happen to JIRA tickets still targeting 3.1.1?
>>>>>> ===========================================
>>>>>>
>>>>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>>> Version/s" = 3.1.1
>>>>>>
>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>> appropriate release.
>>>>>>
>>>>>> ==================
>>>>>> But my bug isn't fixed?
>>>>>> ==================
>>>>>>
>>>>>> In order to make timely releases, we will typically not hold the
>>>>>> release unless the bug in question is a regression from the previous
>>>>>> release. That being said, if there is something which is a regression
>>>>>> that has not been correctly targeted please ping me or a committer to
>>>>>> help target the issue.
>>>>>>
>>>>>>
