Re: Bridging gap between Spark UI and Code

2021-05-25 Thread Wenchen Fan
You can see the SQL plan node name in the DAG visualization. Please refer
to https://spark.apache.org/docs/latest/web-ui.html for more details. If
anything is still unclear, please let us know and we will keep improving
the documentation.

On Tue, May 25, 2021 at 4:41 AM mhawes  wrote:

> @Wenchen Fan, understood that mapping the query plan to application code
> is very hard. I was wondering if we might instead be able to handle just
> the mapping from the final physical plan to the stage graph. So, for
> example, you'd be able to tell which part of the plan generated which
> stages. I feel this would provide the most benefit without having to
> worry about the several optimisation steps.
>
> The main issue, as I see it, is that currently, if there's a failing
> stage, it's almost impossible to track down the part of the plan that
> generated that stage. Would this be possible? If not, do you have any
> other suggestions for this kind of debugging?
>
> Best,
> Matt
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [Spark Core]: Adding support for size based partition coalescing

2021-05-25 Thread Wenchen Fan
Without AQE, repartition() simply creates 200 partitions (the value of
spark.sql.shuffle.partitions) AFAIK. AQE helps you coalesce those
partitions into a reasonable number, by size. Note that you need to tune
spark.sql.shuffle.partitions to make sure it's big enough, as AQE cannot
increase the number of partitions, only coalesce them.

On Tue, May 25, 2021 at 2:35 AM Tom Graves  wrote:

> So repartition() would look at some other config
> (spark.sql.adaptive.advisoryPartitionSizeInBytes) to decide the size to
> partition on, then? Does it require AQE? If so, what does a
> repartition() call do if AQE is not enabled? This is essentially a new
> API, so would repartitionBySize or something similar be less confusing to
> users who already use repartition(num_partitions)?
>
> Tom
>
> On Monday, May 24, 2021, 12:30:20 PM CDT, Wenchen Fan 
> wrote:
>
>
> Ideally this should be handled by the underlying data source, producing a
> reasonably partitioned RDD as the input data. However, if we already have
> a poorly partitioned RDD at hand and want to repartition it properly, I
> think an extra shuffle is required so that we can know the partition
> sizes first.
>
> That said, I think calling `.repartition()` with no args is indeed a good
> solution for this problem.
>
> On Sat, May 22, 2021 at 1:12 AM mhawes  wrote:
>
> Adding /another/ update to say that I'm currently planning on using a
> recently introduced feature whereby calling `.repartition()` with no args
> will cause the dataset to be optimised by AQE. This actually suits our
> use-case perfectly!
>
> Example:
>
> sparkSession.conf().set("spark.sql.adaptive.enabled", "true");
> Dataset<Long> dataset = sparkSession.range(1, 4, 1, 4).repartition();
>
> assertThat(dataset.rdd().collectPartitions().length).isEqualTo(1); // true
>
>
> Relevant PRs/Issues:
> [SPARK-31220][SQL] repartition obeys initialPartitionNum when
> adaptiveExecutionEnabled: https://github.com/apache/spark/pull/27986
> [SPARK-32056][SQL] Coalesce partitions for repartition by expressions
> when AQE is enabled: https://github.com/apache/spark/pull/28900
> [SPARK-32056][SQL][Follow-up] Coalesce partitions for repartition hint
> and sql when AQE is enabled: https://github.com/apache/spark/pull/28952
>
>
>


Should AggregationIterator.initializeBuffer be moved down to SortBasedAggregationIterator?

2021-05-25 Thread Jacek Laskowski
Hi,

Just found out that the only purpose
of AggregationIterator.initializeBuffer is to
keep SortBasedAggregationIterator happy [1].

Shouldn't it be moved down to SortBasedAggregationIterator to make things
clear(er)?

[1] https://github.com/apache/spark/search?q=initializeBuffer
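For context, a sort-based aggregation iterator has to re-initialize its aggregation buffer at every group boundary of the sorted input, which is roughly the role initializeBuffer plays. A simplified pure-Python sketch of that pattern (hypothetical illustration, not Spark's actual code):

```python
from itertools import groupby

def sort_based_aggregate(rows, key, value, init=0,
                         update=lambda buf, v: buf + v):
    """Consume rows pre-sorted by `key`. For each group, re-initialize
    the aggregation buffer, then fold the group's values into it. The
    per-group reset is the analogue of initializeBuffer."""
    results = {}
    for k, group in groupby(rows, key=key):
        buf = init                    # buffer reset at each group boundary
        for row in group:
            buf = update(buf, value(row))
        results[k] = buf
    return results

rows = [("a", 1), ("a", 2), ("b", 5)]  # already sorted by key
print(sort_based_aggregate(rows, key=lambda r: r[0], value=lambda r: r[1]))
# -> {'a': 3, 'b': 5}
```

Hash-based aggregation, by contrast, keeps one buffer per key alive for the whole input, so it has no such per-group reset step — which is consistent with the observation that only the sort-based iterator needs it.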

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




[SPARK-20384][SQL] Support value class in schema of Dataset (third time's a charm)

2021-05-25 Thread Emil Ejbyfeldt

Hi dev,

I am interested in getting support for value classes in Dataset schemas 
merged, and I am willing to work on it.


There are two previous PRs for this JIRA (SPARK-20384): first 
https://github.com/apache/spark/pull/22309 and, more recently, 
https://github.com/apache/spark/pull/27153 (marked stale ~1 year ago). It 
does not seem that the PRs were met with any resistance; the activity 
just died out, and therefore the changes were never merged.


Before spending more time on this, I would like to ask whether there are 
any known problems with supporting this that caused the previous PRs not 
to be merged.


I think the changes proposed in the later PR are still valid and a good 
approach for adding support. Should I ask to have that PR reopened, or 
create a new one, since I am not the original author?


/ Emil






Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Sean Owen
+1 same result as in previous tests

On Mon, May 24, 2021 at 1:14 AM Dongjoon Hyun 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.1.2.
>
> The vote is open until May 27th 1AM (PST) and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.1.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.1.2-rc1 (commit
> de351e30a90dd988b133b3d00fa6218bfcaba8b8):
> https://github.com/apache/spark/tree/v3.1.2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.2-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1384/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.1.2-rc1-docs/
>
> The list of bug fixes going into 3.1.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12349602
>
> This release is using the release script of the tag v3.1.2-rc1.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env and install
> the current RC to see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.1.2?
> ===
>
> The current list of open tickets targeted at 3.1.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.1.2
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>


Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Gengliang Wang
Hi Dongjoon,

After Spark 3.1.1, we need an extra step in the release process to update
the DocSearch version index. I didn't expect Spark 3.1.2 to come at this
time, so I hadn't updated the release process until yesterday.
I think we should use the latest branch-3.1 to regenerate the Spark
documentation. See https://github.com/apache/spark/pull/32654 for details.
I have also enhanced the release process script for this.

Thanks
Gengliang






Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Dongjoon Hyun
Thank you, Sean and Gengliang.

To Gengliang: it doesn't look that serious to me, because it's a doc-only
issue that can also be mitigated simply by updating `facetFilters` in the
HTML files after the release.

Bests,
Dongjoon.




Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Gengliang Wang
SGTM. Thanks for the work!

+1 (non-binding)



Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Takeshi Yamamuro
+1 (non-binding)

I ran the tests, checked the related JIRA tickets, and compared TPCDS
performance differences between this v3.1.2 candidate and v3.1.1.
Everything looks fine.

Thank you, Dongjoon!




-- 
---
Takeshi Yamamuro


Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Cheng Su
+1 (non-binding)

Checked the related commits in commit history manually.

Thanks!
Cheng Su



Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Hyukjin Kwon
+1



Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Liang-Chi Hsieh
+1 (non-binding)

Binaries and docs look good. JIRA tickets look good. Ran simple tasks.

Thank you, Dongjoon!





Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread John Zhuge
+1 (non-binding)

Validated checksum and signature; ran RAT checks; tried
spark-3.1.2-bin-hadoop2.7 with HMS 1.2.
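The checksum part of that validation can be sketched in Python. This is a hypothetical helper for illustration: Spark release candidates publish a `.sha512` digest next to each artifact under the `-bin/` directory, and the GPG signature is checked separately against the KEYS file.

```python
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-512 in 1 MB chunks, as used for
    verifying Spark release artifacts against their .sha512 files."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(artifact_path, expected_hex):
    """Compare the computed digest with the published value,
    tolerating surrounding whitespace and upper-case hex."""
    return sha512_of(artifact_path) == expected_hex.strip().lower()
```

Usage would be `verify("spark-3.1.2-bin-hadoop2.7.tgz", published_digest)`, where `published_digest` is the hex string taken from the corresponding `.sha512` file.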


-- 
John Zhuge