Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Cheng Pan
+1 (non-binding)

I verified:

1. LICENSE/NOTICE are present
2. Signatures are correct
3. Built the source code and ran UT (I had to replace the sparksrc folder with the
content of spark-4.0.0.tgz to make the source build succeed)
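For anyone repeating this verification, a rough sketch of steps 1 and 3 above (the
archive name, the extracted layout, and the make targets are assumptions based on
the rc1 output quoted later in this thread):

$ unzip -q spark-connect-go-0.1.0-rc2.zip && cd spark-connect-go-0.1.0-rc2
$ ls LICENSE NOTICE                               # step 1: both files should exist
$ wget https://archive.apache.org/dist/spark/spark-4.0.0/spark-4.0.0.tgz
$ tar -xzf spark-4.0.0.tgz
$ rm -rf sparksrc && cp -r spark-4.0.0 sparksrc   # step 3: provide the .proto files the Makefile expects
$ make && make test                               # target names are assumptions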

Thanks,
Cheng Pan



> On Jun 10, 2025, at 00:59, Martin Grund  wrote:
> 
> Hi folks,
> 
> Please vote on releasing the following candidate as Apache Spark Connect Go 
> Client 0.1.0. 
> 
> The release candidate was tested and built against Spark 4.0.0. The 
> repository contains a sample application for submitting jobs written in Go 
> using a small JVM wrapper and quickstart information.
> 
> This vote is open for the next 72 hours and passes if a majority +1 PMC votes 
> are cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Release this package as Apache Spark Connect Go Client 0.1.0
> [] -1 Do not release this package because ...
> 
> Tag: https://github.com/apache/spark-connect-go/tree/v0.1.0-rc2 
>  (commit 
> defb8525088150f9f328136a35fa7c5f64fe2733)
> 
> The artifacts are available as well here: 
> https://dist.apache.org/repos/dist/dev/spark/spark-connect-go-0.1.0-rc2/
> 
> The artifacts can be verified using the KEYS file 
> https://dist.apache.org/repos/dist/dev/spark/KEYS
> 
> I've addressed the comments above with regard to:
> 
> - Build out of source tree
> - Signing using the dev KEYS file
> - Missing NOTICE file
> - Upload to the GitHub distribution
> 
> Thanks
> Martin
> 
> On Mon, Jun 9, 2025 at 8:54 AM Martin Grund wrote:
>> Thanks for the feedback, I'll address it shortly. 
>> 
>> On Mon, Jun 9, 2025 at 08:31 Cheng Pan wrote:
>>> Hi Martin,
>>> 
>>> Thanks for addressing it, a few questions/issues I found:
>>> 
>>> 1. The "fun Version"[1] returns "3.5.x”, this does not look like a correct 
>>> version as you claim this release candidates was built and tested against 
>>> Spark 4.0.0.
>>> 
>>> 2. Seems your public key was not added to KEYS, so I can not verify your 
>>> signature.
>>> 
>>> $ wget https://downloads.apache.org/spark/KEYS
>>> $ gpg --import KEYS
>>> $ gpg --verify spark-connect-go-0.1.0-rc1.zip.asc
>>> gpg: assuming signed data in 'spark-connect-go-0.1.0-rc1.zip'
>>> gpg: Signature made Mon Jun  9 20:30:11 2025 CST
>>> gpg:                using RSA key 4E3B5C29DD2CCCF97925469C1E0086A46C650707
>>> gpg: Can't check signature: No public key
>>> 
>>> 3. Though it’s not enforced, so far all Spark release candidates have been
>>> put at [2] instead of using GitHub releases; I would recommend connect-go to
>>> follow that too.
>>> 
>>> > Projects should use the /dev tree of the dist repository or the staging 
>>> > features of repository.apache.org  to host 
>>> > release candidates posted for developer testing/voting (prior to being, 
>>> > potentially, formally blessed as a GA release).
>>> 
>>> 4. The source release is non-compilable because it does not contain the
>>> Spark source code. To be clear, [3] requires that the "source release
>>> artifacts" MUST be sufficient for a user to build and test, not the git
>>> repo.
>>> 
>>> Failure: directory "sparksrc/sql/connect/common/src/main/protobuf" listed 
>>> in buf.work.yaml contains no .proto files
>>> exit status 1
>>> make: *** [Makefile:69: internal/generated.out] Error 1
>>> root@c072c654a72e:/go/spark-connect-go-0.1.0-rc1# ls 
>>> sparksrc/sql/connect/common/src/main/protobuf
>>> ls: cannot access 'sparksrc/sql/connect/common/src/main/protobuf': No such 
>>> file or directory
>>> 
>>> 5. Missing NOTICE file [4]
>>> 
>>> > Each package MUST provide a LICENSE file and a NOTICE file ...
>>> 
>>> [1] 
>>> https://github.com/apache/spark-connect-go/blob/v0.1.0-rc1/spark/version.go#L19
>>> [2] https://dist.apache.org/repos/dist/dev/spark
>>> [3] https://www.apache.org/legal/release-policy.html#source-packages
>>> [4] https://www.apache.org/legal/release-policy.html#licensing-documentation
>>> 
>>> Thanks,
>>> Cheng Pan
>>> 
>>> 
>>> 
 On Jun 9, 2025, at 20:32, Martin Grund wrote:
 
 I updated the release based on the tag with the source releases and the 
 proper signature.
 
 https://github.com/apache/spark-connect-go/releases/tag/v0.1.0-rc1
 
> On Sun, Jun 8, 2025 at 10:44 PM Cheng Pan wrote:
> The release artifacts don’t satisfy the ASF release policy[1].
> 
> > Projects MUST direct outsiders towards official releases rather than 
> > raw source repositories, nightly builds, snapshots, release candidates, 
> > or any other similar packages.
> 
> > Every ASF release MUST contain one or more source packages, which MUST 
> > be sufficient for a user to build and test the r

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread bo yang
+1 (non-binding), thanks Martin!

On Mon, Jun 9, 2025 at 7:47 PM Cheng Pan  wrote:

> +1 (non-binding)
>
> I verified:
>
> 1. LICENSE/NOTICE are present
> 2. Signatures are correct
> 3. Built the source code and ran UT (I had to replace the sparksrc folder with
> the content of spark-4.0.0.tgz to make the source build succeed)
>
> Thanks,
> Cheng Pan
>
>
>
> On Jun 10, 2025, at 00:59, Martin Grund  wrote:
>
> Hi folks,
>
> Please vote on releasing the following candidate as Apache Spark Connect
> Go Client 0.1.0.
>
> The release candidate was tested and built against Spark 4.0.0. The
> repository contains a sample application for submitting jobs written in Go
> using a small JVM wrapper and quickstart information.
>
> This vote is open for the next 72 hours and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark Connect Go Client 0.1.0
> [] -1 Do not release this package because ...
>
> Tag: https://github.com/apache/spark-connect-go/tree/v0.1.0-rc2
>  (commit
> defb8525088150f9f328136a35fa7c5f64fe2733)
>
> The artifacts are available as well here:
> https://dist.apache.org/repos/dist/dev/spark/spark-connect-go-0.1.0-rc2/
>
> The artifacts can be verified using the KEYS file
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> I've addressed the comments above with regard to:
>
> - Build out of source tree
> - Signing using the dev KEYS file
> - Missing NOTICE file
> - Upload to the GitHub distribution
>
> Thanks
> Martin
>
> On Mon, Jun 9, 2025 at 8:54 AM Martin Grund  wrote:
>
>> Thanks for the feedback, I'll address it shortly.
>>
>> On Mon, Jun 9, 2025 at 08:31 Cheng Pan  wrote:
>>
>>> Hi Martin,
>>>
>>> Thanks for addressing it, a few questions/issues I found:
>>>
>>> 1. The "fun Version"[1] returns "3.5.x”, this does not look like a
>>> correct version as you claim this release candidates was built and tested
>>> against Spark 4.0.0.
>>>
>>> 2. Seems your public key was not added to KEYS, so I can not verify your
>>> signature.
>>>
>>> $ wget https://downloads.apache.org/spark/KEYS
>>> $ gpg --import KEYS
>>> $ gpg --verify spark-connect-go-0.1.0-rc1.zip.asc
>>> gpg: assuming signed data in 'spark-connect-go-0.1.0-rc1.zip'
>>> gpg: Signature made Mon Jun  9 20:30:11 2025 CST
>>> gpg:                using RSA key 4E3B5C29DD2CCCF97925469C1E0086A46C650707
>>> gpg: Can't check signature: No public key
>>>
>>> 3. Though it’s not enforced, so far all Spark release candidates have
>>> been put at [2] instead of using GitHub releases; I would recommend
>>> connect-go to follow that too.
>>>
>>> > Projects should use the /dev tree of the dist repository or the
>>> staging features of repository.apache.org to host release candidates
>>> posted for developer testing/voting (prior to being, potentially, formally
>>> blessed as a GA release).
>>>
>>> 4. The source release is non-compilable because it does not contain
>>> the Spark source code. To be clear, [3] requires that the "source release
>>> artifacts" MUST be sufficient for a user to build and test, not the git
>>> repo.
>>>
>>> Failure: directory "sparksrc/sql/connect/common/src/main/protobuf"
>>> listed in buf.work.yaml contains no .proto files
>>> exit status 1
>>> make: *** [Makefile:69: internal/generated.out] Error 1
>>> root@c072c654a72e:/go/spark-connect-go-0.1.0-rc1# ls
>>> sparksrc/sql/connect/common/src/main/protobuf
>>> ls: cannot access 'sparksrc/sql/connect/common/src/main/protobuf': No
>>> such file or directory
>>>
>>> 5. Missing NOTICE file [4]
>>>
>>> > Each package MUST provide a LICENSE file and a NOTICE file ...
>>>
>>> [1]
>>> https://github.com/apache/spark-connect-go/blob/v0.1.0-rc1/spark/version.go#L19
>>> [2] https://dist.apache.org/repos/dist/dev/spark
>>> [3] https://www.apache.org/legal/release-policy.html#source-packages
>>> [4]
>>> https://www.apache.org/legal/release-policy.html#licensing-documentation
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>>
>>> On Jun 9, 2025, at 20:32, Martin Grund  wrote:
>>>
>>> I updated the release based on the tag with the source releases and the
>>> proper signature.
>>>
>>> https://github.com/apache/spark-connect-go/releases/tag/v0.1.0-rc1
>>>
>>> On Sun, Jun 8, 2025 at 10:44 PM Cheng Pan  wrote:
>>>
 The release artifacts don’t satisfy the ASF release policy[1].

 > Projects MUST direct outsiders towards official releases rather than
 raw source repositories, nightly builds, snapshots, release candidates, or
 any other similar packages.

 > Every ASF release MUST contain one or more source packages, which
 MUST be sufficient for a user to build and test the release provided they
 have access to the appropriate platform and tools. A source release SHOULD
 not contain compiled code

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Cheng Pan
Hi Martin,

Thanks for addressing it, a few questions/issues I found:

1. The "fun Version"[1] returns "3.5.x”, this does not look like a correct 
version as you claim this release candidates was built and tested against Spark 
4.0.0.

2. Seems your public key was not added to KEYS, so I can not verify your 
signature.

$ wget https://downloads.apache.org/spark/KEYS
$ gpg --import KEYS
$ gpg --verify spark-connect-go-0.1.0-rc1.zip.asc
gpg: assuming signed data in 'spark-connect-go-0.1.0-rc1.zip'
gpg: Signature made Mon Jun  9 20:30:11 2025 CST
gpg:                using RSA key 4E3B5C29DD2CCCF97925469C1E0086A46C650707
gpg: Can't check signature: No public key

3. Though it’s not enforced, so far all Spark release candidates have been put
at [2] instead of using GitHub releases; I would recommend connect-go to follow
that too.

> Projects should use the /dev tree of the dist repository or the staging 
> features of repository.apache.org to host release candidates posted for 
> developer testing/voting (prior to being, potentially, formally blessed as a 
> GA release).

4. The source release is non-compilable because it does not contain the Spark
source code. To be clear, [3] requires that the "source release artifacts" MUST be
sufficient for a user to build and test, not the git repo.

Failure: directory "sparksrc/sql/connect/common/src/main/protobuf" listed in 
buf.work.yaml contains no .proto files
exit status 1
make: *** [Makefile:69: internal/generated.out] Error 1
root@c072c654a72e:/go/spark-connect-go-0.1.0-rc1# ls 
sparksrc/sql/connect/common/src/main/protobuf
ls: cannot access 'sparksrc/sql/connect/common/src/main/protobuf': No such file 
or directory

5. Missing NOTICE file [4]

> Each package MUST provide a LICENSE file and a NOTICE file ...

[1] 
https://github.com/apache/spark-connect-go/blob/v0.1.0-rc1/spark/version.go#L19
[2] https://dist.apache.org/repos/dist/dev/spark
[3] https://www.apache.org/legal/release-policy.html#source-packages
[4] https://www.apache.org/legal/release-policy.html#licensing-documentation

Thanks,
Cheng Pan



> On Jun 9, 2025, at 20:32, Martin Grund  wrote:
> 
> I updated the release based on the tag with the source releases and the 
> proper signature.
> 
> https://github.com/apache/spark-connect-go/releases/tag/v0.1.0-rc1
> 
> On Sun, Jun 8, 2025 at 10:44 PM Cheng Pan wrote:
>> The release artifacts don’t satisfy the ASF release policy[1].
>> 
>> > Projects MUST direct outsiders towards official releases rather than raw 
>> > source repositories, nightly builds, snapshots, release candidates, or any 
>> > other similar packages.
>> 
>> > Every ASF release MUST contain one or more source packages, which MUST be 
>> > sufficient for a user to build and test the release provided they have 
>> > access to the appropriate platform and tools. A source release SHOULD not 
>> > contain compiled code.
>> 
>> [1] https://www.apache.org/legal/release-policy.html#publication
>> 
>> Thanks,
>> Cheng Pan
>> 
>> 
>> 
>>> On Jun 9, 2025, at 12:21, Martin Grund  
>>> wrote:
>>> 
>>> Please vote on releasing the following candidate as Apache Spark Connect Go 
>>> Client 0.1.0. 
>>> 
>>> The release candidate was tested and built against Spark 4.0.0. The 
>>> repository contains a sample application for submitting jobs written in Go 
>>> using a small JVM wrapper and quickstart information.
>>> 
>>> This vote is open for the next 72 hours and passes if a majority +1 PMC 
>>> votes are cast, with a minimum of 3 +1 votes.
>>> 
>>> [ ] +1 Release this package as Apache Spark Connect Go Client 0.1.0
>>> [] -1 Do not release this package because ...
>>> 
>>> Tag: https://github.com/apache/spark-connect-go/tree/v0.1.0-rc1 (commit 
>>> 2383413460105fbc665c7c36d7943d5f05a5b245)
>>> 
>>> Thanks
>>> Martin
>> 



Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
Thanks for the feedback, I'll address it shortly.

On Mon, Jun 9, 2025 at 08:31 Cheng Pan  wrote:

> Hi Martin,
>
> Thanks for addressing it, a few questions/issues I found:
>
> 1. The "fun Version"[1] returns "3.5.x”, this does not look like a correct
> version as you claim this release candidates was built and tested against
> Spark 4.0.0.
>
> 2. Seems your public key was not added to KEYS, so I can not verify your
> signature.
>
> $ wget https://downloads.apache.org/spark/KEYS
> $ gpg --import KEYS
> $ gpg --verify spark-connect-go-0.1.0-rc1.zip.asc
> gpg: assuming signed data in 'spark-connect-go-0.1.0-rc1.zip'
> gpg: Signature made Mon Jun  9 20:30:11 2025 CST
> gpg:                using RSA key 4E3B5C29DD2CCCF97925469C1E0086A46C650707
> gpg: Can't check signature: No public key
>
> 3. Though it’s not enforced, so far all Spark release candidates have been
> put at [2] instead of using GitHub releases; I would recommend connect-go to
> follow that too.
>
> > Projects should use the /dev tree of the dist repository or the staging
> features of repository.apache.org to host release candidates posted for
> developer testing/voting (prior to being, potentially, formally blessed as
> a GA release).
>
> 4. The source release is non-compilable because it does not contain the
> Spark source code. To be clear, [3] requires that the "source release
> artifacts" MUST be sufficient for a user to build and test, not the git
> repo.
>
> Failure: directory "sparksrc/sql/connect/common/src/main/protobuf" listed
> in buf.work.yaml contains no .proto files
> exit status 1
> make: *** [Makefile:69: internal/generated.out] Error 1
> root@c072c654a72e:/go/spark-connect-go-0.1.0-rc1# ls
> sparksrc/sql/connect/common/src/main/protobuf
> ls: cannot access 'sparksrc/sql/connect/common/src/main/protobuf': No such
> file or directory
>
> 5. Missing NOTICE file [4]
>
> > Each package MUST provide a LICENSE file and a NOTICE file ...
>
> [1]
> https://github.com/apache/spark-connect-go/blob/v0.1.0-rc1/spark/version.go#L19
> [2] https://dist.apache.org/repos/dist/dev/spark
> [3] https://www.apache.org/legal/release-policy.html#source-packages
> [4]
> https://www.apache.org/legal/release-policy.html#licensing-documentation
>
> Thanks,
> Cheng Pan
>
>
>
> On Jun 9, 2025, at 20:32, Martin Grund  wrote:
>
> I updated the release based on the tag with the source releases and the
> proper signature.
>
> https://github.com/apache/spark-connect-go/releases/tag/v0.1.0-rc1
>
> On Sun, Jun 8, 2025 at 10:44 PM Cheng Pan  wrote:
>
>> The release artifacts don’t satisfy the ASF release policy[1].
>>
>> > Projects MUST direct outsiders towards official releases rather than
>> raw source repositories, nightly builds, snapshots, release candidates, or
>> any other similar packages.
>>
>> > Every ASF release MUST contain one or more source packages, which MUST
>> be sufficient for a user to build and test the release provided they have
>> access to the appropriate platform and tools. A source release SHOULD not
>> contain compiled code.
>>
>> [1] https://www.apache.org/legal/release-policy.html#publication
>>
>> Thanks,
>> Cheng Pan
>>
>>
>>
>> On Jun 9, 2025, at 12:21, Martin Grund 
>> wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark Connect
>> Go Client 0.1.0.
>>
>> The release candidate was tested and built against Spark 4.0.0. The
>> repository contains a sample application for submitting jobs written in Go
>> using a small JVM wrapper and quickstart information.
>>
>> This vote is open for the next 72 hours and passes if a majority +1 PMC
>> votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark Connect Go Client 0.1.0
>> [] -1 Do not release this package because ...
>>
>> Tag: https://github.com/apache/spark-connect-go/tree/v0.1.0-rc1 (commit
>> 2383413460105fbc665c7c36d7943d5f05a5b245)
>>
>> Thanks
>> Martin
>>
>>
>>
>


Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-06-09 Thread Hyukjin Kwon
These RC artifacts were dropped properly.

On Mon, 9 Jun 2025 at 07:09, Hyukjin Kwon  wrote:

> This is an automated vote. Please ignore it.
>
> On Mon, Jun 9, 2025 at 6:46 AM  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.5.7.
>>
>> The vote is open until Fri, 13 Jun 2025 06:32:20 PDT and passes if a
>> majority +1 PMC votes are cast, with
>> a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.5.7
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v3.5.7-rc1 (commit d5a625d9550):
>> https://github.com/apache/spark/tree/v3.5.7-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1498/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-docs/
>>
>> The list of bug fixes going into 3.5.7 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12355975
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC via "pip install
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-bin/pyspark-3.5.7.tar.gz
>> "
>> and see if anything important breaks.
>> In the Java/Scala, you can add the staging repository to your project's
>> resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [DISCUSS] Automation of RC email

2025-06-09 Thread Hyukjin Kwon
The PR is ready for a look 👍

On Sun, 8 Jun 2025 at 17:41, Hyukjin Kwon  wrote:

> I am working on it at https://github.com/apache/spark/pull/51122.
> Some emails might be sent for RC 3.5.7 for testing purposes. Please ignore
> them :-). I will reply to individual email as well to avoid confusion.
>
> On Thu, 5 Jun 2025 at 20:07, Yang Jie  wrote:
>
>> Option 1 +1, thank you, Hyukjin, for the efforts you've put into this.
>>
>> On 2025/06/06 02:59:32 Jungtaek Lim wrote:
>> > Thanks for the confirmation. That sounds great as long as the ASF
>> account
>> > information is required per run and never be stored somewhere after the
>> run.
>> >
>> > On Fri, Jun 6, 2025 at 11:39 AM, Hyukjin Kwon wrote:
>> >
>> > > When you run the GitHub Actions to release, it requires you to
>> specify an
>> > > ASF account and password in GitHub Secrets. So I plan to use that to
>> send
>> > > an email.
>> > > I will probably add a note that the email was auto generated ..
>> > >
>> > > On Fri, 6 Jun 2025 at 11:37, Jungtaek Lim <
>> kabhwan.opensou...@gmail.com>
>> > > wrote:
>> > >
>> > >> One question: is it possible for the automation to send the mail on
>> > >> behalf of release manager? Or will we simply send the mail as
>> specific mail
>> > >> account (mostly dedicated one for automated)?
>> > >>
>> > >> Maybe latter doesn’t even matter, but it might be less clear about
>> who is
>> > >> driving the release, from automated RC mail.
>> > >>
>> > >> On Fri, Jun 6, 2025 at 2:09 AM, Wenchen Fan wrote:
>> > >>
>> > >>> +1 for email automation!
>> > >>>
>> > >>> On Thu, Jun 5, 2025 at 8:22 AM Yuanjian Li 
>> > >>> wrote:
>> > >>>
>> >  +1 for option 1.
>> > 
>> >  Seems the only downside of option 1 is that some RC numbers may be
>> >  non-sequential.
>> > 
>> >  On Thu, Jun 5, 2025 at 07:57, Dongjoon Hyun wrote:
>> > 
>> > > +1 for the proposal, Hyukjin. Thank you for the whole and seamless
>> > > migration toward this direction.
>> > >
>> > > Please make it sure that we explicitly show the human release
>> manager
>> > > name and email address (instead of bot sender) in the generated
>> email.
>> > > That's the only concern I have.
>> > >
>> > > Thanks,
>> > > Dongjoon.
>> > >
>> > >
>> > >
>> > > On Wed, Jun 4, 2025 at 9:32 PM Mridul Muralidharan <
>> mri...@gmail.com>
>> > > wrote:
>> > >
>> > >>
>> > >>   We can always invalidate the vote with -1 in case it is found
>> to be
>> > >> sent incorrectly ... As long as the automation does not end up
>> generating a
>> > >> tonne of mails, that is, it should be fairly manageable :)
>> > >> I am in favor of automating it with option 1.
>> > >>
>> > >> Thanks for driving this Hyukjin !
>> > >>
>> > >> Regards,
>> > >> Mridul
>> > >>
>> > >>
>> > >> On Wed, Jun 4, 2025 at 6:53 PM Hyukjin Kwon <
>> gurwls...@apache.org>
>> > >> wrote:
>> > >>
>> > >>> Hi all,
>> > >>>
>> > >>> As some of you may know, I’ve been working on automating the
>> Spark
>> > >>> release process (release.yml
>> > >>> > >).
>> > >>> The basic steps are done, and I’m now looking into automating
>> some of the
>> > >>> remaining manual tasks.
>> > >>>
>> > >>> One such task is sending the email to start the vote for an RC.
>> I’d
>> > >>> like to automate this step as well.
>> > >>>
>> > >>> The potential downside is that, in corner cases, an incorrect RC
>> > >>> might still trigger the vote email (even though failures should
>> be caught
>> > >>> earlier). To handle this, I propose we send the email
>> automatically and
>> > >>> rely on the community to help verify the RC. If something is
>> wrong, we can
>> > >>> simply cut a new RC - which is now much easier to do.
>> > >>>
>> > >>> Alternatively, a more conservative option is to generate a
>> draft of
>> > >>> the email in the build log and let the release manager copy and
>> send it
>> > >>> manually.
>> > >>>
>> > >>> I personally prefer the first approach, but I’d like to hear
>> what
>> > >>> others think.
>> > >>>
>> > >>>
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
Hi folks,

Please vote on releasing the following candidate as Apache Spark Connect Go
Client 0.1.0.

The release candidate was tested and built against Spark 4.0.0. The
repository contains a sample application for submitting jobs written in Go
using a small JVM wrapper and quickstart information.

This vote is open for the next 72 hours and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark Connect Go Client 0.1.0
[] -1 Do not release this package because ...

Tag: https://github.com/apache/spark-connect-go/tree/v0.1.0-rc2
 (commit
defb8525088150f9f328136a35fa7c5f64fe2733)

The artifacts are available as well here:
https://dist.apache.org/repos/dist/dev/spark/spark-connect-go-0.1.0-rc2/

The artifacts can be verified using the KEYS file
https://dist.apache.org/repos/dist/dev/spark/KEYS
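For reference, a minimal verification sketch in shell (the exact artifact file names
under the rc2 directory are assumptions based on the rc1 naming earlier in this
thread):

$ BASE=https://dist.apache.org/repos/dist/dev/spark/spark-connect-go-0.1.0-rc2
$ wget "$BASE/spark-connect-go-0.1.0-rc2.zip" "$BASE/spark-connect-go-0.1.0-rc2.zip.asc"
$ wget https://dist.apache.org/repos/dist/dev/spark/KEYS   # the dev KEYS file, not downloads.apache.org/spark/KEYS
$ gpg --import KEYS
$ gpg --verify spark-connect-go-0.1.0-rc2.zip.asc spark-connect-go-0.1.0-rc2.zip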

I've addressed the comments above with regard to:

- Build out of source tree
- Signing using the dev KEYS file
- Missing NOTICE file
- Upload to the GitHub distribution

Thanks
Martin

On Mon, Jun 9, 2025 at 8:54 AM Martin Grund  wrote:

> Thanks for the feedback, I'll address it shortly.
>
> On Mon, Jun 9, 2025 at 08:31 Cheng Pan  wrote:
>
>> Hi Martin,
>>
>> Thanks for addressing it, a few questions/issues I found:
>>
>> 1. The "fun Version"[1] returns "3.5.x”, this does not look like a
>> correct version as you claim this release candidates was built and tested
>> against Spark 4.0.0.
>>
>> 2. Seems your public key was not added to KEYS, so I can not verify your
>> signature.
>>
>> $ wget https://downloads.apache.org/spark/KEYS
>> $ gpg --import KEYS
>> $ gpg --verify spark-connect-go-0.1.0-rc1.zip.asc
>> gpg: assuming signed data in 'spark-connect-go-0.1.0-rc1.zip'
>> gpg: Signature made Mon Jun  9 20:30:11 2025 CST
>> gpg:                using RSA key 4E3B5C29DD2CCCF97925469C1E0086A46C650707
>> gpg: Can't check signature: No public key
>>
>> 3. Though it’s not enforced, so far all Spark release candidates have been
>> put at [2] instead of using GitHub releases; I would recommend connect-go to
>> follow that too.
>>
>> > Projects should use the /dev tree of the dist repository or the staging
>> features of repository.apache.org to host release candidates posted for
>> developer testing/voting (prior to being, potentially, formally blessed as
>> a GA release).
>>
>> 4. The source release is non-compilable because it does not contain the
>> Spark source code. To be clear, [3] requires that the "source release
>> artifacts" MUST be sufficient for a user to build and test, not the git
>> repo.
>>
>> Failure: directory "sparksrc/sql/connect/common/src/main/protobuf" listed
>> in buf.work.yaml contains no .proto files
>> exit status 1
>> make: *** [Makefile:69: internal/generated.out] Error 1
>> root@c072c654a72e:/go/spark-connect-go-0.1.0-rc1# ls
>> sparksrc/sql/connect/common/src/main/protobuf
>> ls: cannot access 'sparksrc/sql/connect/common/src/main/protobuf': No
>> such file or directory
>>
>> 5. Missing NOTICE file [4]
>>
>> > Each package MUST provide a LICENSE file and a NOTICE file ...
>>
>> [1]
>> https://github.com/apache/spark-connect-go/blob/v0.1.0-rc1/spark/version.go#L19
>> [2] https://dist.apache.org/repos/dist/dev/spark
>> [3] https://www.apache.org/legal/release-policy.html#source-packages
>> [4]
>> https://www.apache.org/legal/release-policy.html#licensing-documentation
>>
>> Thanks,
>> Cheng Pan
>>
>>
>>
>> On Jun 9, 2025, at 20:32, Martin Grund  wrote:
>>
>> I updated the release based on the tag with the source releases and the
>> proper signature.
>>
>> https://github.com/apache/spark-connect-go/releases/tag/v0.1.0-rc1
>>
>> On Sun, Jun 8, 2025 at 10:44 PM Cheng Pan  wrote:
>>
>>> The release artifacts don’t satisfy the ASF release policy[1].
>>>
>>> > Projects MUST direct outsiders towards official releases rather than
>>> raw source repositories, nightly builds, snapshots, release candidates, or
>>> any other similar packages.
>>>
>>> > Every ASF release MUST contain one or more source packages, which MUST
>>> be sufficient for a user to build and test the release provided they have
>>> access to the appropriate platform and tools. A source release SHOULD not
>>> contain compiled code.
>>>
>>> [1] https://www.apache.org/legal/release-policy.html#publication
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>>
>>> On Jun 9, 2025, at 12:21, Martin Grund 
>>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark Connect
>>> Go Client 0.1.0.
>>>
>>> The release candidate was tested and built against Spark 4.0.0. The
>>> repository contains a sample application for submitting jobs written in Go
>>> using a small JVM wrapper
>>> 
>>> and quicksta

Re: [DISCUSS] SPIP: Upgrade Apache Hive to 4.x

2025-06-09 Thread Mich Talebzadeh
Thanks, Angel, for your offer of help.

I added some comments to this thread

https://github.com/apache/iceberg/issues/2387

The problem was observed with a Postgres DB as the metastore, so I am not sure
whether the cause is the metastore being on a transactional DB or not. This error
may not be relevant to other Hive metastores, but it is worth investigating.

cheers


Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my Linkedin profile

On Sat, 7 Jun 2025 at 08:08, Ángel Álvarez Pascua <
angel.alvarez.pas...@gmail.com> wrote:

> I'm also interested in this SPIP.
> There was someone else also working on this, if I remember correctly.
>
> @Mich Talebzadeh  , if you need any help with
> that issue, let me know.
>
> El vie, 6 jun 2025, 1:07, Mich Talebzadeh 
> escribió:
>
>> I started working on this by upgrading my Hadoop to
>>
>> Hadoop 3.4.1
>>
>> My Hive is
>>
>> Driver: Hive JDBC (version 4.0.1)
>> Transaction isolation: TRANSACTION_REPEATABLE_READ
>> Running init script /home/hduser/dba/bin/add_jars.hql
>> 25/06/05 23:33:44 [main]: WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 0: jdbc:hive2://rhes75:10099/default> set
>> hive.support.concurrency=false;
>> No rows affected (0.027 seconds)
>> 0: jdbc:hive2://rhes75:10099/default>
>> 0: jdbc:hive2://rhes75:10099/default> Beeline version 4.0.1 by Apache Hive
>>
>> Now we have transactional Hive with Hive 4, which we did not have with the
>> prior versions. Compounding this, my metastore is on Oracle 12.
>> Messing around with
>> set hive.support.concurrency=false;
>> show databases
>> . . . . . . . . . . . . . . . . . . > Error: Error running query
>> (state=,code=0)
>>  set hive.support.concurrency=true
>> . . . . . . . . . . . . . . . . . . > No rows affected (0.002 seconds)
>> 0: jdbc:hive2://rhes75:10099/default> show databases
>> . . . . . . . . . . . . . . . . . . > ++
>> | database_name  |
>> ++
>> | access |
>> | accounts   |
>>
>> So the last time I worked on it I was trying to sort out the concurrency
>> issues
>>
>> Running simple queries on Hive
>>
>> FAILED: Error in acquiring locks: Error communicating with the metastore
>> ERROR : FAILED: Error in acquiring locks: Error communicating with the
>> metastore
>> org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with
>> the metastore
>> at
>> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:183)
>> at
>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:475)
>> at
>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:509)
>> at
>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:296)
>> at
>> org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.acquireLocks(HiveTxnManagerImpl.java:81)
>> at
>> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:101)
>> at
>> org.apache.hadoop.hive.ql.DriverTxnHandler.acquireLocksInternal(DriverTxnHandler.java:328)
>> at
>> org.apache.hadoop.hive.ql.DriverTxnHandler.acquireLocks(DriverTxnHandler.java:232)
>> at
>> org.apache.hadoop.hive.ql.DriverTxnHandler.acquireLocksIfNeeded(DriverTxnHandler.java:144)
>> at
>> org.apache.hadoop.hive.ql.Driver.lockAndRespond(Driver.java:356)
>> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:197)
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>> at
>> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>> at
>> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
>> at
>> org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
>> at
>> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
>> at
>> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.thrift.TApplicationException: Internal error
>> processing lock
>>
>> Unfo
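A hedged aside on the failure above when setting hive.support.concurrency=false: in
Hive 4 the concurrency flag and the transaction manager normally have to be changed
together. The default DbTxnManager, which is required for ACID tables, only works
with hive.support.concurrency=true; turning concurrency off is only consistent with
the non-ACID DummyTxnManager. A minimal sketch, assuming ACID tables are not needed
(file path is illustrative):

$ cat > /tmp/no_acid.hql <<'EOF'
set hive.support.concurrency=false;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
show databases;
EOF
$ beeline -u jdbc:hive2://rhes75:10099/default -f /tmp/no_acid.hql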

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-09 Thread Jungtaek Lim
It's a bit different for users leveraging LevelDB - since it requires
opt-in, those who still use it are using it deliberately, hence they are
likely to retain the config during the upgrade.

From the initial post, there is a claim that we deprecated LevelDB in
Apache Spark 4.0.0. Shall I ask what we did for deprecation, and would you
mind giving me a discussion about it? I just wanted to make sure we are not
making an outstanding change at one minor version upgrade.

On Mon, Jun 9, 2025 at 3:01 PM Yang Jie  wrote:

> I would like to provide some new information:
>
> 1. Spark 3.4.0 [SPARK-42277] has started using RocksDB as the default
> option for `spark.history.store.hybridStore.diskBackend`.
>
> - Since Spark 3.4, Spark will use RocksDB store if
> `spark.history.store.hybridStore.enabled` is true. To restore the behavior
> before Spark 3.4, you can set `spark.history.store.hybridStore.diskBackend`
> to `LEVELDB`.
>
> 2. Spark 4.0.0 [SPARK-45351] has begun using RocksDB as the default option
> for `spark.shuffle.service.db.backend`.
>
> - Since Spark 4.0, `spark.shuffle.service.db.backend` is set to `ROCKSDB`
> by default which means Spark will use RocksDB store for shuffle service. To
> restore the behavior before Spark 4.0, you can set
> `spark.shuffle.service.db.backend` to `LEVELDB`.
>
> So for users who hadn't explicitly configured the aforementioned options
> to `LEVELDB` before, the situation of data reconstruction or re-parsing
> has already existed.
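For readers following along, the two options above are plain spark-defaults.conf
entries; a sketch of pinning them explicitly (values shown are illustrative, and
LEVELDB only makes sense if an existing store must stay readable):

# Spark History Server hybrid store disk backend (default ROCKSDB since 3.4)
spark.history.store.hybridStore.enabled       true
spark.history.store.hybridStore.diskBackend   ROCKSDB
# External shuffle service state store backend (default ROCKSDB since 4.0)
spark.shuffle.service.db.backend              ROCKSDB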
>
> On 2025/06/09 01:08:05 Jungtaek Lim wrote:
> > Thanks for the valuable input.
> >
> > I think it's more about the case where upgrading would surprise the end
> > users. If we simply remove LevelDB from the next release, we will be
> > removing these intermediate data as well and enforcing them to rebuild
> > everything. 15 mins is probably not super long from the given volume, but
> > even a couple additional minutes could bring a negative sentiment if they
> > ever opened this before.
> >
> > Would enabling the hybrid store reduce the surprise? If then maybe we
> could
> > ask users to enable it, with assigning a bit more memory (+ 2g on SHS
> > process) if they didn't use the hybrid store.
> >
> > On Fri, Jun 6, 2025 at 5:08 PM, Cheng Pan wrote:
> >
> > > I think SHS only uses LevelDB/RocksDB to store intermediate data,
> > > supporting re-parsing to rebuild the cache should be fine enough.
> > >
> > > Also share my experience about using LevelDB/RocksDB for SHS, it seems
> > > LevelDB has native memory leak issues, at least for the SHS use case, I
> > > need to reboot the SHS for every two months to recover it, issue gone
> after
> > > upgrading to Spark 3.3 and switching to RocksDB.
> > >
> > > Scale and Performance: we keep ~800k applications event logs for the
> event
> > > log HDFS directory, multiple threads re-parsing to rebuild listing.rdb
> > > takes ~15mins.
> > >
> > > Thanks,
> > > Cheng Pan
> > >
> > >
> > >
> > > On Jun 6, 2025, at 15:36, Jungtaek Lim 
> > > wrote:
> > >
> > > IMHO, it's probably dependent on how long the rewrite will take, from
> > > reading the event log. If loading the state from LevelDB and rewriting
> to
> > > RocksDB is quite much faster, then we may want to support this for a
> couple
> > > minor releases to not force users to lose their cache. If there is no
> such
> > > difference, it is probably good to gradually migrate them
> automatically via
> > > opt-in for a couple minor releases. In both cases, we can enforce
> migration
> > > (neither opt-in nor opt-out) after that period.
> > >
> > > On Fri, Jun 6, 2025 at 10:51 AM Jia Fan  wrote:
> > >
> > >> This is indeed an issue at the moment. Personally, I haven't found a
> > >> proper way to migrate data from LevelDB to RocksDB, as their storage
> > >> structures are different. Should we wait until a reasonable migration
> > >> solution becomes available before moving forward with this?
> > >>
> > >> On Wed, May 28, 2025 at 15:41, Jungtaek Lim wrote:
> > >> >
> > >> > Thanks for initiating this.
> > >> >
> > >> > I wonder if we don't have any compatibility issue on every
> component -
> > >> SS area does not have an issue, but I don't quite remember if the
> history
> > >> server would be OK with this. What is the story of the migration if
> they
> > >> had been using leveldb? I guess it could be probably re-parsed, but
> do we
> > >> need to ask users to perform some manual work to do that?
> > >> >
> > >> > On Wed, May 28, 2025 at 2:27 PM Yang Jie 
> wrote:
> > >> >>
> > >> >> The project "org.fusesource.leveldbjni:leveldbjni" released its
> last
> > >> version 12 years ago, and its code repository was last updated 8
> years ago.
> > >> Consequently, I believe it's challenging for us to receive ongoing
> > >> maintenance and support from this project.
> > >> >>
> > >> >> On the flip side, when developers implement new features related to
> > >> Spark code, they have become accustomed to using rocksdb instead of
> leveldb.
> > >> >>
> > >> >> Furthermore, in Spark 4.0, 

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
I updated the release based on the tag with the source releases and the
proper signature.

https://github.com/apache/spark-connect-go/releases/tag/v0.1.0-rc1

On Sun, Jun 8, 2025 at 10:44 PM Cheng Pan  wrote:

> The release artifacts don’t satisfy the ASF release policy[1].
>
> > Projects MUST direct outsiders towards official releases rather than raw
> source repositories, nightly builds, snapshots, release candidates, or any
> other similar packages.
>
> > Every ASF release MUST contain one or more source packages, which MUST
> be sufficient for a user to build and test the release provided they have
> access to the appropriate platform and tools. A source release SHOULD not
> contain compiled code.
>
> [1] https://www.apache.org/legal/release-policy.html#publication
>
> Thanks,
> Cheng Pan
>
>
>
> On Jun 9, 2025, at 12:21, Martin Grund 
> wrote:
>
> Please vote on releasing the following candidate as Apache Spark Connect
> Go Client 0.1.0.
>
> The release candidate was tested and built against Spark 4.0.0. The
> repository contains a sample application for submitting jobs written in Go
> using a small JVM wrapper and quickstart information.
>
> This vote is open for the next 72 hours and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark Connect Go Client 0.1.0
> [] -1 Do not release this package because ...
>
> Tag: https://github.com/apache/spark-connect-go/tree/v0.1.0-rc1 (commit
> 2383413460105fbc665c7c36d7943d5f05a5b245)
>
> Thanks
> Martin
>
>
>


[VOTE] Release Spark 3.5.7 (RC1)

2025-06-09 Thread gurwls223
Please vote on releasing the following candidate as Apache Spark version 3.5.7.

The vote is open until Fri, 13 Jun 2025 06:32:20 PDT and passes if a majority 
+1 PMC votes are cast, with
a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.5.7
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v3.5.7-rc1 (commit d5a625d9550):
https://github.com/apache/spark/tree/v3.5.7-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1498/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-docs/

The list of bug fixes going into 3.5.7 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12355975

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC via "pip install 
https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-bin/pyspark-3.5.7.tar.gz";
and see if anything important breaks.
In the Java/Scala, you can add the staging repository to your project's 
resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out of date RC going forward).
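For example, a sketch of both paths (the binary tarball name and the throwaway paths
are assumptions):

$ python -m venv /tmp/spark-3.5.7-rc1 && source /tmp/spark-3.5.7-rc1/bin/activate
$ pip install "https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-bin/pyspark-3.5.7.tar.gz"
$ python -c "import pyspark; print(pyspark.__version__)"    # expect 3.5.7

$ wget https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-bin/spark-3.5.7-bin-hadoop3.tgz
$ tar -xzf spark-3.5.7-bin-hadoop3.tgz
$ spark-3.5.7-bin-hadoop3/bin/spark-submit --version        # then point an existing workload at this build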

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-06-09 Thread Hyukjin Kwon
This is an automated vote. Please ignore it.

On Mon, Jun 9, 2025 at 6:46 AM  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.5.7.
>
> The vote is open until Fri, 13 Jun 2025 06:32:20 PDT and passes if a
> majority +1 PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.7
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.5.7-rc1 (commit d5a625d9550):
> https://github.com/apache/spark/tree/v3.5.7-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1498/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-docs/
>
> The list of bug fixes going into 3.5.7 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12355975
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.5.7-rc1-bin/pyspark-3.5.7.tar.gz
> "
> and see if anything important breaks.
> In the Java/Scala, you can add the staging repository to your project's
> resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>