Re: [VOTE] Release Spark 3.5.2 (RC5)

2024-08-08 Thread Peter Toth
+1 huaxin gao wrote (on Thu, Aug 8, 2024, 21:19): > +1 > > On Thu, Aug 8, 2024 at 11:41 AM L. C. Hsieh wrote: > >> Then, >> >> +1 again >> >> On Thu, Aug 8, 2024 at 11:38 AM Dongjoon Hyun >> wrote: >> > >> > +1 >> > >> > I'm resending my vote. >> > >> > Dongjoon. >> > >> > On 2024/08

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Kent Yao
+1 On 2024/08/08 23:24:32 Hyukjin Kwon wrote: > SGTM > > On Thu, 8 Aug 2024 at 14:53, Martin Grund > wrote: > > > Hi folks, > > > > I wanted to start a discussion for the following proposal: To make it > > easier for folks to contribute to the Spark Connect Go client, I was > > contemplating no

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Kent Yao
Hi Ruifeng, We have already ported those scripts. Ruifeng Zheng wrote on Fri, Aug 9, 2024 at 12:41: > Hi Kent, I remember that we have some scripts to free disk in spark repo, > maybe we can reuse them for spark-website. > > On Fri, Aug 9, 2024 at 9:57 AM Sean Owen wrote: > >> I don't think that's the issue

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Ruifeng Zheng
Hi Kent, I remember that we have some scripts to free disk in spark repo, maybe we can reuse them for spark-website. On Fri, Aug 9, 2024 at 9:57 AM Sean Owen wrote: > I don't think that's the issue - it's the size of what is cloned into a > container during the GitHub actions runs. Doesn't matter

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Hyukjin Kwon
SGTM On Thu, 8 Aug 2024 at 14:53, Martin Grund wrote: > Hi folks, > > I wanted to start a discussion for the following proposal: To make it > easier for folks to contribute to the Spark Connect Go client, I was > contemplating not requiring them to deal with two accounts (one for Jira) > and one

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Sean Owen
I don't think that's the issue - it's the size of what is cloned into a container during the GitHub actions runs. Doesn't matter how it is stored. They are not large files, either. On Thu, Aug 8, 2024, 4:34 PM Mich Talebzadeh wrote: > > Maybe you should look into deploying GitHub Large File Stor

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Mich Talebzadeh
Maybe you should look into deploying GitHub Large File Storage (LFS). If applicable, store large documentation files in LFS to reduce the repository size. HTH Mich Talebzadeh, Ar

Re: [VOTE] Release Spark 3.5.2 (RC5)

2024-08-08 Thread huaxin gao
+1 On Thu, Aug 8, 2024 at 11:41 AM L. C. Hsieh wrote: > Then, > > +1 again > > On Thu, Aug 8, 2024 at 11:38 AM Dongjoon Hyun wrote: > > > > +1 > > > > I'm resending my vote. > > > > Dongjoon. > > > > On 2024/08/06 16:06:00 Kent Yao wrote: > > > Hi dev, > > > > > > Please vote on releasing the f

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Dongjoon Hyun
Ya, I agree that we need to investigate what happened with the PySpark 3.5+ docs. For old Spark docs, it seems to be negligible.
- All Spark 0.x docs: 231M
- All Spark 1.x docs: 1.3G
- All Spark 2.x docs: 3.4G
For example, the total size of all the old Spark docs above is less than the following 4 releas

Re: [VOTE] Release Spark 3.5.2 (RC5)

2024-08-08 Thread L. C. Hsieh
Then, +1 again On Thu, Aug 8, 2024 at 11:38 AM Dongjoon Hyun wrote: > > +1 > > I'm resending my vote. > > Dongjoon. > > On 2024/08/06 16:06:00 Kent Yao wrote: > > Hi dev, > > > > Please vote on releasing the following candidate as Apache Spark version > > 3.5.2. > > > > The vote is open until A

Re: [VOTE] Release Spark 3.5.2 (RC5)

2024-08-08 Thread Dongjoon Hyun
+1 I'm resending my vote. Dongjoon. On 2024/08/06 16:06:00 Kent Yao wrote: > Hi dev, > > Please vote on releasing the following candidate as Apache Spark version > 3.5.2. > > The vote is open until Aug 9, 17:00:00 UTC, and passes if a majority +1 > PMC votes are cast, with a minimum of 3 +1 v

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Sean Owen
That seems a little bit too much to me. I could see people still on a recent version that just want to see docs or compare/contrast docs for changes. Removing the versions that seem to have ~0 traffic would remove, it seems, like 80% of the .html files (and replace them with a compressed archive

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Sean Owen
Whoa! Is there any clear reason why 3.5 docs are so big? 1GB of docs / 10x jump seems crazy. Maybe we need to investigate and fix that also. I take it that the problem is the size of the repo once it's cloned into the docker container. Removing the .html files helps that, but, then we don't have .

Re: [VOTE] Release Spark 3.5.2 (RC5)

2024-08-08 Thread Dongjoon Hyun
Hi, Kent and all. It seems that the vote replies are not archived in the mailing list for some reason. https://lists.apache.org/list.html?dev@spark.apache.org https://lists.apache.org/thread/chos58kswjg3x9cotp5rn0oc7hnfc6o4 Dongjoon. On Wed, Aug 7, 2024 at 1:44 PM John Zhuge wrote: > +1 (no

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Dongjoon Hyun
The culprit seems to be the PySpark 3.5 documentation, which grew 11x at 3.5+
$ du -h 3.4.3/api/python | tail -n1
84M 3.4.3/api/python
$ du -h 3.5.1/api/python | tail -n1
943M 3.5.1/api/python
Since we will generate big documents for 3.5.x, 4.0.0-preview, 4.0.x, 4.1.x, the proposed tarball id
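
For readers who want to reproduce this kind of comparison without `du`, here is a minimal Python sketch. It is an illustration only, not something posted in the thread: the helper names `dir_size_bytes` and `growth_ratio` are hypothetical, and the function sums apparent file sizes, so its totals can differ somewhat from the block-based figures `du -h` reports.

```python
import os


def dir_size_bytes(root):
    """Sum the apparent sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def growth_ratio(old_size, new_size):
    """Return how many times larger new_size is than old_size."""
    return new_size / old_size


# The two figures quoted in the message, both in the same unit (M).
print(round(growth_ratio(84, 943), 1))  # 11.2
```

Calling `dir_size_bytes("3.5.1/api/python")` inside a checkout would give the byte count for one docs tree; the path is taken from the message and assumes the same directory layout.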

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Mich Talebzadeh
Hi Martin, Overall, your proposal seems to align well with improving the contributor experience and managing issues more effectively for the Spark Connect Go client. As long as there is a plan to handle potential integration challenges and clear communication with the community, this approach coul

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Reynold Xin
I'd love that too. But maybe we can start small and try it out with one project ... On Thu, Aug 8, 2024 at 7:16 AM Sean Owen wrote: > Oh nice if that has changed. I'd personally prefer switching all of Spark > to GitHub issues for simplicity but maybe that's a big lift. And a separate > question.

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Wenchen Fan
It makes sense to me to only keep the doc files for the latest maintenance release, i.e., remove the docs for 3.5.0 and only keep 3.5.1. On Thu, Aug 8, 2024 at 8:06 PM Kent Yao wrote: > Hi dev, > > The current size of the spark-website repository is approximately 16GB, > exceeding the storage lim

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Nicholas Chammas
How big of a change would it be to have the repo only contain the Markdown source and not the rendered HTML (which should perhaps be moved to an object store)? > On Aug 8, 2024, at 8:06 AM, Kent Yao wrote: > > Hi dev, > > The current size of the spark-website repository is approximately 16GB

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Sean Owen
Oh nice if that has changed. I'd personally prefer switching all of Spark to GitHub issues for simplicity but maybe that's a big lift. And a separate question. On Thu, Aug 8, 2024, 9:12 AM Martin Grund wrote: > Mich, yes, the goal is to make it easier for folks to contribute to the Go > client, a

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Martin Grund
Mich, yes, the goal is to make it easier for folks to contribute to the Go client, and my discussion is related to the https://github.com/apache/spark-connect-go repository only. Thanks a lot for the feedback. My assumption is that we will monitor the GH issues in the same way as we do for the J

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Sean Owen
I agree with 'archiving', but what does that mean? delete from the repo and site? While I really doubt people are looking for docs for, say, 0.5.0, it'd be a big jump to totally remove it. What if we made a compressed tarball of old docs and put that in the repo, linked to it, and removed the docs
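
As an illustration of the tarball idea floated in this message, here is a minimal Python sketch that packs one version's docs directory into a compressed archive. It is not the project's actual tooling: the helper name `archive_docs` and the output layout are assumptions, and deleting the original directory afterwards is deliberately left out.

```python
import tarfile
from pathlib import Path


def archive_docs(version_dir, out_dir):
    """Pack version_dir into <out_dir>/<version>.tar.gz and return the archive path."""
    version_dir = Path(version_dir)
    out_path = Path(out_dir) / f"{version_dir.name}.tar.gz"
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative, e.g. "0.5.0/index.html".
        tar.add(version_dir, arcname=version_dir.name)
    return out_path
```

For example, `archive_docs("docs/0.5.0", "archives")` would write `archives/0.5.0.tar.gz`, assuming both directories already exist; those directory names are hypothetical.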

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Sean Owen
This is still part of the Apache Spark project, conceptually? IIRC Apache projects still need to use JIRA, so we can't do this. On Thu, Aug 8, 2024 at 5:08 AM Mich Talebzadeh wrote: > Hi Martin, > > If I understood it correctly, your proposal suggests centralizing issue > tracking for the Spark

Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Kent Yao
Hi dev, The current size of the spark-website repository is approximately 16GB, exceeding the storage limit of GitHub-hosted runners. The GitHub actions have been failing recently in the actions/checkout step because of 'No space left on device' errors. Filesystem Size Used Avail Use% Mount
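
A pre-flight check along the following lines could make the failure mode described here visible before a checkout runs out of disk. This is only a sketch under stated assumptions: `has_room` is a hypothetical helper, the 16 GiB figure is the approximate repository size quoted in the message, and nothing here reflects how the Spark workflows are actually configured.

```python
import shutil

GIB = 1024 ** 3


def has_room(path, needed_bytes):
    """Return True if the filesystem holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


# Approximate repository size quoted in the message.
repo_size = 16 * GIB
print(has_room(".", repo_size))
```

`shutil.disk_usage` reports the same total, used, and free figures that `df` shows for the filesystem containing the given path.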

Re: [DISCUSS] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-08 Thread Mich Talebzadeh
Hi Martin, If I understood it correctly, your proposal suggests centralizing issue tracking for the Spark Connect Go client on GitHub Issues, instead of using both Jira and GitHub? The primary motivation is to simplify the contribution process for developers? A few points, if I may: - How wil