Re: [VOTE] Release Spark 4.0.0 (RC6)

2025-05-14 Thread Holden Karau
+1 On Wed, May 14, 2025 at 10:10 AM L. C. Hsieh wrote: > +1 > > On Tue, May 13, 2025 at 3:28 PM Wenchen Fan wrote: > > > > Please vote on releasing the following candidate as Apache Spark version > 4.0.0. > > > > The vote is open until May 16 (PST) and passes if a majority +1 PMC > votes are ca

Re: [VOTE] SPIP: Declarative Pipelines

2025-04-09 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Fight Health Insurance: https://www.fighthealthinsurance.com/ Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.yo

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Holden Karau
So I think if I understand folks concerns it’s that we’ve let it slide in the past and at some point we’ve got to stop letting it slide because there is some concern we might not be meeting the ASF guidance here. Personally I think given they’re test artifacts and how delayed Spark 4 is we should

Re: setuptools 78.0.0 does not work with pyspark 3.x releases

2025-03-24 Thread Holden Karau
I think given the lack of a 4.0 release and the number of folks using PySpark, this is enough to trigger a 3.5 branch release.

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
llow his proposal. The major argument here is, he can just VETO to any >>> proposal to retain the codebase as the way he prefers to, which I don't >>> believe is a correct usage of VETO. >>> >>> If we just revert the change of removal of config, this is "

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
aches, but exposing your "technically valid" concern of my > approach, backed by Dongjoon's veto (most likely you want to quote > Dongjoon's post). This is very simple and I'm not sure you are doing > exactly what the VOTE requires. > > On Mon, Mar 17, 2025 at 6:32

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
n follow. Especially, if you did not tell the > truth from 3, e.g. you let me go and discuss while you were intended to > block me in any phase, this is a strong indication that you intend to play > with me and the community (or even ASF) has to know that. > > Do not evade the root qu

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
Sat, Mar 15, 2025 at 5:07 PM Holden Karau wrote: > Given it’s the weekend maybe let’s give folks at least one full work day.

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Holden Karau
Given it’s the weekend maybe let’s give folks at least one full work day.

Re: [VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-15 Thread Holden Karau
My $0.02: I do not believe that this vote has passed. I believe there is a valid veto. On a personal level, from a migration point of view, I think Spark 4 is the perfect time to drop this configuration. Given the disagreement over whether this is a valid veto, I think we should pause until the board has b

Re: Seek for consensus on landing Spark Connect implementation for transformWithState in Spark 4.0.0

2025-03-04 Thread Holden Karau
I share the same concern: adding new features at this stage feels risky and is likely to drag out an already fairly late release.

Re: Deprecating and banning `spark.databricks.*` config from Apache Spark repository

2025-02-18 Thread Holden Karau
I think removing it in 4 sounds reasonable to me as well. It’s important to create a sense of fairness among vendors.

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Holden Karau
Interesting. Given that one of the features of Spark Connect should be simpler migrations, we should (in my mind) only declare it stable once we’ve gone through two releases where the previous client + its code can talk to the new server.

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Holden Karau
a potential solution here? > > > On Fri, Jan 17, 2025 at 12:27 Holden Karau wrote: > >> +1 I think this is great. If you’ve got any shading you’d be open to >> upstreaming I’d be happy to review it.

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-17 Thread Holden Karau
h-d-5205b2/> >> > >> > >> > https://en.everybodywiki.com/Mich_Talebzadeh >> > >> > >> > >> > *Disclaimer:* The information provided is correct to the best of my >> > knowledge but of course cannot be guaranteed . It is essential to not

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-15 Thread Holden Karau
I think I’ll need to go in and mark GraphX as deprecated. But that can happen post branch cut.

Re: [VOTE] Use plain text logs by default

2025-01-12 Thread Holden Karau
+1 On Sun, Jan 12, 2025 at 9:58 PM Xiao Li wrote: > +1 > > On Sat, Jan 11, 2025 at 12:47 Dongjoon Hyun wrote: > >> +1 >> >> Dongjoon >> >> On 2025/01/10 00:57:25 Ye Zhou wrote: >> > +1 (non-binding) >> > >> > On Thu, Jan 9, 2025 at 3:02 PM Sakthi wrote: >> > >> > > +1 (non-binding) >> > > >> > > Sakthi >> >

Re: [DISCUSS] Pythonic approach of setting Spark SQL configurations

2024-12-26 Thread Holden Karau
I think having automatic getters/setters on the spark.conf object seems reasonable to me. On Thu, Dec 26, 2024 at 9:32 PM Reynold Xin wrote: > I actually think this might be confusing (just in general adding too many > different ways to do the same thing is also un-Pythonic). > > On Thu, Dec 26, 2024 at

Re: [External Email] Re: Spark Connect the default API in Spark 4.0

2024-12-14 Thread Holden Karau
;>> because it seems too early to make this decision >>> given the pending `Spark Connect` work and active discussion. >>> Previously, I was biased only on the SQL part too much. >>> >>> As a side note, I hope Apache Spark 4.0.0 release is not

Increasing Shading & Relocating for 4.0

2024-12-06 Thread Holden Karau
Hi Y'all, As we're getting closer to 4.0 I was thinking now is a good time for us to try and reduce the class path we expose for JVM users. Are there any common classes/packages folks would like to see shaded? Cheers, Holden :)

Re: [External Email] Re: Spark Connect the default API in Spark 4.0

2024-12-03 Thread Holden Karau
> > Similarly based on the experience of running Spark Connect in production, > the co-existence of workloads running in classic mode and connect mode is > working fine. > > I still don’t like classic mode (maybe “full” and “restricted”). > > > On Fri, Nov 29, 2024 at 3:

Re: [External Email] Re: Spark Connect the default API in Spark 4.0

2024-11-28 Thread Holden Karau
On Thu, Nov 28, 2024 at 6:11 PM Holden Karau wrote: > Given there is no plan to support RDDs I’ll update to -0.9 >

Re: [External Email] Re: Spark Connect the default API in Spark 4.0

2024-11-28 Thread Holden Karau
ersion as >> well - once there is api parity. >> >> Irrespective of which major/minor version we make the switch in - there >> could be user impact; minimizing this impact would be greatly appreciated >> by our users. >> >> Regards, >> Mridul &g

Re: [External Email] Re: Spark Connect the default API in Spark 4.0

2024-11-27 Thread Holden Karau
-0.5: I don’t think this is a good idea for JVM apps until we have API parity. (Binding, but to be clear, not a veto)

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-13 Thread Holden Karau
Responses inline.

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Holden Karau
So it’s deprecated, but I will review some basic GraphX PRs, as I would like us to bring GraphX back to life; under our current release structure, though, we need to deprecate now if we want to be able to remove it in the next few years.

Re: Shuffle TTLs

2024-10-16 Thread Holden Karau
> Thanks for bringing this up. Wouldn't it be better for the notebooks to > control when these DFs/RDDs expire so they can do fine granular control? > > On Wed, Oct 16, 2024 at 7:51 AM Holden Karau > wrote: > >> Hi Spark Devs, >> >> So back in Spark 1.X we ha

Re: Apache Spark 3.4.4 EOL Release?

2024-10-16 Thread Holden Karau
+1 on a 3.4.4 EOL release On Wed, Oct 16, 2024 at 9:37 AM Dongjoon Hyun wrote: > Hi, All. > > Since the Apache Spark 3.4.0 RC7 vote passed on Apr 6, 2023, branch-3.4 > has been maintained and served well until now. > > - https://github.com/apache/spark/releases/tag/v3.4.0 (tagged on Apr 6, > 202

Shuffle TTLs

2024-10-16 Thread Holden Karau
Hi Spark Devs, So back in Spark 1.X we had shuffle TTLs, but they did not take into account last access times. With the increased use of notebooks, where DataFrames & RDDs are more likely to be defined at the global scope, I was thinking it could be a good time to try and re-introduce shuffle TTLs b

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Holden Karau
just blog all about the motif matching in GraphFrames: > > https://blog.graphlet.ai/financial-crime-and-corruption-network-motifs-4cf2e8e10eb5 > > Russ > > On Mon, Oct 7, 2024 at 5:38 PM Holden Karau > wrote: > >> So this discuss thread and the vote thread to deprecat

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Holden Karau
way to raise visibility here? > > On Mon, Oct 7, 2024 at 4:24 PM Holden Karau > wrote: > >> There are no specific tickets associated with the lack of maintenance or >> this as the component has not been maintained for a sufficiently long time. >> If you're interested in

Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-10-07 Thread Holden Karau
On Fri, Oct 4, 2024 at 2:57 PM Mark Hamstra wrote: > -1(*) reasoning posted in the DISCUSS thread > > On Mon, Sep 30, 2024 at 12:40 PM Holden Kara

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-07 Thread Holden Karau
On Sat, Oct 5, 2024 at 9:17 PM Ángel

Re: [VOTE] Officially Deprecate GraphX in Spark 4

2024-10-07 Thread Holden Karau
On Mon, Sep 30, 2024 at 11:01 AM Holden Karau wrote: > I think it has been de-facto deprecated, we haven’t updated it > meaningfully in several years. I think removing the API would be excessive > but deprecating it would give

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-05 Thread Holden Karau
their code. > > I think that’s probably the way to go. > > On Sun, Oct 6, 2024 at 6:09, Holden Karau () > wrote: > >> So removing GraphX from Spark would not prevent GraphFrames from >> continuing, they could pick up the GraphX source and incorporate it into >

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-05 Thread Holden Karau
; >> > >>>> >> > On Fri, Oct 4, 2024 at 4:56 PM Mark Hamstra >>>> wrote: >>>> >> >> >>>> >> >> I'm -1(*) because, while it technically means "might be removed >>>> in the >>>> >> >> future&

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Holden Karau
GraphX as deprecated. > > On Fri, Oct 4, 2024 at 3:00 PM Holden Karau > wrote: > > > > Personally I think people should not depend on it — there’s literally no > one working on it, and not being up front about that I think draws > everything else into question. > > &g

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Holden Karau
of Spark's graph processing capabilities" -- i.e., > things that encourage and support new contributions to address any > shortcomings in Spark's graph processing, not things that discourage > contributions and use in the way that I believe simply declaring > GraphX to be de

[VOTE] Officially Deprecate GraphX in Spark 4

2024-09-30 Thread Holden Karau
I think it has been de facto deprecated; we haven’t updated it meaningfully in several years. I think removing the API would be excessive, but deprecating it would give us the flexibility to remove it in the not-too-distant future. That being said, this is not a vote to remove GraphX, I think that w

[DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-09-29 Thread Holden Karau
Since we're getting close to cutting a 4.0 branch, I'd like to float the idea of officially deprecating GraphX. What that would mean (to me) is that we would update the docs to indicate that GraphX is deprecated and its APIs may be removed at any time in the future. Alternatively, we could mark it as

Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-16 Thread Holden Karau
+1 On Mon, Sep 16, 2024 at 10:55 AM Zhou Jiang wrote: > + 1 > Sent

Re: [VOTE] Document and Feature Preview via GitHub Pages

2024-09-11 Thread Holden Karau
+1 On Wed, Sep 11, 2024 at 6:45 PM Xiao Li wrote: > +1 > > Hyukjin

Re: [VOTE] Deprecate SparkR

2024-08-21 Thread Holden Karau
+1 On Wed, Aug 21, 2024 at 8:59 PM Herman van Hovell wrote: > +1 >

Re: [DISCUSS] Deprecating SparkR

2024-08-12 Thread Holden Karau
+1 Are the sparklyr folks on this list? On Mon, Aug 12, 2024 at 5:22

Re: [VOTE] Archive Spark Documentations in Apache Archives

2024-08-12 Thread Holden Karau
+1 On Mon, Aug 12, 2024 at 10:17 AM Dongjoon Hyun wrote: > +1 for the proposals > - enhancing the release process to put the docs to `release` directory in > order to archive. > - uploading old releases via SVN manually to archive. > > Since deletion is not a scope of this vote, I don't see any

Re: [VOTE] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-12 Thread Holden Karau
+0 (binding) On Mon, Aug 12, 2024 at 9:14 AM Matthew Powers wrote:

Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-24 Thread Holden Karau
sou...@gmail.com> wrote: >>> > >> >>> > >>> I'd propose not to change the name of "Spark Connect" - the name >>> > >>> represents the characteristic of the mode (separation of layer for >>> client >>> > &

Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread Holden Karau
I think perhaps Spark Connect could be phrased as “Basic* Spark” & existing Spark could be “Full Spark” given the API limitations of Spark Connect. *I was also thinking Core here, but we’ve used core to refer to the RDD APIs for too long to reuse it here.

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-13 Thread Holden Karau
rted, and opened a new one https://github.com/apache/spark/pull/47341. > > On Sat, 13 Jul 2024 at 15:40, Hyukjin Kwon wrote: > >> Yeah that's fine. I'll revert and open a fresh PR including my own >> followup when I get back home later today. >> >> On

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
tead of a plan string ser/de. We made similar changes > in JSON and CSV schema inference (it was an RDD before) > > On Sat, Jul 13, 2024 at 10:33 AM Holden Karau > wrote: > >> My bad I meant to say I believe the provided justification is >> inappropriate. >> >>

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
On Fri, Jul 12, 2024 at 5:14 PM Holden Karau wrote: > So looking at the PR it does not appear to be removing any RDD APIs but > the justification provided for changing the ML backend to use the DataFrame > APIs is indeed concerning. > > This PR appears to have been merged with

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Holden Karau
So looking at the PR it does not appear to be removing any RDD APIs but the justification provided for changing the ML backend to use the DataFrame APIs is indeed concerning. This PR appears to have been merged without proper review (or providing an opportunity for review). I’d like to remind peo

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Holden Karau
+1, although given it's a US holiday, maybe keep the vote open for an extra day? On Thu,

Re: [External Email] Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Holden Karau
+1 On Tue, Jul 2, 2024 at 10:18 PM yangjie01 wrote: > +1 (non-binding) > > > > *From**

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Holden Karau
I guess my one concern here would be: are we going to expand the dependencies that are visible on the class path for non-connect users? One of the pain points folks experience when upgrading can come from those changing. Otherwise this seems pretty reasonable.

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Holden Karau
. >> Is there some point of contact that can provide me needed context and >> permissions? >> I'd also love to see why the costs are high and see how we can reduce >> them... >> >> Thanks, >> Nimrod >> >> On Wed, May 8, 2024 at 8:26 AM Hol

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
t; will be automated and the only thing which will be manual is to sign the > release for security reasons that would be reasonable. > > Thanks, > Nimrod > > > On Wed, May 8, 2024, 00:54, Holden Karau < > holden.ka...@gmail.com> wrote: > >> Indeed. We could c

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
anymore, my pgp >> key is lost, etc.). I'll start the RC process at my tomorrow. Thanks for >> your patience! >> >> Wenchen >> >> On Fri, May 3, 2024 at 7:47 AM yangjie01 wrote: >> >>> +1 >>> >>> >>> >>> *发

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
release) >>>> >>>> In addition, Apache Spark PMC received an official notice from ASF >>>> Infra team. >>>> >>>> https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg >>>> > [NOTICE] Apache Spark's Gi

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
as much as possible, we >> opened a blocker-level JIRA issue and have been working on it. >> - https://infra.apache.org/github-actions-policy.html >> >> Please include a sentence that Apache Spark PMC is working on under the >> following umbrella JIRA issue. >&g

Re: ASF board report draft for May

2024-05-05 Thread Holden Karau
Do we want to include that we’re planning on having a preview release of Spark 4 so folks can see the APIs “soon”?

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Holden Karau
+1 :) yay previews On Wed, May 1, 2024 at 5:36 PM Chao Sun wrote: > +1 > > On Wed, May 1, 2024 at 5:23 PM Xiao Li wrote: > >> +1 for next Monday. >> >> We can do more previews when the other features are ready for preview. >> >> On Wed, May 1, 2024 at 08:46 Tathagata Das wrote: >> >>> Next week sounds great

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Holden Karau
+1 On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh wrote: > +1 > > On Fri, Apr 26, 2024

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Holden Karau
+1 On Thu, Apr 25, 2024 at 11:18 AM Maciej wrote: > +1 > > Best regards, > Maciej Szy

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Holden Karau
+1 -- even if it's not perfect now is the time to change default values On Sat, Apr 13, 2024 at 4:11 PM Hyukjin Kwon wrote: > +1 > > On Sun, Apr 14, 2024 at 7:46 AM Chao Sun wrote: > >> +1. >> >> This feature is very helpful for guarding against correctness issues, >> such as null results due t

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Holden Karau
On Wed, Apr 10, 2024 at 9:54 PM Binwei Yang wrote: > > Gluten currently already support Velox backend and Clickhouse backend. > data fusion support is also proposed but no one worked on it. > > Gluten isn't a POC. It's under actively developing but some companies > already used it. > > > On 2024/

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Holden Karau
I like the idea of improving the flexibility of Spark's physical plans, and really anything that might reduce code duplication among the ~4 or so different accelerators.

Re: Apache Spark 3.4.3 (?)

2024-04-06 Thread Holden Karau
Sounds good to me :) On Sat, Apr 6, 2024 at 2:51 PM Dongjoon Hyun wrote: > Hi, All. >

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Holden Karau
+1 On Mon, Apr 1, 2024 at 5:44 PM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin Kwo

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Holden Karau
+1 On Mon, Mar 11, 2024 at 7:44 PM Reynold Xin wrote: > +1 > > > On Mon, Mar 11 2024

Re: Generating config docs automatically

2024-02-21 Thread Holden Karau
I think this is a good idea. I like having everything in one source of truth rather than two (so option 1 sounds like a good idea); but that’s just my opinion. I'd be happy to help with reviews though. On Wed, Feb 21, 2024 at 6:37 AM Nicholas Chammas wrote: > I know config documentation is not t

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Holden Karau
This looks really cool :) Out of interest, what are the differences in approach between this and Gluten? On Tue, Feb 13, 2024 at 12:42 PM Chao Sun wrote: > Hi all, > > We are very happy to announce that Project Comet, a plugin to > accelerate Spark query execution via leveraging DataFusion a

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread Holden Karau
Oh, interesting solution. A co-worker was suggesting something similar using resource profiles to increase memory, but your approach avoids a lot of complexity and I like it (and we could extend it out to support resource profile growth too). I think an SPIP sounds like a great next step. On Tue, Ja

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread Holden Karau
+1 On Tue, Nov 14, 2023 at 10:21 AM DB Tsai wrote: > +1 > > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov < > vakaris.bashki...@gmail.com> wrote: > > +1 (non-binding) > > > On Tue, Nov 14, 2023 at 8:03 PM Chao Sun wrote: > >> +1

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Holden Karau
To be clear: I am generally supportive of the idea (+1) but have some follow-up questions: Have we taken the time to learn from the other operators? Do we have a compatible CRD/API or not (and if so why?) The API seems to assume that everything is packaged in the container in advance, but I imagin

Re: Apache Spark 3.4.2 (?)

2023-11-06 Thread Holden Karau
+1 On Mon, Nov 6, 2023 at 4:30 PM yangjie01 wrote: > +1 > > > > *From**: *Yuming Wang > *Date**: *Tuesday, November 7, 2023 07:00 > *To**: *Santosh Pingale > *Cc**: *Dongjoon Hyun , dev > > *Subject**: *Re: Apache Spark 3.4.2 (?) > > > > +1 > > > > On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale > wrote: > > M

Re: Write Spark Connection client application in Go

2023-09-12 Thread Holden Karau
That’s so cool! Great work y’all :) On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote: > Hi Spark Friends, > > Anyone interested in using Golang to write Spark application? We created a > Spark > Connect Go Client library . > Would love to hear feedback/t

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-07 Thread Holden Karau
+1 pip installing seems to function :) On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang wrote: > +1. > > On Thu, Sep 7, 2023 at 10:33 PM yangjie01 > wrote: > >> +1 >> >> >> >> *From**: *Gengliang Wang >> *Date**: *Thursday, September 7, 2023 12:53 >> *To**: *Yuanjian Li >> *Cc**: *Xiao Li , "her...@databricks.com.

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-02 Thread Holden Karau
Can we delay the next RC cut until after Labor Day? On Sat, Sep 2, 2023 at 9:59 PM Yuanjian Li wrote: > Thank you for all the reports! > The vote has failed. I plan to cut RC4 in two days. > > @Dipayan Dev I quickly skimmed through the > corresponding ticket, and it doesn't seem to be a regress

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-23 Thread Holden Karau

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Holden Karau
2023 at 23:42, Mich Talebzadeh >>>> wrote: >>>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> >>>> From what I have seen spark on a serverless cluster has hard up getting >>>> the dr

Re: ASF board report draft for August 2023

2023-08-08 Thread Holden Karau
Maybe add a link to the 4.0 JIRA where we are tracking the current plans for 4.0? On Tue, Aug 8, 2023 at 9:33 AM Dongjoon Hyun wrote: > Thank you, Matei. > > It looks good to me. > > Dongjoon > > On Mon, Aug 7, 2023 at 22:54 Matei Zaharia > wrote: > >> It’s time to send our quarterly report to

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Holden Karau
Oooh fascinating. I’m going on call this week so it will take me a while, but I do want to review this :) On Mon, Aug 7, 2023 at 5:30 PM Pavan Kotikalapudi wrote: > Hi Spark Dev, > > I have extended traditional DRA to work for structured streaming > use-case. > > Here is an initial Implementation

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
Oh great point On Mon, Aug 7, 2023 at 2:23 PM bo yang wrote: > Thanks Holden for bringing this up! > > Maybe another thing to think about is how to make dynamic allocation more > friendly with Kubernetes and disaggregated shuffle storage? > > > > On Mon, Aug 7, 2023

Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
So I'm wondering if there is interest in revisiting some of how Spark does its dynamic allocation for Spark 4+? Some things that I've been thinking about: - Advisory user input (e.g. a way to say after X is done I know I need Y where Y might be a bunch of GPU machines) - Configurable toler

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Holden Karau
+1 On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: > +1 > > On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh > wrote: > >> +1 for me >> >> Mich Talebzadeh

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
Wed, Jun 21, 2023 at 8:30 AM Reynold Xin wrote: > +1 > > This is a great idea. > > > On Wed, Jun 21, 2023 at 8:29 AM, Holden Karau > wrote: > >> I’d like to start with a +1, better Python testing tools integrated into >> the project make sense. >> >

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
I’d like to start with a +1, better Python testing tools integrated into the project make sense. On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu wrote: > Hi all, > > I'd like to start the vote for SPIP: PySpark Test Framework. > > The high-level summary for the SPIP is that it proposes an official te

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-20 Thread Holden Karau
ut if it only entails changing to >>> Scala 2.13 and dropping support for JDK 8, then we could also just release >>> a month after 3.5. >>> >>> How about we do this? We get 3.5 released, and afterwards we do a couple >>> of meetings where we build this road

Re: Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Yup, I think building consensus on what goes in 4.X is something we’ll need to do. On Mon, Jun 12, 2023 at 11:56 AM Dongjoon Hyun wrote: > Thank you for sharing those. I'm also interested in taking advantage of > it. Also, I hope `spark-upgrade` can help us in line with Spark 4.0. > > However, we

Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Myself and a few folks have been working on a spark-upgrade project (focused on getting folks onto current versions of Spark). Since it looks like we're starting the discussion around Spark 4, I was thinking now could be a good time for us to consider if we want to try and integrate auto-upgrade rul

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Holden Karau
-0 I'd like to see more of a doc around what we're planning on for a 4.0 before we pick a target release date etc. (feels like putting the cart before the horse). But it's a weak preference. On Mon, Jun 12, 2023 at 11:24 AM Xiao Li wrote: > Thanks for starting the vote. > > I do have a concern about the t

Re: JDK version support policy?

2023-06-07 Thread Holden Karau
So JDK 11 is still supported in OpenJDK until 2026. I'm not sure we're going to see enough folks moving to JRE 17 by the Spark 4 release; unless we have a strong benefit from dropping 11 support, I'd be inclined to keep it. On Tue, Jun 6, 2023 at 9:08 PM Dongjoon Hyun wrote: > I'm also +1 on dr

Re: ASF policy violation and Scala version issues

2023-06-06 Thread Holden Karau
So I think if the Spark PMC wants to ask Databricks something, that could be reasonable (although I'm a little fuzzy as to the ask), but that conversation might belong on private@ (I could be wrong of course). On Tue, Jun 6, 2023 at 3:29 AM Mich Talebzadeh wrote: > I concur with you Sean. > > If

Re: Slack for Spark Community: Merging various threads

2023-04-07 Thread Holden Karau
I think there was some concern around how to make any sync channel show up in logs / index / search results? On Fri, Apr 7, 2023 at 9:41 AM Dongjoon Hyun wrote: > Thank you, All. > > I'm very satisfied with the focused and right questions for the real > issues by removing irrelevant claims. :) >

Re: Apache Spark 3.2.4 EOL Release?

2023-04-04 Thread Holden Karau
+1 On Tue, Apr 4, 2023 at 11:04 AM L. C. Hsieh wrote: > +1 > > Sounds good and thanks Dongjoon for driving this. > > On 2023/04/04 17:24:54 Dongjoon Hyun wrote: > > Hi, All. > > > > Since Apache Spark 3.2.0 passed RC7 vote on October 12, 2021, branch-3.2 > > has been maintained and served well u

Re: Ammonite as REPL for Spark Connect

2023-03-22 Thread Holden Karau
I am +1 to the general concept of including Ammonite magic 🪄. On Wed, Mar 22, 2023 at 4:58 PM Herman van Hovell wrote: > Ammonite is maintained externally by Li Haoyi et al. We are including it > as a 'provided' dependency. The integration bits and pieces (1 file) are > included in Apache Spark.

Re: SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Holden Karau
Is there someone focused on streaming work these days who would want to shepherd this? On Sat, Feb 18, 2023 at 5:02 PM Dongjoon Hyun wrote: > Thank you for considering me, but may I ask what makes you think to put me > there, Mich? I'm curious about your reason. > > > I have put dongjoon.hyun as

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Holden Karau
That’s legit, if the patch author isn’t comfortable with a backport then let’s leave it be 👍 On Mon, Feb 13, 2023 at 9:59 AM Dongjoon Hyun wrote: > Hi, All. > > As the author of that `Improvement` patch, I strongly disagree with giving > the wrong idea which Python 3.11 is officially supported i

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Holden Karau
I’d be in favor of backporting with the idea that it's a bug fix for a language (admittedly not a version we’ve supported before) On Mon, Feb 13, 2023 at 9:19 AM L. C. Hsieh wrote: > If it is not supported in Spark 3.3.x, it looks like an improvement at > Spark 3.4. > For such cases we usually do n
