Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Holden Karau
I think this proposal is a good set of trade-offs and has existed in the community for a long period of time. I especially appreciate how the design is focused on a minimal useful component, with future optimizations considered from a point of view of making sure it's flexible, but actual concrete

Re: Apache Spark 3.0.2 Release ?

2021-02-13 Thread Holden Karau
+1, great idea. On Fri, Feb 12, 2021 at 6:40 PM Yuming Wang wrote: +1. On Sat, Feb 13, 2021 at 10:38 AM Takeshi Yamamuro wrote: +1, too. Thanks, Dongjoon! On 2021/02/13 11:07, Xiao Li wrote: +1 Happy Lunar New Year! Xiao On Fri, Feb 12, 2021 at 5:33 PM

Re: Auto-closing PRs or How to get reviewers' attention

2021-02-18 Thread Holden Karau
Git blame is a good way to figure out likely reviewers (e.g., who’s been working in the area). Another is whoever filed the JIRA, if it’s not you. On Thu, Feb 18, 2021 at 6:58 AM Enrico Minack wrote: > Hi Spark Developers, > > I have a fundamental question on the process of contributing to Apa

Re: [VOTE] SPIP: Add FunctionCatalog

2021-03-08 Thread Holden Karau
+1 (binding) On Mon, Mar 8, 2021 at 3:56 PM Ryan Blue wrote: > Hi everyone, I’d like to start a vote for the FunctionCatalog design > proposal (SPIP). > > The proposal is to add a FunctionCatalog interface that can be used to > load and list functions for Spark to call. There are interfaces for

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-13 Thread Holden Karau
I think having pandas support inside of Spark makes sense. One of my questions is who the major contributors to this effort are: is the community developing the pandas API layer for Spark interested in being part of Spark, or do they prefer having their own release cycle? On Sat, Mar 13, 2021 at 5

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-28 Thread Holden Karau
+1 On Sun, Mar 28, 2021 at 10:25 PM sarutak wrote: +1 (non-binding) - Kousuke +1 (non-binding) On Sun, Mar 28, 2021 at 9:06 PM 郑瑞峰 wrote: +1 (non-binding) -- Original message -- From: "Maxim Gekk"; Sent: March 29, 2021

Re: please read: current state and the future of the apache spark build system

2021-04-14 Thread Holden Karau
Thanks Shane for keeping the build infrastructure running for all of these years :) I've got some Kubernetes infra on AS399306 down in HE in Fremont. It's perhaps not of the newest variety, but so far there have been no disk failures or anything like that (knock on wood, of course). The catch is it's on a

Re: [SPARK-34738] issues w/k8s+minikube and PV tests

2021-04-15 Thread Holden Karau
What about if we just turn off the PV tests for now? I'd be happy to help with the debugging/upgrading. On Thu, Apr 15, 2021 at 2:28 AM Rob Vesse wrote: > > There’s at least one test (the persistent volumes one) that relies on some > Minikube functionality because we run integration tests for ou

Re: [VOTE] Release Spark 2.4.8 (RC3)

2021-04-28 Thread Holden Karau
I verified that the virtualenv & PySpark installation on OSX & Linux works as expected with the minimum version of Python. I double-checked the Python tagged versions (mostly checking that Python 3.8 wasn't listed, since Spark 2.x only supports up to 3.7). It might be good to include that as a reminder in the re

Should we add built in support for bouncy castle EC w/Kube

2021-04-29 Thread Holden Karau
Hi Folks, I've deployed a new version of K3s locally and ran into an issue with the key format not being supported out of the box. We delegate to fabric8, which has Bouncy Castle EC as an optional dependency. Adding it would add ~6 MB to the Kube jars. What do folks think? Cheers, Holden P.S.

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-10 Thread Holden Karau
+1 - pip install with Py 2.7 works (with the understandable warnings regarding Python 2.7 no longer being maintained). On Mon, May 10, 2021 at 11:18 AM sarutak wrote: > > +1 (non-binding) > > - Kousuke > > > It looks like the repository is "open" - it doesn't publish until > > "closed" after all

Re: Apache Spark 3.1.2 Release?

2021-05-17 Thread Holden Karau
+1 and thanks for volunteering to be the RM :) On Mon, May 17, 2021 at 4:09 PM Takeshi Yamamuro wrote: > Thank you, Dongjoon~ sgtm, too. > > On Tue, May 18, 2021 at 7:34 AM Cheng Su wrote: > >> +1 for a new release, thanks Dongjoon! >> >> Cheng Su >> >> On 5/17/21, 2:44 PM, "Liang-Chi Hsieh"

Spark on Kubernetes scheduler variety

2021-06-17 Thread Holden Karau
Hi Folks, I'm continuing my adventures to make Spark on containers party, and I was wondering which of the different batch scheduler options folks have experience with and prefer. I was thinking that, so we can better support dynamic allocation, it might make sense for us to support using different s

Re: Spark on Kubernetes scheduler variety

2021-06-18 Thread Holden Karau
*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Holden Karau
Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Holden Karau
Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

2021-06-24 Thread Holden Karau
I took an initial look at the PRs this morning and I’ll go through the design doc in more detail but I think these features look great. It’s especially important with the CA regulation changes to make this easier for folks to implement. On Thu, Jun 24, 2021 at 4:54 PM Anton Okolnychyi wrote: > H

Re: Spark on Kubernetes scheduler variety

2021-07-01 Thread Holden Karau
Re: Spark on Kubernetes scheduler variety

2021-07-08 Thread Holden Karau
2021 at 8:56 AM Holden Karau wrote: That's awesome, I'm just starting to get context around Volcano, but maybe we can schedule an initial meeting for all of us interested in pursuing this to get on the same page. On Wed, Jun 23, 2021 at 6:54 PM Klaus Ma wrote:

Re: Flaky build in GitHub Actions

2021-07-21 Thread Holden Karau
I noticed that the worker decommissioning suite seems to be running up against the memory limits, so I'm going to try to get our memory usage down a bit while we wait for a response from GitHub. In the meantime, I'm assuming that if things pass Jenkins we are OK with merging, yes? On Wed

Time to start publishing Spark Docker Images?

2021-07-21 Thread Holden Karau
Hi Folks, Many other distributed computing projects (https://hub.docker.com/r/rayproject/ray, https://hub.docker.com/u/daskdev) and ASF projects (https://hub.docker.com/u/apache) now publish their images to Docker Hub. We've already got the Docker image tooling in place; I think we'd need to ask the ASF to

Re: Time to start publishing Spark Docker Images?

2021-08-12 Thread Holden Karau
Re: Time to start publishing Spark Docker Images?

2021-08-13 Thread Holden Karau
Re: Access to Apache GitHub

2021-08-15 Thread Holden Karau
I don’t think we need a new repo for working on proposed Dockerfiles. You can take a look at the existing Dockerfiles, file a JIRA, make a fork, and then raise a PR (i.e., follow the usual development process). On Sun, Aug 15, 2021 at 9:51 AM Mich Talebzadeh wrote: > > Maybe this one? > > > https:/

Re: Access to Apache GitHub

2021-08-15 Thread Holden Karau
Re: Time to start publishing Spark Docker Images?

2021-08-16 Thread Holden Karau
Re: Time to start publishing Spark Docker Images?

2021-08-16 Thread Holden Karau
useful stuff for most users/organisations. My suggestion is to create, for a given type (spark, spark-py etc.): 1. One vanilla flavour for everyday use with a few useful packages 2. One for medium use with most common packages for ETL/ELT

Re: Time to start publishing Spark Docker Images?

2021-08-17 Thread Holden Karau
nds this seems like a lot (if I recall correctly it was around 400MB for existing images). On 8/17/21 2:24 PM, Mich Talebzadeh wrote:

-1s on committed but not released code?

2021-08-19 Thread Holden Karau
Hi Y'all, This just recently came up, but I'm not sure how we want to handle this in general. If code was committed under the lazy consensus model and then a committer or PMC member -1s it post-merge, what do we want to do? I know we had some previous discussion around -1s, but that was largely

Add option to Spark UI to proxy to the executors?

2021-08-20 Thread Holden Karau
Hi Folks, I'm wondering what people think about the idea of having the Spark UI (optionally) act as a proxy to the executors? This could help with exec UI access in some deployment environments. Cheers, Holden :) -- Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performan

Re: Add option to Spark UI to proxy to the executors?

2021-08-22 Thread Holden Karau
Re: Add option to Spark UI to proxy to the executors?

2021-08-25 Thread Holden Karau
So I tried turning on the Spark exec UI proxy, but it broke the Spark UI (in 3.1.2): regardless of which URL I requested, everything came back as the text/html of the jobs page. Is anyone actively using this feature in prod? On Sun, Aug 22, 2021 at 5:58 PM Holden Karau wrote: > Oh cool. I’ll h

Adding Spark 4 to JIRA for targeted versions

2021-09-13 Thread Holden Karau
Hi Folks, I'm going through the Spark 3.2 tickets just to make sure we're not missing anything important, and I was wondering what folks' thoughts are on adding Spark 4 so we can target API-breaking changes to the next major version and avoid losing track of the issue. Cheers, Holden :) -- Twit

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Holden Karau
I think even if we do cancel this RC we should leave it open for a bit to see if we can catch any other errors. On Mon, Sep 27, 2021 at 12:29 PM Dongjoon Hyun wrote: > Unfortunately, it's the same for me recently. Not only that, but I also > hit MetaspaceSize OOM, too. > I ended up with MAVEN_OP

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-29 Thread Holden Karau
PySpark smoke tests pass, I'm going to do a last pass through the JIRAs before my vote though. On Wed, Sep 29, 2021 at 8:54 AM Sean Owen wrote: > +1 looks good to me as before, now that a few recent issues are resolved. > > > On Tue, Sep 28, 2021 at 10:45 AM Gengliang Wang wrote: > >> Please vo

Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-10 Thread Holden Karau
+1 On Sun, Oct 10, 2021 at 10:46 PM Wenchen Fan wrote: +1 On Sat, Oct 9, 2021 at 2:36 PM angers zhu wrote: +1 (non-binding) On Sat, Oct 9, 2021 at 2:06 PM Cheng Pan wrote: +1 (non-binding) Integration test passed[1] with my project[2]. [1] https://github.com/house

Re: [VOTE] SPIP: Storage Partitioned Join for Data Source V2

2021-10-29 Thread Holden Karau
+1 On Fri, Oct 29, 2021 at 3:07 PM DB Tsai wrote: > +1 > > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > > On Fri, Oct 29, 2021 at 11:42 AM Ryan Blue wrote: > >> +1 >> >> On Fri, Oct 29, 2021 at 11:06 AM huaxin gao >> wrote: >> >>> +1 >>> >>> On Fri, Oct 29, 2021 at 10:59 AM

Re: DataFrame.mapInArrow

2021-11-10 Thread Holden Karau
Sorry I've been busy, I'll try and take a look tomorrow, excited to see this progress though :) On Wed, Nov 10, 2021 at 9:01 PM Hyukjin Kwon wrote: > Last reminder: I plan to merge this in a few more days. Any feedback and > review would be very appreciated. > > On Tue, 9 Nov 2021 at 21:51, Hyuk

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-11-30 Thread Holden Karau
Thanks for putting this together, I’m really excited for us to add better batch scheduling integrations. On Tue, Nov 30, 2021 at 12:46 AM Yikun Jiang wrote: > Hey everyone, > > I'd like to start a discussion on "Support Volcano/Alternative Schedulers > Proposal". > > This SPIP is proposed to mak

Re: [Apache Spark Jenkins] build system shutting down Dec 23rd, 2021

2021-12-06 Thread Holden Karau
Shane, you kick ass. Thank you for everything you’ve done for us :) Keep on rocking :) On Mon, Dec 6, 2021 at 4:24 PM Hyukjin Kwon wrote: > Thanks, Shane. > > On Tue, 7 Dec 2021 at 09:19, Dongjoon Hyun > wrote: > >> I really want to thank you for all your help. >> You've done so many things for t

Re: Log4j 1.2.17 spark CVE

2021-12-12 Thread Holden Karau
My understanding is it only applies to log4j 2+ so we don’t need to do anything. On Sun, Dec 12, 2021 at 8:46 PM Pralabh Kumar wrote: > Hi developers, users > > Spark is built using log4j 1.2.17 . Is there a plan to upgrade based on > recent CVE detected ? > > > Regards > Pralabh kumar > -- Tw

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2022-01-05 Thread Holden Karau
blematic. Definitely yes, we are on the same page. I think we have the same goal: propose a general and reasonable mechanism to ma

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-05 Thread Holden Karau
+1 (binding) On Wed, Jan 5, 2022 at 5:31 PM William Wang wrote: +1 (non-binding) On Thu, Jan 6, 2022 at 09:07 Yikun Jiang wrote: Hi all, I’d like to start a vote for SPIP: "Support Customized Kubernetes Schedulers Proposal" The SPIP is to support customized Kubernetes schedulers in

Re: Tries on migrating Spark Linux arm64 Job from Jenkins to GitHub Actions

2022-01-08 Thread Holden Karau
Personally I’d love to see us compiling and testing on Linux arm64 as well. On Sat, Jan 8, 2022 at 7:49 PM Yikun Jiang wrote: > BTW, this is not intended to be in potential opposition to Apache Spark > Infra 2022 which dongjoon mentioned in "Apache Spark Jenkins Infra 2022". > It is just to shar

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Holden Karau
On Fri, Jan 21, 2022 at 6:48 PM Sean Owen wrote: > Continue on the ticket - I am not sure this is established. We would block > a release for critical problems that are not regressions. This is not a > data loss / 'deleting data' issue even if valid. > You're welcome to provide feedback but votes

[VOTE] Spark 3.1.3 RC3

2022-02-01 Thread Holden Karau
Please vote on releasing the following candidate as Apache Spark version 3.1.3. The vote is open until Feb. 4th at 5 PM PST (1 AM UTC + 1 day) and passes if a majority of +1 PMC votes are cast, with a minimum of three +1 votes. [ ] +1 Release this package as Apache Spark 3.1.3 [ ] -1 Do not release thi

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Holden Karau
+1 (binding) On Thu, Feb 3, 2022 at 2:26 PM Erik Krogen wrote: > +1 (non-binding) > > Really looking forward to having this natively supported by Spark, so that > we can get rid of our own hacks to tie in a custom view catalog > implementation. I appreciate the care John has put into various par

Re: [VOTE] Spark 3.1.3 RC3

2022-02-03 Thread Holden Karau
December (Dec 6) when we were talking about release 3.2.1. Tom On Wed, Feb 2, 2022 at 2:07 AM Mridul Muralidharan wrote: Hi Holden,

Re: [VOTE] Spark 3.1.3 RC3

2022-02-08 Thread Holden Karau
Yup, I’ve run into some weirdness with the docs again that I want to verify before I send the vote email, though. On Mon, Feb 7, 2022 at 10:06 PM Wenchen Fan wrote: > Shall we use the release scripts of branch 3.1 to release 3.1? > > On Fri, Feb 4, 2022 at 4:57 AM Holden Karau wrote:

[VOTE] Spark 3.1.3 RC4

2022-02-14 Thread Holden Karau
Please vote on releasing the following candidate as Apache Spark version 3.1.3. The vote is open until Feb. 18th at 1 PM Pacific (9 PM GMT) and passes if a majority of +1 PMC votes are cast, with a minimum of three +1 votes. [ ] +1 Release this package as Apache Spark 3.1.3 [ ] -1 Do not release this p

Re: [VOTE] Spark 3.1.3 RC4

2022-02-18 Thread Holden Karau
ith -Pyarn -Pmesos -Pkubernetes Regards, Mridul On Wed, Feb 16, 2022 at 8:32 AM Thomas graves wrote: +1 Tom On Mon, Feb 14, 2022 at 2:55 PM Holden Karau wrote:

Re: [VOTE] Spark 3.1.3 RC4

2022-02-18 Thread Holden Karau
The vote passes with no 0s or -1s and the following +1s: Holden Karau, John Zhuge, Mridul Muralidharan, Thomas Graves, Gengliang Wang, Wenchen Fan, Yuming Wang, Ruifeng Zheng, Sean Owen. I will begin finalizing the release now. On Fri, Feb 18, 2022 at 2:49 PM Holden Karau wrote: > +1 myself :)

[ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-21 Thread Holden Karau
We are happy to announce the availability of Spark 3.1.3! Spark 3.1.3 is a maintenance release containing stability fixes. This release is based on the branch-3.1 maintenance branch of Spark. We strongly recommend all 3.1 users to upgrade to this stable release. To download Spark 3.1.3, head over

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-21 Thread Holden Karau
Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-21 Thread Holden Karau
Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread Holden Karau
12-8-jre-slim-buster latest 31ed15daa2bf 12 hours ago 531MB Then push it with (example) docker push apache/spark/tags/spark-3.1

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-25 Thread Holden Karau
ps. Any plans to make these images official Docker images at some point (for the extra security/validation) [1] [1] https://docs.docker.com/docker-hub/official_images/ On Mon, Feb 21, 2022 at 10:09 PM Holden Karau wrote: We are happy to ann

Re: CVE-2021-38296: Apache Spark Key Negotiation Vulnerability

2022-03-09 Thread Holden Karau
CVEs are generally not mentioned in the release notes or JIRA; instead, we track them at https://spark.apache.org/security.html once they are resolved (prior to resolution, reports go to secur...@spark.apache.org) to allow the project time to fix the issue before public disclosure, so there i

Re: Apache Spark 3.3 Release

2022-03-14 Thread Holden Karau
On Mon, Mar 14, 2022 at 11:53 PM Xiao Li wrote: Could you please list which features we want to finish before the branch cut? How long will they take? Xiao On Mon, Mar 14, 2022 at 13:30 Chao Sun wrote: Hi Max, As there is still some ongoing work for the above listed SPIPs, can we st

Re: Apache Spark 3.3 Release

2022-03-15 Thread Holden Karau
May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs. On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang wrote: > > To make our release time more predictable, let us collect the PRs and > wait three more days be

Re: Apache Spark 3.3 Release

2022-03-16 Thread Holden Karau
68][SQL] Row-level Runtime Filtering #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader #35848 [SPARK-38548][SQL] New SQL function: try_sum Do you mean we should include them, or exclude them

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Holden Karau
Technically, release votes don't follow vetoes (see https://www.apache.org/foundation/voting.html); it's up to the RM once they get the minimum number of binding +1s (although they are encouraged to cancel the release if any serious issues are raised). That being said, I'll add my -1 based on the issues repo

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-11 Thread Holden Karau
On Wed, May 11, 2022 at 4:23 AM Hyukjin Kwon wrote: I expect to see RC2 too. I guess he just sticks to the standard, leaving the vote open till the end. It hasn't got enough +1s anyway :-). On Wed, 11 May 2022 at 10:17, Holden Karau wrote:

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-16 Thread Holden Karau
Oh that’s rad 😊 On Tue, May 17, 2022 at 7:47 AM bo yang wrote: > Hi Spark Folks, > > I built a web reverse proxy to access Spark UI on Kubernetes (working > together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). > Want to share here in case other people have similar need. >

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread Holden Karau
Could we make it do the same sort of history server fallback approach? On Tue, May 17, 2022 at 10:41 PM bo yang wrote: > It is like Web Application Proxy in YARN ( > https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html), > to provide easy access for Spark U

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Holden Karau
+1 On Mon, Jun 13, 2022 at 4:51 PM Yuming Wang wrote: > +1 (non-binding) > > On Tue, Jun 14, 2022 at 7:41 AM Dongjoon Hyun > wrote: > >> +1 >> >> Thanks, >> Dongjoon. >> >> On Mon, Jun 13, 2022 at 3:54 PM Chris Nauroth >> wrote: >> >>> +1 (non-binding) >>> >>> I repeated all checks I described

Re: [VOTE][SPIP] Spark Connect

2022-06-16 Thread Holden Karau
+1 On Thu, Jun 16, 2022 at 7:17 AM Thomas Graves wrote: > +1 for the concept. > Correct me if I'm wrong, but at a high level this is proposing adding > a new user API (which is language agnostic) and the proposal is to > start with something like the Logical Plan, with the addition of being > ab

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-20 Thread Holden Karau
How about a hallway meetup at the Data + AI Summit to talk about build CI, if folks are interested? On Sun, Jun 19, 2022 at 7:50 PM Hyukjin Kwon wrote: > Increased the priority to a blocker - I don't think we can release with > these build failures and poor CI > > On Mon, 20 Jun 2022 at 10:39, Hyukjin

Re: Jupyter notebook on Dataproc versus GKE

2022-09-05 Thread Holden Karau
I’ve run Jupyter w/Spark on K8s, but I haven’t tried it with Dataproc personally. The Spark K8s pod scheduler is now more pluggable, so YuniKorn and Volcano can be used with less effort. On Mon, Sep 5, 2022 at 7:44 AM Mich Talebzadeh wrote: > > Hi, > > > Has anyone got experience of running Jupyter o

Re: Jupyter notebook on Dataproc versus GKE

2022-09-05 Thread Holden Karau
Re: Jupyter notebook on Dataproc versus GKE

2022-09-06 Thread Holden Karau
Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Holden Karau
Do we want to start syndicating Apache Spark Twitter to a Mastodon instance? It seems like a lot of software dev folks are moving over there, and it would be good to reach our users where they are. Any objections / concerns? Any thoughts on which server we should pick if we do this? -- Twitter: ht

Re: Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Holden Karau
? I believe most devs are still using Twitter. On Thu, Dec 1, 2022, 01:35 Holden Karau wrote: Do we want to start syndicating Apache Spark Twitter to a Mastodon instance. It seems like a lot of software dev folks are moving over there and it would

Re: Syndicate Apache Spark Twitter to Mastodon?

2022-12-01 Thread Holden Karau
s are in Twitter) For Federated features, I think Slack would be a better platform, a lot of Apache Big data projects have slack for federated features On Thu, Dec 1, 2022, 02:33 Holden Karau wrote: I agree that there is probably a majority still on twitter, but it would

Spark on Kube (virtual) coffee/tea/pop times

2023-02-07 Thread Holden Karau
Hi Folks, It seems like we could maybe use some additional shared context around Spark on Kube so I’d like to try and schedule a virtual coffee session. Who all would be interested in virtual adventures around Spark on Kube development? No pressure if the idea of hanging out in a virtual chat wi

Re: Spark on Kube (virtual) coffee/tea/pop times

2023-02-07 Thread Holden Karau
ys to use spark. Thanks! Andrew On Tue, Feb 7, 2023 at 5:24 PM Holden Karau wrote: Hi Folks, It seems like we could maybe use some additional shared context around Spark on Kube so I’d like to try and schedule a virtual

Re: Spark on Kube (virtual) coffee/tea/pop times

2023-02-08 Thread Holden Karau
ll make another Doodle for the following week with more European-friendly times. Let me know what folks think :) On Tue, Feb 7, 2023 at 3:23 PM Holden Karau wrote: > Hi Folks, > > It seems like we could maybe use some additional shared context around > Spark on Kube so I’d like to try an

Re: Spark on Kube (virtual) coffee/tea/pop times

2023-02-08 Thread Holden Karau
ngh: Greetings everyone! I am super new to this group and currently leading some work to deploy spark on k8 for my company o9 Solutions.

Re: Spark on Kube (virtual) coffee/tea/pop times

2023-02-10 Thread Holden Karau
On Wed, 8 Feb 2023 at 20:12, Holden Karau wrote: My thought here was that it's more focused on getting to understand each other's goals / priorities an

Re: Spark on Kube (virtual) coffee/tea/pop times

2023-02-13 Thread Holden Karau
On Fri, 10 Feb 2023 at 18:58, Holden Karau wrote: Ok so the first iteration of this

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Holden Karau
I’d be in favor of backporting it, with the idea that it’s a bug fix for a language version (admittedly not a version we’ve supported before). On Mon, Feb 13, 2023 at 9:19 AM L. C. Hsieh wrote: > If it is not supported in Spark 3.3.x, it looks like an improvement at > Spark 3.4. > For such cases we usually do n

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Holden Karau
That’s legit; if the patch author isn’t comfortable with a backport, then let’s leave it be 👍 On Mon, Feb 13, 2023 at 9:59 AM Dongjoon Hyun wrote: > Hi, All. > > As the author of that `Improvement` patch, I strongly disagree with giving > the wrong idea which Python 3.11 is officially supported i

Re: SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Holden Karau
Is there someone focused on streaming work these days who would want to shepherd this? On Sat, Feb 18, 2023 at 5:02 PM Dongjoon Hyun wrote: > Thank you for considering me, but may I ask what makes you think to put me > there, Mich? I'm curious about your reason. > > > I have put dongjoon.hyun as

Re: Ammonite as REPL for Spark Connect

2023-03-22 Thread Holden Karau
I am +1 to the general concept of including Ammonite magic 🪄. On Wed, Mar 22, 2023 at 4:58 PM Herman van Hovell wrote: > Ammonite is maintained externally by Li Haoyi et al. We are including it > as a 'provided' dependency. The integration bits and pieces (1 file) are > included in Apache Spark.

Re: Apache Spark 3.2.4 EOL Release?

2023-04-04 Thread Holden Karau
+1 On Tue, Apr 4, 2023 at 11:04 AM L. C. Hsieh wrote: > +1 > > Sounds good and thanks Dongjoon for driving this. > > On 2023/04/04 17:24:54 Dongjoon Hyun wrote: > > Hi, All. > > > > Since Apache Spark 3.2.0 passed RC7 vote on October 12, 2021, branch-3.2 > > has been maintained and served well u

Re: Slack for Spark Community: Merging various threads

2023-04-07 Thread Holden Karau
I think there was some concern around how to make any sync channel show up in logs / index / search results? On Fri, Apr 7, 2023 at 9:41 AM Dongjoon Hyun wrote: > Thank you, All. > > I'm very satisfied with the focused and right questions for the real > issues by removing irrelevant claims. :) >

Re: ASF policy violation and Scala version issues

2023-06-06 Thread Holden Karau
So I think if the Spark PMC wants to ask Databricks something that could be reasonable (although I'm a little fuzzy as to the ask), but that conversation might belong on private@ (I could be wrong of course). On Tue, Jun 6, 2023 at 3:29 AM Mich Talebzadeh wrote: > I concur with you Sean. > > If

Re: JDK version support policy?

2023-06-07 Thread Holden Karau
So JDK 11 is still supported in OpenJDK until 2026. I'm not sure we're going to see enough folks moving to JRE 17 by the Spark 4 release; unless we have a strong benefit from dropping 11 support, I'd be inclined to keep it. On Tue, Jun 6, 2023 at 9:08 PM Dongjoon Hyun wrote: > I'm also +1 on dr

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Holden Karau
-0 I'd like to see more of a doc around what we're planning for 4.0 before we pick a target release date etc. (it feels like putting the cart before the horse). But it's a weak preference. On Mon, Jun 12, 2023 at 11:24 AM Xiao Li wrote: > Thanks for starting the vote. > > I do have a concern about the t

Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
A few folks and I have been working on a spark-upgrade project (focused on getting folks onto current versions of Spark). Since it looks like we're starting the discussion around Spark 4, I was thinking now could be a good time for us to consider whether we want to try and integrate auto-upgrade rul

Re: Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Yup, I think building consensus on what goes in 4.X is something we’ll need to do. On Mon, Jun 12, 2023 at 11:56 AM Dongjoon Hyun wrote: > Thank you for sharing those. I'm also interested in taking advantage of > it. Also, I hope `spark-upgrade` can help us in line with Spark 4.0. > > However, we

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-20 Thread Holden Karau
ut if it only entails changing to Scala 2.13 and dropping support for JDK 8, then we could also just release a month after 3.5. How about we do this? We get 3.5 released, and afterwards we do a couple of meetings where we build this road

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
I’d like to start with a +1, better Python testing tools integrated into the project make sense. On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu wrote: > Hi all, > > I'd like to start the vote for SPIP: PySpark Test Framework. > > The high-level summary for the SPIP is that it proposes an official te

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
Wed, Jun 21, 2023 at 8:30 AM Reynold Xin wrote: +1 This is a great idea. On Wed, Jun 21, 2023 at 8:29 AM, Holden Karau wrote: I’d like to start with a +1, better Python testing tools integrated into the project make sense.

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Holden Karau
+1 On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: +1 On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh wrote: +1 for me

Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
So I'm wondering if there is interest in revisiting how Spark does its dynamic allocation for Spark 4+? Some things that I've been thinking about: - Advisory user input (e.g. a way to say after X is done I know I need Y, where Y might be a bunch of GPU machines) - Configurable toler

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
Oh great point On Mon, Aug 7, 2023 at 2:23 PM bo yang wrote: > Thanks Holden for bringing this up! > > Maybe another thing to think about is how to make dynamic allocation more > friendly with Kubernetes and disaggregated shuffle storage? > > > > On Mon, Aug 7, 2023

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Holden Karau
Oooh, fascinating. I’m going on call this week, so it will take me a while, but I do want to review this :) On Mon, Aug 7, 2023 at 5:30 PM Pavan Kotikalapudi wrote: > Hi Spark Dev, > > I have extended traditional DRA to work for structured streaming > use-case. > > Here is an initial Implementation

Re: ASF board report draft for August 2023

2023-08-08 Thread Holden Karau
Maybe add a link to the 4.0 JIRA where we are tracking the current plans for 4.0? On Tue, Aug 8, 2023 at 9:33 AM Dongjoon Hyun wrote: > Thank you, Matei. > > It looks good to me. > > Dongjoon > > On Mon, Aug 7, 2023 at 22:54 Matei Zaharia > wrote: > >> It’s time to send our quarterly report to

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Holden Karau
2023 at 23:42, Mich Talebzadeh wrote: Hi, From what I have seen spark on a serverless cluster has hard up getting the dr
