Using Redirects for recently broken links

2024-09-20 Thread Matthew Powers
Hey devs :) When I do a "pyspark groupby" Google search, I get to the following link, which is broken: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.groupBy.html I guess this is the new URL? https://spark.apache.org/docs/3.5.2/api/python/reference/pyspark.sql/

Re: [VOTE] Using Github Issues for Spark-Connect-Go _only_ issues.

2024-08-12 Thread Matthew Powers
+1 (non-binding) On Mon, Aug 12, 2024 at 12:11 PM Denny Lee wrote: > +1 (non-binding) > > On Mon, Aug 12, 2024 at 16:43 Reynold Xin > wrote: > >> +1 >> >> On Mon, Aug 12, 2024 at 10:28 AM Mich Talebzadeh < >> mich.talebza...@gmail.com> wrote: >> >>> +1 for me >>> >>> Mich Talebzadeh, >>> >>> Ar

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-03 Thread Matthew Powers
Yea, this would be great. spark-connect-go is still experimental and anything we can do to get it production grade would be a great step IMO. The Go community is excited to write Spark... with Go! On Wed, Jul 3, 2024 at 8:49 PM Hyukjin Kwon wrote: > Hi all, > > The Spark Connect Go client repo

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Matthew Powers
+1 (non-binding) Thanks! On Wed, Jul 3, 2024 at 1:58 PM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin Kwon ! > > On Wed, Jul 3, 2024 at 8:55 AM bo yang wrote: > >> +1 (non-binding) >> >> On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan wrote: >> >>> +1 (non-binding) >>> >>> Thanks, >>> Cheng Pan

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Matthew Powers
This is a great idea and would be a great quality of life improvement. +1 (non-binding) On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon wrote: > > while leaving the connect jvm client in a separate folder looks weird > > I plan to actually put it at the top level together but I feel like this > has

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Matthew Powers
I am a huge fan of the Apache Spark docs and I regularly look at the analytics on this page to see how well they are doing.

Re: [DISCUSS] Multiple columns adding/replacing support in PySpark DataFrame API

2021-04-30 Thread Matthew Powers
Thanks for starting this good discussion. You can add multiple columns with select to avoid calling withColumn multiple times: val newCols = Seq(col("*"), lit("val1").as("key1"), lit("val2").as("key2")) df.select(newCols: _*).show() withColumns would be a nice interface for less technical Spark

Re: Bintray replacement for spark-packages.org

2021-04-26 Thread Matthew Powers
Great job fixing this!! I just checked and it's working on my end. Updated the resolver and sbt test still works just fine. On Mon, Apr 26, 2021 at 3:31 AM Bo Zhang wrote: > Hi Apache Spark devs, > > As y

Re: Auto-closing PRs or How to get reviewers' attention

2021-02-23 Thread Matthew Powers
Enrico - thanks for sharing your experience. I recently got a couple of PRs merged and my experience was different. I got lots of feedback from several maintainers (thank you very much!). Can't speak to your PRs specifically, but can give the general advice that pivoting code based on maintainer

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-30 Thread Matthew Powers
siders language-specific context. Many of the expressions are > for SQL compliance. Many data science Python libraries don't support such > features as an example. > > > > On Fri, 29 Jan 2021, 12:04 Matthew Powers, > wrote: > >> Thanks for the thoughtful responses

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Matthew Powers
Thanks for the thoughtful responses. I now understand why adding all the functions across all the APIs isn't the default. To Nick's point, relying on heuristics to gauge user interest, in addition to personal experience, is a good idea. The regexp_extract_all SO thread has 16,000 views