I think there's a separate email thread, "Java Client for Spark Connect", by Martin.
On Mon, 27 Jan 2025 at 15:17, Balaji Sudharsanam V <balaji.sudharsa...@ibm.com.invalid> wrote:

> Hi Hyukjin Kwon,
>
> Sorry for bringing in an off-topic discussion: is there a Java client,
> similar to PySpark, for working with Spark Connect?
>
> Thanks,
> Balaji
>
> *From:* Hyukjin Kwon <gurwls...@apache.org>
> *Sent:* 25 January 2025 04:46
> *To:* Deependra Patel <pateldeependr...@gmail.com>
> *Cc:* dev@spark.apache.org
> *Subject:* [EXTERNAL] Re: [Connect] Install additional python packages after session creation
>
> That's me. It's not anywhere yet and it's WIP as mentioned in the talk.
> I'm still dealing with its design.
>
> On Sat, Jan 25, 2025 at 1:00 AM Deependra Patel <pateldeependr...@gmail.com> wrote:
>
> Hi all,
>
> There are ways through the `addArtifacts` API in an existing session, but
> for that we need to have the dependencies properly gzipped, and I believe
> it won't work when the client and server run on a different kernel/OS
> either. What I am interested in is doing some sort of `pip install
> <package>` on the cluster from my client.
>
> I came across this Databricks video, Dependency management in Spark Connect
> <https://youtu.be/PbvIak6Z8eI?feature=shared&t=679>, which mentions the
> following functionality, but I don't see it in the master branch
> <https://github.com/apache/spark/blob/master/python/pyspark/sql/connect/udf.py>.
> Is it only supported in Databricks, with no plans to open-source it in the
> near future?
>
> ```
> @udf(packages=["pandas==1.5.3", "pyarrow"])
> def myudf():
>     import pandas
> ```
>
> -----
>
> I had another question about extending the Spark Connect client (and
> server) itself: if I want to add a new Spark Connect gRPC API, is there a
> way to add an additional proto to my package (which extends SparkSession
> from pyspark)? I looked into Spark Connect plugins, but they only modify
> the plan etc.; they are not for adding a new API.
>
> Regards,
> Deependra
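On the first question: the flow that `addArtifacts` supports today is archive-based. You pack the client environment, ship it, and point the server-side Python workers at it, roughly as in the sketch below. This is a minimal sketch, assuming PySpark >= 3.5, the third-party conda-pack tool, and the same OS/architecture on both sides (the kernel/OS mismatch raised above is exactly where this breaks down); the environment name and endpoint are illustrative, and the `#environment` rename plus the `spark.sql.execution.pyspark.python` conf follow the upstream dependency-management write-up, so verify them against your Spark version.

```python
# A minimal sketch of the archive-based flow, assuming PySpark >= 3.5,
# the third-party conda-pack tool installed locally, and the same OS/arch
# on client and server. Environment name and endpoint are illustrative.
import conda_pack
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Pack the local conda environment into a relocatable tar.gz.
conda_pack.pack(name="pyspark_env", output="pyspark_env.tar.gz")

# Ship it to the server. archive=True unpacks it on the executors; the
# "#environment" fragment names the unpacked directory (per the upstream
# dependency-management write-up -- verify for your Spark version).
spark.addArtifact("pyspark_env.tar.gz#environment", archive=True)

# Point the server-side Python workers at the shipped interpreter.
spark.conf.set("spark.sql.execution.pyspark.python", "environment/bin/python")
```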
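Absent the `packages=` decorator argument from the video, one workaround (a hack, not a Spark API) is to pip-install inside the UDF body itself, so the install happens on each Python worker the first time the UDF executes there. A sketch, reusing the `spark` session from above, with the package pins purely illustrative:

```python
# A workaround sketch, not a Spark API: pip-install dependencies on the
# Python worker the first time the UDF runs there. Reuses the `spark`
# session from the previous sketch; package pins are illustrative.
import subprocess
import sys

from pyspark.sql.functions import udf

@udf("string")
def myudf(x):
    try:
        import pandas
    except ImportError:
        # Install into the worker's interpreter; this lasts only for the
        # lifetime of that worker process.
        subprocess.check_call(
            [sys.executable, "-m", "pip",
             "install", "pandas==1.5.3", "pyarrow"])
        import pandas
    return pandas.__version__

spark.range(1).select(myudf("id")).show()
```

The caveats are the obvious ones: each worker pays the install cost once, the result lasts only for that worker process's lifetime, and the workers need network access to a package index.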