Hi Hyukjin Kwon,

Sorry for bringing an off-topic question into this thread.
Is there a Java client, similar to PySpark, for working with Spark Connect?

Thanks,
Balaji


From: Hyukjin Kwon <gurwls...@apache.org>
Sent: 25 January 2025 04:46
To: Deependra Patel <pateldeependr...@gmail.com>
Cc: dev@spark.apache.org
Subject: [EXTERNAL] Re: [Connect] Install additional python packages after 
session creation


That's me. It's not anywhere yet and it's WIP as mentioned in the talk. I'm 
still dealing with its design.

On Sat, Jan 25, 2025 at 1:00 AM Deependra Patel 
<pateldeependr...@gmail.com> wrote:
Hi all,
There are ways through the `addArtifacts` API in an existing session, but that 
requires the dependencies to be properly packaged (gzipped) beforehand. I believe 
it also won't work when the client and server run on different kernels/OSes. What 
I am interested in is running some sort of `pip install <package>` on the cluster 
from my client.
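For context, this is roughly what the existing workaround looks like (a minimal 
sketch, assuming a Spark 3.5+ client, an environment packed ahead of time with 
conda-pack on a machine matching the server's OS, and the config name as I 
understood it from the docs):

```
# Minimal sketch of the existing addArtifacts-based workaround.
# Assumes Spark 3.5+ and an archive built beforehand, e.g. with
# `conda pack -f -o pyspark_env.tar.gz` on a machine matching the server OS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Ship the packed environment to the server; the "#environment" fragment
# unpacks it under that directory name.
spark.addArtifacts("pyspark_env.tar.gz#environment", archive=True)

# Point Python UDF execution at the shipped interpreter.
spark.conf.set("spark.sql.execution.pyspark.python", "environment/bin/python")
```
This works, but it is exactly the client-side packaging (and OS matching) step 
that I would like to avoid.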

I came across this Databricks video, Dependency management in Spark 
Connect<https://youtu.be/PbvIak6Z8eI?feature=shared&t=679>, where the following 
functionality was mentioned, but I don't see it on the master 
branch<https://github.com/apache/spark/blob/master/python/pyspark/sql/connect/udf.py>. 
Is it only supported in Databricks, with no plans to open source it in the near 
future?

```
@udf(packages=["pandas==1.5.3", "pyarrow"])
def myudf():
    import pandas
```
-----
I had another question, about extending the Spark Connect client (and server) 
itself if I want to add a new Spark Connect gRPC API. Is there a way to add an 
additional proto in my own package (one that extends SparkSession from pyspark)? 
I looked into Spark Connect plugins, but they seem to be only for extending the 
plan (relations, commands, expressions), not for adding a new API.
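The closest I could get is packing my own message into the existing `extension` 
field of the Command proto, which still relies on a matching server-side 
CommandPlugin registered via `spark.connect.extensions.command.classes` rather 
than a genuinely new gRPC endpoint. A hypothetical sketch (`my_commands_pb2`, 
`MyCommand`, and `send_my_command` are made-up names, and calling the client's 
internal `execute_command` from user code is an assumption on my part):

```
# Hypothetical sketch: carrying a custom command in the Command proto's
# `extension` field (a google.protobuf.Any). The server side would need a
# matching CommandPlugin, registered via
# spark.connect.extensions.command.classes, to handle it.
from google.protobuf import any_pb2

import pyspark.sql.connect.proto as pb2
from my_package.proto import my_commands_pb2  # hypothetical generated module


def send_my_command(spark, payload: str) -> None:
    ext = any_pb2.Any()
    ext.Pack(my_commands_pb2.MyCommand(payload=payload))
    command = pb2.Command(extension=ext)
    # execute_command is what the built-in APIs use internally; calling it
    # from user code is an assumption.
    spark.client.execute_command(command)
```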

Regards,
Deependra
