I think there's a separate email thread "Java Client for Spark Connect" by
Martin

On Mon, 27 Jan 2025 at 15:17, Balaji Sudharsanam V
<balaji.sudharsa...@ibm.com.invalid> wrote:

> Hi Hyukjin Kwon,
>
> Sorry for bringing up an off-topic discussion,
>
> Is there a Java client, similar to PySpark, for working with Spark
> Connect?
>
>
>
> Thanks,
>
> Balaji
>
>
>
> *From:* Hyukjin Kwon <gurwls...@apache.org>
> *Sent:* 25 January 2025 04:46
> *To:* Deependra Patel <pateldeependr...@gmail.com>
> *Cc:* dev@spark.apache.org
> *Subject:* [EXTERNAL] Re: [Connect] Install additional python packages
> after session creation
>
>
>
> That's me. It's not anywhere yet and it's WIP as mentioned in the talk.
> I'm still dealing with its design.
>
>
>
> On Sat, Jan 25, 2025 at 1:00 AM Deependra Patel <
> pateldeependr...@gmail.com> wrote:
>
> Hi all,
>
> There are ways to do this through the `addArtifacts` API in an existing
> session, but that requires the dependencies to be properly packaged
> (gzipped). And in the case of a different kernel/OS between client and
> server, I believe it won't work either. What I am interested in is doing
> some sort of `pip install <package>` on the cluster from my client.
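>
> For reference, the flow I mean is roughly this (a minimal sketch; the
> endpoint and archive name are placeholders, and the archive has to be
> built for the server's OS/architecture, e.g. with conda-pack):
>
> ```
> from pyspark.sql import SparkSession
>
> # Connect to a Spark Connect endpoint (placeholder host/port).
> spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
>
> # Upload a pre-packed environment; archive=True unpacks it server-side.
> spark.addArtifacts("pyspark_conda_env.tar.gz", archive=True)
> ```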
>
>
>
> I came across this Databricks video, Dependency management in Spark Connect
> <https://youtu.be/PbvIak6Z8eI?feature=shared&t=679>, which mentions the
> following functionality, but I don't see it in the master branch
> <https://github.com/apache/spark/blob/master/python/pyspark/sql/connect/udf.py>.
> Is it only supported in Databricks, with no plans to open source it in the
> near future?
>
>
>
> ```
> @udf(packages=["pandas==1.5.3", "pyarrow"])
> def myudf():
>     import pandas
> ```
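>
> The closest open-source approximation I can think of today is the crude
> workaround below: running pip inside the UDF itself, assuming the
> executors can reach a package index (a sketch only, with no isolation
> between sessions):
>
> ```
> from pyspark.sql.functions import udf
>
> @udf("string")
> def myudf(x):
>     import subprocess, sys
>     # Install into the executor's Python interpreter on first use.
>     subprocess.check_call([sys.executable, "-m", "pip", "install",
>                            "pandas==1.5.3", "pyarrow"])
>     import pandas
>     return pandas.__version__
> ```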
>
> -----
>
> I had another question, about extending the Spark Connect client (and
> server) itself to add a new Spark Connect gRPC API. Is there a way to add
> an additional proto to my package (which extends SparkSession from
> pyspark)? I looked into Spark Connect plugins, but they only allow
> modifying the plan etc., not adding a new API; a sketch of what I mean by
> the plan-only extension point is below.
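>
> As far as I can tell, the only hook on the wire is the `extension`
> (google.protobuf.Any) field on the existing plan messages, which a
> server-side plugin can unpack. That extends the plan, not the gRPC
> surface itself. A rough client-side sketch (my custom message and its
> generated module are hypothetical):
>
> ```
> import pyspark.sql.connect.proto as proto
>
> # my_pb2 is a hypothetical module compiled from my own .proto file.
> import my_pb2
>
> rel = proto.Relation()
> # Pack the custom message into the Any-typed `extension` slot; a matching
> # server-side RelationPlugin would unpack it and produce a logical plan.
> rel.extension.Pack(my_pb2.MyCustomRelation(arg="value"))
> ```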
>
>
>
> Regards,
>
> Deependra
>
>
>
>
