That's me. It isn't available anywhere yet and it's still WIP, as mentioned in the talk. I'm still working on its design.
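For anyone following the thread, here is a minimal sketch of what already works today via `addArtifacts`, the route mentioned in the quoted mail below (the connection URL and `deps.zip` are placeholders, not anything from this thread). As Deependra points out, the archive has to be importable by the server's Python, so it doesn't solve the cross-OS/kernel case:

```
from pyspark.sql import SparkSession

# Connect to a Spark Connect server (placeholder URL, adjust as needed).
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Ship a locally built zip of Python dependencies to the session so UDFs
# running on the server can import them. "deps.zip" is a placeholder; any
# packages with native extensions must match the server's OS/architecture,
# which is exactly the limitation raised below.
spark.addArtifacts("deps.zip", pyfile=True)
```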
On Sat, Jan 25, 2025 at 1:00 AM Deependra Patel <pateldeependr...@gmail.com> wrote:
> Hi all,
> There are ways through the `addArtifacts` API in an existing session, but
> for that we need to have the dependencies properly gzipped. In the case of
> a different kernel/OS between client and server, it won't work either, I
> believe. What I am interested in is doing some sort of `pip install
> <package>` on the cluster from my client.
>
> I came across this Databricks video, Dependency management in Spark Connect
> <https://youtu.be/PbvIak6Z8eI?feature=shared&t=679>, where the following
> functionality was mentioned, but I don't see it in the master branch
> <https://github.com/apache/spark/blob/master/python/pyspark/sql/connect/udf.py>.
> Is it only supported in Databricks, with no plans to open source it in the
> near future?
>
> ```
> @udf(packages=["pandas==1.5.3", "pyarrow"])
> def myudf():
>     import pandas
> ```
> -----
> I had another question about extending the Spark Connect client (and server)
> itself: if I want to add a new Spark Connect gRPC API, is there a way to add
> an additional proto to my package (which extends SparkSession from pyspark)?
> I looked into Spark Connect plugins, and they are only for modifying the
> plan, etc., not for adding a new API.
>
> Regards,
> Deependra