That's me. It isn't available anywhere yet and it's still WIP, as mentioned in the talk. I'm still working on its design.
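For anyone following the thread, here is a minimal sketch of what already works today via `addArtifacts`, the route mentioned in the quoted mail below (the connection URL and `deps.zip` are placeholders, not anything from this thread). As Deependra points out, the archive has to be importable by the server's Python, so it doesn't solve the cross-OS/kernel case:

```
from pyspark.sql import SparkSession

# Connect to a Spark Connect server (placeholder URL, adjust as needed).
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Ship a locally built zip of Python dependencies to the session so UDFs
# running on the server can import them. "deps.zip" is a placeholder; any
# packages with native extensions must match the server's OS/architecture,
# which is exactly the limitation raised below.
spark.addArtifacts("deps.zip", pyfile=True)
```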
On Sat, Jan 25, 2025 at 1:00 AM Deependra Patel <pateldeependr...@gmail.com> wrote:
> Hi all,
> There are ways through the `addArtifacts` API in an existing session, but
> for that we need to have the dependencies properly gzipped. In the case of
> a different kernel/OS between client and server, it won't work either, I
> believe. What I am interested in is doing some sort of `pip install
> <package>` on the cluster from my client.
>
> I came across this Databricks video, Dependency management in Spark Connect
> <https://youtu.be/PbvIak6Z8eI?feature=shared&t=679>, where the following
> functionality was mentioned, but I don't see it in the master branch
> <https://github.com/apache/spark/blob/master/python/pyspark/sql/connect/udf.py>.
> Is it only supported in Databricks, with no plans to open source it in the
> near future?
>
> ```
> @udf(packages=["pandas==1.5.3", "pyarrow"])
> def myudf():
>     import pandas
> ```
> -----
> I had another question about extending the Spark Connect client (and server)
> itself: if I want to add a new Spark Connect gRPC API, is there a way to add
> an additional proto to my package (which extends SparkSession from pyspark)?
> I looked into Spark Connect plugins, and they are only for modifying the
> plan, etc., not for adding a new API.
>
> Regards,
> Deependra