Hi Hyukjin Kwon,

Sorry for bringing in an off-topic discussion. Is there a Java client, similar to PySpark, for working with Spark Connect?
Thanks,
Balaji

From: Hyukjin Kwon <gurwls...@apache.org>
Sent: 25 January 2025 04:46
To: Deependra Patel <pateldeependr...@gmail.com>
Cc: dev@spark.apache.org
Subject: [EXTERNAL] Re: [Connect] Install additional python packages after session creation

That's me. It's not anywhere yet, and it's WIP as mentioned in the talk. I'm still working on its design.

On Sat, Jan 25, 2025 at 1:00 AM Deependra Patel <pateldeependr...@gmail.com> wrote:

Hi all,
There are ways through the `addArtifacts` API in an existing session, but for that we need to have the dependencies properly gzipped, and in the case of a different kernel/OS between client and server I believe it won't work either. What I am interested in is doing some sort of `pip install <package>` on the cluster from my client. I came across this Databricks video, Dependency management in Spark Connect <https://youtu.be/PbvIak6Z8eI?feature=shared&t=679>, where the following functionality was mentioned, but I don't see it in the master branch <https://github.com/apache/spark/blob/master/python/pyspark/sql/connect/udf.py>. Is it only supported in Databricks, with no plans to open source it in the near future?

```
@udf(packages=["pandas==1.5.3", "pyarrow"])
def myudf():
    import pandas
```

-----

I had another question, about extending the Spark Connect client (and server) itself if I want to add a new Spark Connect gRPC API. Is there a way to add an additional proto to my package (which extends SparkSession from pyspark)? I looked into Spark Connect plugins, and they only allow modifying the plan, etc., not adding a new API.

Regards,
Deependra
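[For reference, a minimal sketch of the `addArtifacts` route mentioned above, assuming PySpark 3.5+ and a zip of pure-Python dependencies built on the client; the connect URL and file name are illustrative, not from the thread.]

```
# Minimal sketch of the addArtifacts approach discussed above (assumes PySpark >= 3.5).
# "sc://localhost:15002" and "deps.zip" are placeholder assumptions; binary wheels in
# the zip would still need to match the server's OS/architecture, which is exactly the
# limitation raised in the thread.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Ship the archive to the Connect server so UDFs running there can import from it.
spark.addArtifacts("deps.zip", pyfile=True)
```

[For dependencies with native code, one possible workaround is to pack an environment built for the server's platform (e.g. with conda-pack) and ship it with `archive=True` instead, which is roughly what the `packages=` UDF option shown in the video would automate.]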