I guess my one concern here would be: are we going to expand the dependencies that are visible on the class path for non-Connect users?
One of the pain points folks have experienced with upgrading has come from those dependencies changing. Otherwise this seems pretty reasonable.

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Tue, Jul 2, 2024 at 5:36 AM Matthew Powers <matthewkevinpow...@gmail.com> wrote:

> This is a great idea and would be a great quality-of-life improvement.
>
> +1 (non-binding)
>
> On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>
>> > while leaving the connect jvm client in a separate folder looks weird
>>
>> I actually plan to put it at the top level as well, but I feel that change
>> has to go through an SPIP, so I am moving the internal server side first,
>> orthogonally.
>>
>> On Tue, 2 Jul 2024 at 17:54, Cheng Pan <pan3...@gmail.com> wrote:
>>
>>> Thanks for raising this discussion. I think putting the connect folder
>>> at the top level is a good idea to promote Spark Connect, while leaving
>>> the Connect JVM client in a separate folder looks weird. I suppose there
>>> is no contract that all optional modules must live under `connector`? e.g.
>>> `resource-managers/kubernetes/{docker,integration-tests}`, `hadoop-cloud`.
>>> What about moving the whole `connect` folder to the top level?
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to discuss moving the Spark Connect server into the built-in
>>> package. Right now, users have to specify --packages when they run the
>>> Spark Connect server script, for example:
>>>
>>> ./sbin/start-connect-server.sh --jars `ls
>>> connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
>>>
>>> or
>>>
>>> ./sbin/start-connect-server.sh --packages
>>> org.apache.spark:spark-connect_2.12:3.5.1
>>>
>>> It is a little odd that an sbin script needs jars supplied just to start.
>>>
>>> Moving it into the built-in package is pretty straightforward because most
>>> of the jars are shaded, so the impact would be minimal. I have a prototype
>>> here: apache/spark#47157 <https://github.com/apache/spark/pull/47157>.
>>> This also simplifies the Python local-running logic a lot.
>>>
>>> The user-facing API layer, the Spark Connect client, stays external, but I
>>> would like the internal/admin server layer, the Spark Connect server
>>> implementation, to be built into Spark.
>>>
>>> Please let me know if you have thoughts on this!
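
For reference, the two commands quoted above are the current workflow. If the prototype in apache/spark#47157 lands as proposed, the Connect server jars would already ship in the distribution, so starting the server should reduce to roughly the sketch below. This is illustrative only, assuming a built Spark distribution: the plain start command reflects the intent of the proposal rather than a confirmed final interface, and the grep filter is just a hypothetical way to inspect which shaded Connect/gRPC jars become visible on the classpath for non-Connect users (the concern raised at the top of the thread).

# With the Connect server built in, no --jars/--packages should be needed
# (assumption based on the prototype, not a confirmed final interface):
./sbin/start-connect-server.sh

# Hypothetical check of the classpath-expansion concern: list the shaded
# Connect/gRPC jars that would now ship in the distribution's jars/ directory.
ls jars/ | grep -iE 'connect|grpc'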