Thanks for raising this discussion. I think putting the connect folder at the 
top level is a good idea to promote Spark Connect, while leaving the Connect 
JVM client in a separate folder looks odd. I suppose there is no requirement 
that all optional modules stay under `connector`? e.g. 
`resource-managers/kubernetes/{docker,integration-tests}`, `hadoop-cloud`. What 
about moving the whole `connect` folder to the top level?
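
For illustration, assuming the current layout under `connector/connect` (roughly 
`common`, `server`, and `client/jvm`; the exact module names here are my guess), 
the tree after the move could look something like:

    connect/
      common/        # shared protos and utilities
      server/        # Spark Connect server (proposed to become built-in)
      client/
        jvm/         # Spark Connect JVM client (stays an optional module)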

Thanks,
Cheng Pan


> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
> 
> Hi all,
> 
> I would like to discuss moving the Spark Connect server to a built-in package. Right 
> now, users have to specify --packages when they run the Spark Connect server 
> script, for example:
> 
> ./sbin/start-connect-server.sh --jars `ls 
> connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
> or
> 
> ./sbin/start-connect-server.sh --packages 
> org.apache.spark:spark-connect_2.12:3.5.1
> which is a little odd, since users should not have to provide jars just to start an 
> sbin script.
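> 
> Once the server is built in, the script could presumably be started with no extra 
> arguments at all:
> 
> ./sbin/start-connect-server.sh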
> 
> Moving it to a built-in package is pretty straightforward because most of the jars 
> are shaded, and the impact would be minimal. I have a prototype here: 
> apache/spark#47157 <https://github.com/apache/spark/pull/47157>. This also 
> simplifies the Python local running logic a lot.
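> 
> (For example, a local session started with ./bin/pyspark --remote "local" currently 
> needs the Spark Connect server jar to be located and put on the classpath first; with 
> the server built in, that extra step would presumably go away.)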
> 
> The user-facing API layer, the Spark Connect client, stays external, but I would like 
> the internal/admin server layer, the Spark Connect server, to be built into Spark.
> 
> Please let me know if you have thoughts on this!
> 
> 
