Thanks for raising this discussion. I think putting the connect folder at the top level is a good idea to promote Spark Connect, while leaving the Connect JVM client in a separate folder seems odd. I suppose there is no contract that all optional modules must live under `connector`? e.g. `resource-managers/kubernetes/{docker,integration-tests}` and `hadoop-cloud`. What about moving the whole `connect` folder to the top level?
Thanks,
Cheng Pan

> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
> Hi all,
>
> I would like to discuss moving the Spark Connect server to the builtin package. Right now, users have to specify --packages when they run the Spark Connect server script, for example:
>
>   ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
>
> or
>
>   ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1
>
> which is a little odd: an sbin script should not require users to provide jars just to start.
>
> Moving it to the builtin package is pretty straightforward because most of the jars are shaded, and the impact would be minimal. I have a prototype here: apache/spark#47157 <https://github.com/apache/spark/pull/47157>. This also simplifies the Python local-running logic a lot.
>
> The user-facing API layer, the Spark Connect Client, stays external, but I would like the internal/admin server layer, the Spark Connect Server, to be built into Spark.
>
> Please let me know if you have thoughts on this!
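
[Editor's note: a minimal sketch of what startup could look like if the prototype in apache/spark#47157 lands as proposed, assuming the shaded server jars ship in the default distribution so no --jars/--packages flags are needed. The port and --remote usage below reflect the standard Spark Connect defaults, not anything specific to the prototype.]

    # start the Spark Connect server; with the server builtin, no extra jars are passed
    ./sbin/start-connect-server.sh

    # connect from the PySpark shell over the default Connect port (15002)
    ./bin/pyspark --remote "sc://localhost:15002"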