This is a great idea and would be a nice quality-of-life improvement. +1 (non-binding)
On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>
> > while leaving the connect jvm client in a separate folder looks weird
>
> I plan to actually put it at the top level together but I feel like this
> has to be done with SPIP so I am moving internal server side first
> orthogonally
>
> On Tue, 2 Jul 2024 at 17:54, Cheng Pan <pan3...@gmail.com> wrote:
>
>> Thanks for raising this discussion, I think putting the connect folder on
>> the top level is a good idea to promote Spark Connect, while leaving the
>> connect jvm client in a separate folder looks weird. I suppose there is no
>> contract to leave all optional modules under `connector`? e.g.
>> `resource-managers/kubernetes/{docker,integration-tests}`, `hadoop-cloud`.
>> What about moving the whole `connect` folder to the top level?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>> Hi all,
>>
>> I would like to discuss moving Spark Connect server to builtin package.
>> Right now, users have to specify --packages when they run the Spark Connect
>> server script, for example:
>>
>> ./sbin/start-connect-server.sh --jars `ls
>> connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
>>
>> or
>>
>> ./sbin/start-connect-server.sh --packages
>> org.apache.spark:spark-connect_2.12:3.5.1
>>
>> which is a little bit odd that sbin scripts should provide jars to start.
>>
>> Moving it to the builtin package is pretty straightforward because most of
>> the jars are shaded, and the impact would be minimal. I have a prototype here
>> apache/spark#47157 <https://github.com/apache/spark/pull/47157>. This
>> also simplifies the Python local running logic a lot.
>>
>> The user-facing API layer, Spark Connect Client, stays external, but I would
>> like the internal/admin server layer, Spark Connect Server, implementation
>> to be built into Spark.
>>
>> Please let me know if you have thoughts on this!
>>
>>
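For reference, a rough sketch of how startup would compare if the server module becomes builtin (assuming the prototype in https://github.com/apache/spark/pull/47157 lands roughly as described; the exact behavior may differ):

    # today: the sbin script needs the connect server jars supplied explicitly
    ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1

    # with the server built in: no --jars / --packages should be required
    ./sbin/start-connect-server.sh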