I guess my one concern here would be: are we going to expand the dependencies that are visible on the class path for non-Connect users?
One of the pain points folks have experienced with upgrading has come from those dependencies changing. Otherwise this seems pretty reasonable.

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Tue, Jul 2, 2024 at 5:36 AM Matthew Powers <matthewkevinpow...@gmail.com> wrote:

> This is a great idea and would be a great quality-of-life improvement.
>
> +1 (non-binding)
>
> On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>
>> > while leaving the connect jvm client in a separate folder looks weird
>>
>> I actually plan to put it at the top level as well, but I feel that change
>> has to go through an SPIP, so I am moving the internal server side first,
>> orthogonally.
>>
>> On Tue, 2 Jul 2024 at 17:54, Cheng Pan <pan3...@gmail.com> wrote:
>>
>>> Thanks for raising this discussion. I think putting the connect folder
>>> at the top level is a good idea to promote Spark Connect, while leaving
>>> the Connect JVM client in a separate folder looks weird. I suppose there
>>> is no contract that all optional modules must live under `connector`? e.g.
>>> `resource-managers/kubernetes/{docker,integration-tests}`, `hadoop-cloud`.
>>> What about moving the whole `connect` folder to the top level?
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>> On Jul 2, 2024, at 08:19, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to discuss moving the Spark Connect server into the built-in
>>> package. Right now, users have to specify --packages when they run the
>>> Spark Connect server script, for example:
>>>
>>> ./sbin/start-connect-server.sh --jars `ls
>>> connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
>>>
>>> or
>>>
>>> ./sbin/start-connect-server.sh --packages
>>> org.apache.spark:spark-connect_2.12:3.5.1
>>>
>>> It is a little odd that an sbin script needs jars supplied just to start.
>>>
>>> Moving it into the built-in package is pretty straightforward because most
>>> of the jars are shaded, so the impact would be minimal. I have a prototype
>>> here: apache/spark#47157 <https://github.com/apache/spark/pull/47157>.
>>> This also simplifies the Python local-running logic a lot.
>>>
>>> The user-facing API layer, the Spark Connect client, stays external, but I
>>> would like the internal/admin server layer, the Spark Connect server
>>> implementation, to be built into Spark.
>>>
>>> Please let me know if you have thoughts on this!
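
For reference, the two commands quoted above are the current workflow. If the prototype in apache/spark#47157 lands as proposed, the Connect server jars would already ship in the distribution, so starting the server should reduce to roughly the sketch below. This is illustrative only, assuming a built Spark distribution: the plain start command reflects the intent of the proposal rather than a confirmed final interface, and the grep filter is just a hypothetical way to inspect which shaded Connect/gRPC jars become visible on the classpath for non-Connect users (the concern raised at the top of the thread).

# With the Connect server built in, no --jars/--packages should be needed
# (assumption based on the prototype, not a confirmed final interface):
./sbin/start-connect-server.sh

# Hypothetical check of the classpath-expansion concern: list the shaded
# Connect/gRPC jars that would now ship in the distribution's jars/ directory.
ls jars/ | grep -iE 'connect|grpc'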