Hi Adam,

Thanks for raising your concerns! This is also why we are not making Spark Connect the default but providing an additional Spark distribution so that users can opt in easily.

There is a simple fix for this security issue, as @Hyukjin Kwon <gurwls...@gmail.com> mentioned, and we are working on it: https://github.com/apache/spark/pull/49107#issuecomment-2638356393

On Thu, Feb 6, 2025 at 9:45 AM Hyukjin Kwon <gurwls...@apache.org> wrote:

> This is exactly the same case with the Py4J gateway server. We can easily
> implement that - I am one of the maintainers of Py4J fwiw and running a
> local Spark Connect server is already there apart from the PR
> https://github.com/apache/spark/pull/49107.
>
> On Thu, 6 Feb 2025 at 10:40, Adam Binford <adam...@gmail.com> wrote:
>
>> -1 (non-binding) for me. I've commented on the PR for this (
>> https://github.com/apache/spark/pull/49107), but in its current state
>> this seems like it would introduce a massive security vulnerability. If a
>> user launches a "Spark Connect enabled" cluster deploy mode job in a
>> multi-tenant YARN cluster, it will launch a wide open Spark Connect server
>> alongside the driver on any given compute host. Any other users could then
>> connect to this server and do whatever they wanted using the other users
>> credentials. If this issue is addressed I would change to 0.
>>
>> Best case scenario this was a small oversight that would have introduced
>> a major vulnerability, worst case scenario this was a coordinated effort to
>> slip a backdoor into a widely used application. Either way, this does not
>> lend itself to something that should be enabled by default without
>> rigorous testing in real world scenarios.
>>
>> This is just my opinion, but I don't understand why these conversations
>> have been happening for so long and this feature _still isn't even
>> available yet_. Having the feature be complete and available for user
>> testing seems like it should be a prerequisite to any discussion of making
>> it the default behavior, otherwise nobody knows exactly what the behavior
>> is you are trying to make the default.
>>
>> Adam
>>
>> On Wed, Feb 5, 2025 at 11:51 AM Chao Sun <sunc...@apache.org> wrote:
>>
>>> +1
>>>
>>> On Wed, Feb 5, 2025 at 8:42 AM Martin Grund
>>> <mar...@databricks.com.invalid> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Feb 5, 2025 at 17:15 bo yang <bobyan...@gmail.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> On Wed, Feb 5, 2025 at 7:51 AM Jules Damji <jules.da...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Excuse the thumb typos
>>>>>>
>>>>>>
>>>>>> On Tue, 04 Feb 2025 at 11:06 PM, Wenchen Fan <cloud0...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Given the positive feedback in the previous DISCUSS email
>>>>>>> <https://lists.apache.org/thread/loo1r84ovrzpskkn9cfmjfb0vwx4xnrq>,
>>>>>>> I'd like to start the vote for the proposal "Publish additional Spark
>>>>>>> distribution with Spark Connect enabled".
>>>>>>>
>>>>>>> Please vote for the next 72 hours:
>>>>>>>
>>>>>>> [ ] +1: Accept the proposal
>>>>>>> [ ] +0
>>>>>>> [ ] -1: I don’t think this is a good idea because …
>>>>>>>
>>>>>>> Best,
>>>>>>> Wenchen Fan
>>>>>>>
>>>>>>
>>
>> --
>> Adam Binford
>>
>