+1 (non-binding)

On Thu, Feb 6, 2025 at 7:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> +1 for the additional package.
>
> Dongjoon.
>
> On Wed, Feb 5, 2025 at 6:30 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> Hi Adam,
>>
>> Thanks for raising your concerns! This is also why we are not making
>> Spark Connect the default but providing an additional Spark distribution so
>> that users can opt in easily. There is a simple fix for this security issue
>> as @Hyukjin Kwon <gurwls...@gmail.com> mentioned and we are working on
>> it: https://github.com/apache/spark/pull/49107#issuecomment-2638356393
>>
>> On Thu, Feb 6, 2025 at 9:45 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>>> This is exactly the same case with the Py4J gateway server. We can
>>> easily implement that - I am one of the maintainers of Py4J fwiw and
>>> running a local Spark Connect server is already there apart from the PR
>>> https://github.com/apache/spark/pull/49107.
>>>
>>> On Thu, 6 Feb 2025 at 10:40, Adam Binford <adam...@gmail.com> wrote:
>>>
>>>> -1 (non-binding) for me. I've commented on the PR for this (
>>>> https://github.com/apache/spark/pull/49107), but in its current state
>>>> this seems like it would introduce a massive security vulnerability. If a
>>>> user launches a "Spark Connect enabled" cluster deploy mode job in a
>>>> multi-tenant YARN cluster, it will launch a wide open Spark Connect server
>>>> alongside the driver on any given compute host. Any other user could then
>>>> connect to this server and do whatever they wanted using the other user's
>>>> credentials. If this issue is addressed I would change to 0.
>>>>
>>>> Best case scenario this was a small oversight that would have
>>>> introduced a major vulnerability, worst case scenario this was a
>>>> coordinated effort to slip a backdoor into a widely used application.
>>>> Either way, this does not lend itself to something that should be enabled
>>>> by default without rigorous testing in real world scenarios.
>>>>
>>>> This is just my opinion, but I don't understand why these conversations
>>>> have been happening for so long and this feature _still isn't even
>>>> available yet_. Having the feature be complete and available for user
>>>> testing seems like it should be a prerequisite to any discussion of making
>>>> it the default behavior, otherwise nobody knows exactly what the behavior
>>>> is you are trying to make the default.
>>>>
>>>> Adam
>>>>
>>>> On Wed, Feb 5, 2025 at 11:51 AM Chao Sun <sunc...@apache.org> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Wed, Feb 5, 2025 at 8:42 AM Martin Grund
>>>>> <mar...@databricks.com.invalid> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Wed, Feb 5, 2025 at 17:15 bo yang <bobyan...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> On Wed, Feb 5, 2025 at 7:51 AM Jules Damji <jules.da...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> Excuse the thumb typos
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 04 Feb 2025 at 11:06 PM, Wenchen Fan <cloud0...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Given the positive feedback in the previous DISCUSS email
>>>>>>>>> <https://lists.apache.org/thread/loo1r84ovrzpskkn9cfmjfb0vwx4xnrq>,
>>>>>>>>> I'd like to start the vote for the proposal "Publish additional Spark
>>>>>>>>> distribution with Spark Connect enabled".
>>>>>>>>>
>>>>>>>>> Please vote for the next 72 hours:
>>>>>>>>>
>>>>>>>>> [ ] +1: Accept the proposal
>>>>>>>>> [ ] +0
>>>>>>>>> [ ] -1: I don’t think this is a good idea because …
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Wenchen Fan
>>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> Adam Binford
>>>>
>>>
--
John Zhuge