Let me clarify. Here is the TL;DR:

- The PR (https://github.com/apache/spark/pull/47328) mentioned Databricks
  in the PR description (it has since been edited), which should have been
  avoided.
- I also updated the committer guidelines in spark-website to prevent such
  cases in the future.
- Otherwise, the change was reverted, and merged back after proper reviews.
- The changes are legitimate and benefit all users; they are not specific
  to a single vendor.
On Mon, 29 Jul 2024 at 15:13, Sean Owen <sro...@gmail.com> wrote:

> Also from an ASF community perspective -
>
> I think all are agreed this was merged too fast. But I'm missing where
> this is somehow due to the needs of a single vendor. Where is this
> related to file systems or keys? Did I miss it from another discussion
> or PR, or is this actually about a different issue?
>
> Otherwise I don't see what this lecture is about. The issue that was
> raised (existing Spark session) is, I agree, not an issue.
>
>
> On Mon, Jul 29, 2024 at 12:43 PM Steve Loughran
> <ste...@cloudera.com.invalid> wrote:
>
>>
>> I'm going to join in from an ASF community perspective.
>>
>> Nobody should be making fundamental changes to an ASF code base with
>> a PR put up and then merged two hours later because of the needs of a
>> single vendor of a downstream product. This doesn't even give people
>> in different time zones the chance to review it. It goes completely
>> against the concept of "community" and replaces it with private
>> problems, not shared with anyone, and large pieces of development
>> work to address them without any opportunity for others to improve
>> them. Pieces of work which presumably must have been ongoing for
>> some days.
>>
>> I know doing stuff in public is time-consuming, as you have to spend
>> a lot of time chasing reviews, but collaboration is essential as it
>> ensures that changes meet the needs of a broader community than one
>> single vendor. Avoiding that is exclusionary and unhealthy for a
>> project.
>>
>> If the Databricks products have some problem resolving user:key
>> secrets in paths in the virtual file system, that will be good to
>> know, especially the what and the why - as others may encounter it
>> too. At the very least, others should know what to do to avoid
>> getting into the same situation.
>>
>> If you want more nimble development, well, closed source gives you
>> that. Switching to commit-then-review on specific ASF repos is also
>> allowed, despite the inherent risks. We use it for some of the
>> Hadoop release packaging/testing for rapid iteration of release
>> process automation and validation code.
>>
>> Anyway, the patch has been reverted and discussions are now ongoing,
>> as they should have been from the outset.
>>
>> Steve
>>
>>
>> On Wed, 24 Jul 2024 at 01:29, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>>> There is always a running session. I replied in the PR.
>>>
>>> On Tue, 23 Jul 2024 at 23:32, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>
>>>> I'm bumping up this thread because the overhead is already biting
>>>> us back. Here is a commit merged 3 hours ago.
>>>>
>>>> https://github.com/apache/spark/pull/47453
>>>> [SPARK-48970][PYTHON][ML] Avoid using SparkSession.getActiveSession
>>>> in spark ML reader/writer
>>>>
>>>> In short, unlike the original PRs' claims, this commit starts to
>>>> create `SparkSession` in this layer. Although I understand why
>>>> Hyukjin and Martin claim that a `SparkSession` will be there
>>>> anyway, this is an architectural change which we need to decide
>>>> explicitly, not implicitly.
>>>>
>>>> > On 2024/07/13 05:33:32 Hyukjin Kwon wrote:
>>>> > We actually get the active Spark session, so it doesn't cause
>>>> > overhead. Also, even if we create one, it will be created only
>>>> > once, which should be pretty trivial overhead.
>>>>
>>>> If this architectural change is inevitably required and needs to
>>>> happen in Apache Spark 4.0.0, can we have a dev document about this?
>>>> If there is no proper place, we can simply add it to the ML
>>>> migration guide.
>>>>
>>>> Dongjoon.
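
For context, here is a minimal, illustrative sketch of the two patterns
under discussion - relying on an already-active session versus creating
one through the builder. It uses only the public PySpark `SparkSession`
API and is not the actual code from the PRs above.

from pyspark.sql import SparkSession

# Pattern 1: rely on a session that some caller already started.
# getActiveSession() returns None when no session is active on the
# current thread, so this layer does not create anything itself.
active = SparkSession.getActiveSession()
if active is None:
    raise RuntimeError("no active SparkSession; create one first")

# Pattern 2: create (or reuse) a session in this layer.
# getOrCreate() returns the existing session when one exists, so the
# extra cost is at most a one-time session creation.
spark = SparkSession.builder.getOrCreate()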