Thank you for understanding. Actually, I'm dealing with a blocker for Spark 4.0.0 (so every RC will fail until I address it); you may want to join the discussion to unblock me: https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
For sure, we will work with Wenchen to get the final sign-off; we won't push this further if he is not comfortable with it. I'm also open to hearing more voices.

Thanks again,
Jungtaek Lim (HeartSaVioR)

On Wed, Mar 5, 2025 at 10:10 AM Mridul Muralidharan <mri...@gmail.com> wrote:

>
> Hi Jungtaek,
>
> It is fairly irregular to make feature updates this late, but given that
> RC2 appears to have failed - you should be getting a sign off from the
> release manager in particular; whose life will be made difficult with this
> :-)
> I don't have strong objections if RM is fine absorbing the load ....
>
> Will let others chime in.
>
> Regards,
> Mridul
>
>
> On Tue, Mar 4, 2025 at 2:32 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Hi Mridul,
>>
>> I'd like to persuade you if your concern is just that it's a bit late,
>> because of the following:
>>
>> 1. The change only introduces parity with Spark Connect, hence it is low
>> risk and doesn't have a chance to break other stuff. If it breaks, it only
>> breaks the TWS + Spark Connect combination.
>>
>> For reference, here are PRs for TWS + Spark Connect:
>>
>> PySpark: https://github.com/apache/spark/pull/49560
>> Scala: https://github.com/apache/spark/pull/49488
>>
>> 2. These PRs aren't something we brought up at the last minute. They were
>> already up in mid-Jan, hence they were technically not very late - it's
>> just that the review process took more time than we anticipated.
>>
>> 3. TWS is a new API in Structured Streaming which we have put yearly
>> effort into. The API has been targeted to 4.0 since the very early stages
>> of the Spark 4.0.0 release, and we called out the TWS project every time
>> there were threads in dev@ to collect out projects for Spark 4.0. Not
>> having parity on Spark Connect sounds incomplete to me, and we know this
>> will take at least 6 months to address (too, too long) if we decide to
>> postpone.
>>
>> I understand it's not a best practice to add features in the RC phase, but
>> honestly this is just a timing issue. We aren't proposing features in the
>> RC phase. (If this change were later than the proposed RC date, I should
>> have posted to ask for postponing the RC a bit.) It unfortunately took
>> time to review them.
>>
>> I hope this could influence your thoughts about this.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Wed, Mar 5, 2025 at 2:28 AM Mridul Muralidharan <mri...@gmail.com>
>> wrote:
>>
>>>
>>> Hi Jungtaek,
>>>
>>> We are already in RC2 for 4.0, right?
>>> A bit too late for this IMO - we can always introduce it in 4.1
>>>
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell
>>> <her...@databricks.com.invalid> wrote:
>>>
>>>> +1
>>>>
>>>> On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar
>>>> <anish.shrigonde...@databricks.com.invalid> wrote:
>>>>
>>>>> +1 - Would be great to get this into the Spark 4.0 release.
>>>>>
>>>>> Thanks,
>>>>> Anish
>>>>>
>>>>> On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim
>>>>> <kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> Hi dev,
>>>>>>
>>>>>> We are going to introduce a new API named `transformWithState` for
>>>>>> streaming queries, which allows users to perform more complex stateful
>>>>>> operations in a user function, with much simpler code compared to
>>>>>> `flatMapGroupsWithState` (and `applyInPandasWithState`).
>>>>>>
>>>>>> The target version has been Spark 4.0.0 and we track this project as
>>>>>> a major one for Spark 4. We pushed most planned features into Spark
>>>>>> 4.0.0, except Spark Connect support.
>>>>>>
>>>>>> The PRs for Spark Connect support are merged into the Spark 4.1 branch,
>>>>>> but I'm seeking opinions on whether we can introduce Spark Connect
>>>>>> support to Spark 4.0.0.
>>>>>>
>>>>>> I understand this arrives a bit late, but since the API is something
>>>>>> backed by a huge effort and I foresee this new API replacing the usage
>>>>>> of flatMapGroupsWithState and applyInPandasWithState soon, I'd like to
>>>>>> make sure we don't make users wait another 6+ months to use this in
>>>>>> Spark Connect.
>>>>>>
>>>>>> Would love to hear your thoughts.
>>>>>>
>>>>>> Thanks,
>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>
>>>>>