Thank you for initiating this. BTW, RC failures are irrelevant to the new feature backporting request.
So, in principle, I'm -1 for this late arrival because this could be a bad example which opens the door to all random backporting and delays. However, I'll follow a broader community consensus (like an official voting) for this specific feature. I guess this discussion thread was initiated as a preparation for that. :)

Thanks,
Dongjoon.

On Tue, Mar 4, 2025 at 7:08 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Thank you for understanding. Actually I'm dealing with a blocker for Spark
> 4.0.0 (so RC will always fail till I address this), you may want to join
> the discussion to unblock me.
> https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>
> For sure, we will work with Wenchen to get the final sign off - we won't
> push this more if he is not comfortable with it. Also for sure I'm open to
> hearing more voices.
>
> Thanks again,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Mar 5, 2025 at 10:10 AM Mridul Muralidharan <mri...@gmail.com> wrote:
>
>> Hi Jungtaek,
>>
>> It is fairly irregular to make feature updates this late, but given
>> that RC2 appears to have failed - you should be getting a sign off from the
>> release manager in particular; whose life will be made difficult with this
>> :-)
>> I dont have strong objections if RM is fine absorbing the load ....
>>
>> Will let others chime in.
>>
>> Regards,
>> Mridul
>>
>> On Tue, Mar 4, 2025 at 2:32 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi Mridul,
>>>
>>> I'd like to persuade you if your concern is just that it's a bit late,
>>> because of the following:
>>>
>>> 1. The change only introduces a parity with Spark Connect, hence low
>>> risk and don't have a chance to break other stuff. If it breaks, it only
>>> breaks TWS + Spark Connect combination.
>>>
>>> For reference, here are PRs for TWS + Spark Connect:
>>>
>>> PySpark: https://github.com/apache/spark/pull/49560
>>> Scala: https://github.com/apache/spark/pull/49488
>>>
>>> 2. These PRs aren't something we brought up at the last minute. They
>>> were already up in mid Jan hence they were technically not very late - it's
>>> just that the review process took more time than we anticipated.
>>>
>>> 3. TWS is a new API in Structured Streaming which we have put yearly
>>> effort into. The API has been targeted to 4.0 in very early stages of Spark
>>> 4.0.0 release, we called out the TWS project every time there were threads
>>> in dev@ to collect out projects for Spark 4.0. Not having parity on
>>> Spark Connect sounds to me to be incomplete, and we know this will take at
>>> least 6 months to address (too, too long) if we decide to postpone.
>>>
>>> I understand it's not a best practice to add features at RC phase, but
>>> honestly this is just a timing issue. We aren't proposing features in the
>>> RC phase. (If this change were later than the proposed RC date, I should
>>> have posted to ask for postponing RC a bit.) It unfortunately took time to
>>> review them.
>>>
>>> I hope this could influence your thoughts about this.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Wed, Mar 5, 2025 at 2:28 AM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>
>>>> Hi Jungtaek,
>>>>
>>>> We are already in RC2 for 4.0, right ?
>>>> A bit too late for this IMO - we can always introduce it in 4.1
>>>>
>>>> Regards,
>>>> Mridul
>>>>
>>>> On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell <her...@databricks.com.invalid> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar <anish.shrigonde...@databricks.com.invalid> wrote:
>>>>>
>>>>>> +1 - Would be great to get this into the Spark 4.0 release.
>>>>>>
>>>>>> Thanks,
>>>>>> Anish
>>>>>>
>>>>>> On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi dev,
>>>>>>>
>>>>>>> We are going to introduce a new API named `transformWithState` for
>>>>>>> streaming query, which allows users to perform more complex stateful
>>>>>>> operation in user function, with lot simpler code compared to
>>>>>>> `flatMapGroupsWithState` (and `applyInPandasWithState`).
>>>>>>>
>>>>>>> The target version has been Spark 4.0.0 and we track this project as
>>>>>>> a major one for Spark 4. We push most planned features into Spark 4.0.0,
>>>>>>> except Spark Connect support.
>>>>>>>
>>>>>>> The PRs for Spark Connect support are merged into Spark 4.1 branch,
>>>>>>> but I'm seeking the voice whether we can introduce Spark Connect
>>>>>>> support to Spark 4.0.0.
>>>>>>>
>>>>>>> I understand this arrives a bit late, but since the API is something
>>>>>>> backed by a huge effort and I foresee this new API to replace the usage
>>>>>>> of flatMapGroupsWithState and applyInPandasWithState sooner, I'd like to
>>>>>>> make sure we don't push users back to wait for another 6+ months to use
>>>>>>> this in Spark Connect.
>>>>>>>
>>>>>>> Would love to hear your thoughts.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jungtaek Lim (HeartSaVioR)