Hi Jungtaek,

It is fairly irregular to make feature updates this late, but given that RC2 appears to have failed - you should be getting a sign-off from the release manager in particular, whose life will be made difficult by this :-)
I don't have strong objections if the RM is fine absorbing the load.
Will let others chime in.

Regards,
Mridul

On Tue, Mar 4, 2025 at 2:32 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Hi Mridul,
>
> I'd like to persuade you, if your concern is just that it's a bit late,
> because of the following:
>
> 1. The change only introduces parity with Spark Connect, hence it is low
> risk and doesn't have a chance to break other stuff. If it breaks, it only
> breaks the TWS + Spark Connect combination.
>
> For reference, here are the PRs for TWS + Spark Connect:
>
> PySpark: https://github.com/apache/spark/pull/49560
> Scala: https://github.com/apache/spark/pull/49488
>
> 2. These PRs aren't something we brought up at the last minute. They were
> already up in mid Jan, hence they were technically not very late - it's
> just that the review process took more time than we anticipated.
>
> 3. TWS is a new API in Structured Streaming which we have put yearly
> effort into. The API has been targeted to 4.0 since the very early stages
> of the Spark 4.0.0 release; we called out the TWS project every time there
> were threads in dev@ to collect projects for Spark 4.0. Not having parity
> on Spark Connect sounds incomplete to me, and we know this will take at
> least 6 months to address (too, too long) if we decide to postpone.
>
> I understand it's not a best practice to add features at the RC phase, but
> honestly this is just a timing issue. We aren't proposing features in the
> RC phase. (If this change were later than the proposed RC date, I should
> have posted to ask for postponing the RC a bit.) It unfortunately took
> time to review them.
>
> I hope this could influence your thoughts about this.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Mar 5, 2025 at 2:28 AM Mridul Muralidharan <mri...@gmail.com>
> wrote:
>
>>
>> Hi Jungtaek,
>>
>> We are already in RC2 for 4.0, right?
>> A bit too late for this IMO - we can always introduce it in 4.1
>>
>>
>> Regards,
>> Mridul
>>
>>
>> On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell
>> <her...@databricks.com.invalid> wrote:
>>
>>> +1
>>>
>>> On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar
>>> <anish.shrigonde...@databricks.com.invalid> wrote:
>>>
>>>> +1 - Would be great to get this into the Spark 4.0 release.
>>>>
>>>> Thanks,
>>>> Anish
>>>>
>>>> On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim
>>>> <kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Hi dev,
>>>>>
>>>>> We are going to introduce a new API named `transformWithState` for
>>>>> streaming queries, which allows users to perform more complex stateful
>>>>> operations in a user function, with a lot simpler code compared to
>>>>> `flatMapGroupsWithState` (and `applyInPandasWithState`).
>>>>>
>>>>> The target version has been Spark 4.0.0 and we track this project as a
>>>>> major one for Spark 4. We pushed most planned features into Spark 4.0.0,
>>>>> except Spark Connect support.
>>>>>
>>>>> The PRs for Spark Connect support are merged into the Spark 4.1 branch,
>>>>> but I'm seeking opinions on whether we can introduce Spark Connect
>>>>> support to Spark 4.0.0.
>>>>>
>>>>> I understand this arrives a bit late, but since the API is something
>>>>> backed by a huge effort and I foresee this new API replacing the usage of
>>>>> flatMapGroupsWithState and applyInPandasWithState sooner, I'd like to make
>>>>> sure we don't push users back to wait for another 6+ months to use this in
>>>>> Spark Connect.
>>>>>
>>>>> Would love to hear your thoughts.
>>>>>
>>>>> Thanks,
>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>
>>>>