Thank you for initiating this.

BTW, RC failures are irrelevant to the new feature backporting request.

So, in principle, I'm -1 for this late arrival because this could be a bad
example which opens the door to all random backporting and delays.

However, I'll follow a broader community consensus (like an official
vote) for this specific feature.

I guess this discussion thread was initiated as a preparation for that. :)

Thanks,
Dongjoon.

On Tue, Mar 4, 2025 at 7:08 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Thank you for understanding. Actually I'm dealing with a blocker for Spark
> 4.0.0 (so RC will always fail till I address this), you may want to join
> the discussion to unblock me.
> https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr
>
> For sure, we will work with Wenchen to get the final sign off - we won't
> push this more if he is not comfortable with it. Also for sure I'm open to
> hearing more voices.
>
> Thanks again,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Mar 5, 2025 at 10:10 AM Mridul Muralidharan <mri...@gmail.com>
> wrote:
>
>>
>> Hi Jungtaek,
>>
>>   It is fairly irregular to make feature updates this late, but given
>> that RC2 appears to have failed - you should be getting a sign off from the
>> release manager in particular; whose life will be made difficult with this
>> :-)
>> I don't have strong objections if the RM is fine absorbing the load ...
>>
>> Will let others chime in.
>>
>> Regards,
>> Mridul
>>
>>
>> On Tue, Mar 4, 2025 at 2:32 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>
>>> Hi Mridul,
>>>
>>> I'd like to persuade you if your concern is just that it's a bit late,
>>> because of the following:
>>>
>>> 1. The change only introduces parity with Spark Connect, hence it is low
>>> risk and doesn't have a chance to break other stuff. If it breaks, it only
>>> breaks the TWS + Spark Connect combination.
>>>
>>> For reference, here are PRs for TWS + Spark Connect:
>>>
>>> PySpark: https://github.com/apache/spark/pull/49560
>>> Scala: https://github.com/apache/spark/pull/49488
>>>
>>> 2. These PRs aren't something we brought up at the last minute. They
>>> were already up in mid Jan hence they were technically not very late - it's
>>> just that the review process took more time than we anticipated.
>>>
>>> 3. TWS is a new API in Structured Streaming which we have put yearly
>>> effort into. The API has been targeted to 4.0 since the very early stages
>>> of the Spark 4.0.0 release; we called out the TWS project every time there
>>> were threads in dev@ to collect projects for Spark 4.0. Not having parity
>>> on Spark Connect sounds incomplete to me, and we know this will take at
>>> least 6 months to address (too, too long) if we decide to postpone.
>>>
>>> I understand it's not a best practice to add features at RC phase, but
>>> honestly this is just a timing issue. We aren't proposing features in the
>>> RC phase. (If this change were later than the proposed RC date, I should
>>> have posted to ask for postponing RC a bit.) It unfortunately took time to
>>> review them.
>>>
>>> I hope this could influence your thoughts about this.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Wed, Mar 5, 2025 at 2:28 AM Mridul Muralidharan <mri...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi Jungtaek,
>>>>
>>>>   We are already in RC2 for 4.0, right ?
>>>> A bit too late for this IMO - we can always introduce it in 4.1
>>>>
>>>>
>>>> Regards,
>>>> Mridul
>>>>
>>>>
>>>> On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell
>>>> <her...@databricks.com.invalid> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar
>>>>> <anish.shrigonde...@databricks.com.invalid> wrote:
>>>>>
>>>>>> +1 - Would be great to get this into the Spark 4.0 release.
>>>>>>
>>>>>> Thanks,
>>>>>> Anish
>>>>>>
>>>>>> On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim <
>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi dev,
>>>>>>>
>>>>>>> We are going to introduce a new API named `transformWithState` for
>>>>>>> streaming queries, which allows users to perform more complex stateful
>>>>>>> operations in a user function, with much simpler code compared to
>>>>>>> `flatMapGroupsWithState` (and `applyInPandasWithState`).
>>>>>>>
>>>>>>> The target version has been Spark 4.0.0 and we track this project as
>>>>>>> a major one for Spark 4. We push most planned features into Spark 4.0.0,
>>>>>>> except Spark Connect support.
>>>>>>>
>>>>>>> The PRs for Spark Connect support are merged into the Spark 4.1 branch,
>>>>>>> but I'd like to hear opinions on whether we can also introduce Spark
>>>>>>> Connect support to Spark 4.0.0.
>>>>>>>
>>>>>>> I understand this arrives a bit late, but since the API is backed by a
>>>>>>> huge effort and I foresee it replacing the usage of
>>>>>>> flatMapGroupsWithState and applyInPandasWithState soon, I'd like to
>>>>>>> make sure we don't make users wait another 6+ months to use it in
>>>>>>> Spark Connect.
>>>>>>>
>>>>>>> Would love to hear your thoughts.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>>
>>>>>>
