Hi Jungtaek,

  It is fairly irregular to make feature updates this late, but given that
RC2 appears to have failed, you should get a sign-off from the release
manager in particular, whose life will be made more difficult by this :-)
I don't have strong objections if the RM is fine absorbing the load.

Will let others chime in.

Regards,
Mridul


On Tue, Mar 4, 2025 at 2:32 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Hi Mridul,
>
> I'd like to persuade you if your concern is just that it's a bit late,
> because of the following:
>
> 1. The change only introduces parity with Spark Connect, so it is low risk
> and should not break anything else. If it breaks, it only breaks the
> TWS + Spark Connect combination.
>
> For reference, here are PRs for TWS + Spark Connect:
>
> PySpark: https://github.com/apache/spark/pull/49560
> Scala: https://github.com/apache/spark/pull/49488
>
> 2. These PRs aren't something we brought up at the last minute. They were
> already up in mid-January, so they were technically not very late. It's just
> that the review process took more time than we anticipated.
>
> 3. TWS is a new API in Structured Streaming which we have put yearly
> effort into. The API has been targeted to 4.0 since the very early stages of
> the Spark 4.0.0 release, and we called out the TWS project every time there
> was a thread on dev@ to collect projects for Spark 4.0. Not having parity on
> Spark Connect feels incomplete to me, and we know this will take at least
> 6 months to address (far too long) if we decide to postpone.
>
> I understand it's not best practice to add features in the RC phase, but
> honestly this is just a timing issue. We aren't proposing new features in the
> RC phase. (If this change had come later than the proposed RC date, I would
> have posted to ask to postpone the RC a bit.) It unfortunately took time to
> review them.
>
> I hope this could influence your thoughts about this.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Mar 5, 2025 at 2:28 AM Mridul Muralidharan <mri...@gmail.com>
> wrote:
>
>>
>> Hi Jungtaek,
>>
>>   We are already in RC2 for 4.0, right?
>> A bit too late for this IMO; we can always introduce it in 4.1.
>>
>>
>> Regards,
>> Mridul
>>
>>
>> On Tue, Mar 4, 2025 at 7:22 AM Herman van Hovell
>> <her...@databricks.com.invalid> wrote:
>>
>>> +1
>>>
>>> On Tue, Mar 4, 2025 at 2:07 AM Anish Shrigondekar
>>> <anish.shrigonde...@databricks.com.invalid> wrote:
>>>
>>>> +1 - Would be great to get this into the Spark 4.0 release.
>>>>
>>>> Thanks,
>>>> Anish
>>>>
>>>> On Mon, Mar 3, 2025 at 9:35 PM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Hi dev,
>>>>>
>>>>> We are going to introduce a new API named `transformWithState` for
>>>>> streaming queries, which allows users to perform more complex stateful
>>>>> operations in a user function, with much simpler code compared to
>>>>> `flatMapGroupsWithState` (and `applyInPandasWithState`).
>>>>>
>>>>> The target version has been Spark 4.0.0 and we track this project as a
>>>>> major one for Spark 4. We have pushed most planned features into Spark
>>>>> 4.0.0, except Spark Connect support.
>>>>>
>>>>> The PRs for Spark Connect support are merged into the Spark 4.1 branch,
>>>>> but I'd like to ask whether we can introduce Spark Connect support in
>>>>> Spark 4.0.0.
>>>>>
>>>>> I understand this arrives a bit late, but since the API is backed by a
>>>>> huge effort and I expect it to replace the usage of
>>>>> flatMapGroupsWithState and applyInPandasWithState soon, I'd like to make
>>>>> sure we don't make users wait another 6+ months to use this in
>>>>> Spark Connect.
>>>>>
>>>>> Would love to hear your thoughts.
>>>>>
>>>>> Thanks,
>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>
>>>>
