Thanks everyone for the feedback.

I updated the status of Flink 1.11.3 earlier today in its corresponding
discussion thread [1].

From the looks of it, it makes sense to proceed with StateFun 2.2.1 without
waiting for Flink 1.11.3.
Since this is also the consensus we've reached here, I have proceeded to
create RC1 for StateFun 2.2.1 [2].

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Apache-Flink-1-11-3-td45989.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-StateFun-hotfix-version-2-2-1-td46239.html

On Tue, Nov 3, 2020 at 10:42 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi Gordon,
> thanks a lot for this clarification.
>
> In this case I would vote for releasing StateFun 2.2.1 asap and not wait
> for 1.11.3.
>
> Thanks a lot for your efforts!
>
>
> On Tue, Nov 3, 2020 at 3:38 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
>> Hi Robert,
>>
>> So far we've only seen a single user report the issue, but the severity
>> of FLINK-19692 is actually pretty huge.
>> TL;DR: Restoring from a checkpoint / savepoint that contains feedback
>> events (which is normal under typical StateFun operations) always fails.
>>
>> That's why we started the discussion about releasing a "partial" solution
>> with StateFun 2.2.1 now, so that at least one available StateFun release
>> works properly with failure recovery, followed by another hotfix release,
>> 2.2.2, which would include Flink 1.11.3 and address the remaining part of
>> the problem.
>>
>> BR,
>> Gordon
>>
>> On Tue, Nov 3, 2020 at 9:33 PM Robert Metzger <rmetz...@apache.org>
>> wrote:
>>
>>> Thanks a lot for starting this thread.
>>> How many users are affected by the problem? Is it somebody else besides
>>> the initial issue reporter?
>>> If it is just one person, I would rather suggest helping to push the
>>> 1.11.3 release over the line or working on more StateFun features ;)
>>>
>>> On Tue, Nov 3, 2020 at 11:58 AM Igal Shilman <i...@ververica.com> wrote:
>>>
>>>> Hi Gordon,
>>>> Thanks for driving this discussion!
>>>>
>>>> I would go with the second suggestion - having two consecutive StateFun
>>>> releases, 2.2.1 and 2.2.2, since the Flink 1.11.3 release might take a
>>>> while, and this hotfix release is important enough to get out as early
>>>> as possible.
>>>>
>>>> Cheers,
>>>> Igal.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Nov 2, 2020 at 11:43 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > We’re currently thinking about releasing StateFun 2.2.1, to address a
>>>> > critical bug that causes restores from checkpoints / savepoints to fail
>>>> > under certain circumstances [1].
>>>> >
>>>> > To provide a bit more context, the full fix for this issue is two-fold:
>>>> >
>>>> >    1. *Fix restoring from checkpoints / savepoints taken with the same
>>>> >    StateFun version:* this has already been fixed in StateFun, with
>>>> >    changes backported to `flink-statefun/release-2.2`.
>>>> >    2. *Allow restoring from older savepoints taken with StateFun <=
>>>> >    2.2.0:* this requires a few fixes to Flink around restoring heap-based
>>>> >    timers [2] and iterating through key groups in restored raw keyed state
>>>> >    streams [3]. These fixes will be included in Flink 1.11.3 [4], meaning
>>>> >    that to fix this, StateFun will need to wait until Flink 1.11.3 is out
>>>> >    and upgrade its Flink dependency.
>>>> >
>>>> > The main discussion point here is whether or not it makes sense for
>>>> > StateFun 2.2.1 to wait for Flink 1.11.3, so that both parts of the
>>>> > problem, 1) and 2), can be solved together in a single hotfix release.
>>>> >
>>>> > The other option is to release StateFun 2.2.1 already with fixes for
>>>> > problem 1) only, and have another follow-up hotfix release 2.2.2 after
>>>> > Flink 1.11.3 is available.
>>>> >
>>>> > I propose to keep a close eye on the progress of Flink 1.11.3 (you can
>>>> > track progress on the 1.11.3 discussion thread [4]), and *make a decision
>>>> > here mid-week on Wednesday, Nov. 4th*.
>>>> > If by then we decide to not let StateFun 2.2.1 wait for Flink 1.11.3
>>>> > because it could take a while, we can start with a StateFun 2.2.1 RC right
>>>> > away; otherwise, if Flink 1.11.3 seems to be just around the corner, we can
>>>> > wait for a few more days.
>>>> >
>>>> > What do you think?
>>>> >
>>>> > Cheers,
>>>> > Gordon
>>>> >
>>>> > [1] https://issues.apache.org/jira/browse/FLINK-19692
>>>> > [2] https://github.com/apache/flink/pull/13761
>>>> > [3] https://github.com/apache/flink/pull/13772
>>>> > [4] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Apache-Flink-1-11-3-td45989.html
>>>> >
>>>>
>>>
