Re: [DISCUSS] Releasing Flink 1.1.4

Stefan Richter Fri, 28 Oct 2016 08:46:26 -0700

The benefit of a backport, as I see it, is increased stability. The danger is
potentially breaking user code that casts FileSystems to subtypes like
LocalFileSystem. I don’t know how common that is in user code.
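
To make the casting risk concrete, here is a minimal Java sketch. It assumes the safety net works by handing out a wrapper object around the real file system, which is one plausible reading of this thread rather than something it states. Every class below, including the stand-in FileSystem and LocalFileSystem, is hypothetical and is not Flink's actual code.

```java
// Hypothetical stand-ins for illustration only; NOT Flink's real classes.
abstract class FileSystem {
    abstract String describe();
}

class LocalFileSystem extends FileSystem {
    @Override
    String describe() { return "local"; }

    // A subtype-only method that user code might reach through a downcast.
    String localOnlyInfo() { return "local-only"; }
}

// Assumption: a safety net that wraps the real file system in a delegating object.
class SafetyNetFileSystem extends FileSystem {
    private final FileSystem delegate;

    SafetyNetFileSystem(FileSystem delegate) { this.delegate = delegate; }

    @Override
    String describe() { return delegate.describe(); }
}

class CastDemo {
    // User code that assumes it always receives the concrete subtype.
    static String userCode(FileSystem fs) {
        return ((LocalFileSystem) fs).localOnlyInfo();
    }

    // Returns true if the downcast in userCode fails for the given file system.
    static boolean breaksOn(FileSystem fs) {
        try {
            userCode(fs);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FileSystem plain = new LocalFileSystem();
        FileSystem wrapped = new SafetyNetFileSystem(plain);
        System.out.println("plain breaks:   " + breaksOn(plain));
        System.out.println("wrapped breaks: " + breaksOn(wrapped));
    }
}
```

Under that assumption, code that only uses the FileSystem base type keeps working, while code that downcasts would fail at runtime once a wrapper is returned.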


> On 28.10.2016 at 14:27, Ufuk Celebi <u...@apache.org> wrote:
> 
> Thanks for all your feedback.
> 
> If there are no objections, I would like to stick to the mentioned
> issues in this thread and create RC1 as soon as they are all
> addressed. This will probably not be this week though, but it looks
> good for next week.
> 
> DONE
> =====
> - FLINK-4619: Answer client if savepoint restore fails
> - FLINK-4715: Safety net for stuck task cancellation
> - FLINK-4510: Always create CheckpointCoordinator
> - FLINK-4894: Don't block on buffer request after broadcast event
> - FLINK-4298: Add proper repository for Closure dependencies
> - FLINK-4218: Do not fail checkpoints when state size cannot be determined
> - FLINK-3347: TaskManager (or its ActorSystem) need to restart in case
> they notice quarantine
> - FLINK-4875: Use correct operator name
> - FLINK-4913: Include user jars in system class loader
> 
> PENDING REVIEW
> ===============
> - FLINK-4445: Add option to ignore unmatched state when restoring from
> savepoint => https://github.com/apache/flink/pull/2713
> - FLINK-4932: Don't let ExecutionGraph fail when in state Restarting
> => https://github.com/apache/flink/pull/2711
> - FLINK-4933: ExecutionGraph.scheduleOrUpdateConsumers can fail the
> ExecutionGraph => https://github.com/apache/flink/pull/2701
> 
> OPEN
> =====
> - FLINK-4904: Add a limit for how much data may be spilled in
> checkpoint alignments => fix pending
> - FLINK-4910: Introduce safety net for closing file system streams =>
> @Stephan, Stefan: What's the conclusion of your discussion whether to
> backport this or not?
> 
> 
> On Wed, Oct 26, 2016 at 9:57 PM, dan bress <danbr...@gmail.com> wrote:
>> +1 for this release,
>> also +1 to Chesnay's suggestion to include this: [FLINK-4875] [metrics]
>> Use correct operator name
>> 
>> Dan
>> 
>> On Wed, Oct 26, 2016 at 5:06 AM Till Rohrmann <trohrm...@apache.org> wrote:
>> 
>>> I'll work on FLINK-3347. Additionally I would like to get in
>>> 
>>> - https://issues.apache.org/jira/browse/FLINK-4932: Don't let
>>> ExecutionGraph fail when in state Restarting
>>> - https://issues.apache.org/jira/browse/FLINK-4933:
>>> ExecutionGraph.scheduleOrUpdateConsumers
>>> can fail the ExecutionGraph
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Wed, Oct 26, 2016 at 1:02 PM, Stephan Ewen <se...@apache.org> wrote:
>>> 
>>>> Concerning backporting the "I/O streams safety net" - we need to make sure
>>>> that this does not change any behavior that users may implicitly expect.
>>>> 
>>>> 
>>>> On Wed, Oct 26, 2016 at 11:21 AM, Maximilian Michels <m...@apache.org>
>>>> wrote:
>>>> 
>>>>> +1 for a 1.1.4 release
>>>>> 
>>>>> We could backport putting user jars into the system class loader for
>>>>> per-job Yarn clusters: https://github.com/apache/flink/pull/2692
>>>>> Arguably, this is somewhat a new feature but it gets rid of duplicate
>>>>> class loading issues users experienced in practice.
>>>>> 
>>>>> We already have the following commits on the release-1.1 branch:
>>>>> 
>>>>> 05a5f46 [FLINK-4862] fix Timer register in ContinuousEventTimeTrigger
>>>>> 5731672 [FLINK-4581] [table] Fix Table API throwing "No suitable driver
>>>>> found for jdbc:calcite"
>>>>> 9c87f92 [FLINK-4586] [core] Broken AverageAccumulator
>>>>> 210230c [FLINK-4829] snapshot accumulators on a best-effort basis
>>>>> c1d6b24 [FLINK-4829] protect user accumulators against concurrent updates
>>>>> fe464b4 [FLINK-4709] [core] Fix resource leak in InputStreamFSInputWrapper
>>>>> 9f72698 [FLINK-4108] [scala] Respect ResultTypeQueryable for InputFormats.
>>>>> 9591d50 [FLINK-4506] [DataSet] Fix documentation of CsvOutputFormat about
>>>>> incorrect default of allowNullValues
>>>>> c9433bf [FLINK-3706] Fix YARN test instability
>>>>> 2203f74 [FLINK-4778] [docs] Fix WordCount parameters in CLI examples.
>>>>> 
>>>>> -Max
>>>>> 
>>>>> 
>>>>> On Wed, Oct 26, 2016 at 7:05 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>> wrote:
>>>>>> +1
>>>>>> 
>>>>>> Looking forward to this release!
>>>>>> 
>>>>>> Regards
>>>>>> JB
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Oct 25, 2016, at 14:43, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>> +1 for a bugfix release soon.
>>>>>>> 
>>>>>>> On Tue, Oct 25, 2016 at 10:53 AM, Stephan Ewen <se...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks for starting this, Ufuk.
>>>>>>>> 
>>>>>>>> I would like to add the following issues to 1.1.4:
>>>>>>>> 
>>>>>>>> Build errors due to Storm dependencies *(fix pending)*
>>>>>>>>    - [FLINK-4298] [storm compatibility] Add proper repository for
>>>>>>>>      Closure dependencies.
>>>>>>>> 
>>>>>>>> Stability on S3 considering eventual consistency *(fix pending)*
>>>>>>>>    - [FLINK-4218] [checkpoints] Do not fail checkpoints when state
>>>>>>>>      size cannot be determined
>>>>>>>> 
>>>>>>>> Avoiding Zombie TaskManagers *(still needs to be done)*
>>>>>>>>    - [FLINK-3347] [akka] TaskManager (or its ActorSystem) need to
>>>>>>>>      restart in case they notice quarantine
>>>>>>>> 
>>>>>>>> Adding a limit to the amount of data spilled during checkpoint
>>>>>>>> alignments *(fix is work in progress)*
>>>>>>>>    - [FLINK-4904] [checkpoints] Add a limit for how much data may
>>>>>>>>      be spilled in checkpoint alignments
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I can push the first two fixes to the 1.1.4 branch in a bit, the
>>>>>>>> fourth one later today.
>>>>>>>> The third one (akka) is still pending.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Stephan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Oct 24, 2016 at 3:32 PM, Ufuk Celebi <u...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Hey all,
>>>>>>>>> 
>>>>>>>>> I would like to start the discussion for kicking off the next bug
>>>>>>>>> fix release, Flink 1.1.4. What do you think about aiming for an RC
>>>>>>>>> by the end of this week?
>>>>>>>>> 
>>>>>>>>> Users reported some instabilities/inconveniences that would be
>>>>>>>>> good to fix.
>>>>>>>>> 
>>>>>>>>> Personally, I would like to backport the following fixes:
>>>>>>>>> 
>>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-4619: Answer
>>>>>>>>> client if savepoint restore fails (Already merged for master, needs
>>>>>>>>> minimal adjustment for 1.1)
>>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-4715: Safety net
>>>>>>>>> for stuck task cancellation (Already reviewed for master, waiting
>>>>>>>>> for the backport's tests to finish)
>>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-4510: Always
>>>>>>>>> create CheckpointCoordinator (Already merged for master, needs
>>>>>>>>> minimal adjustments for 1.1)
>>>>>>>>> 
>>>>>>>>> Furthermore, I would like to address the following:
>>>>>>>>> 
>>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-4445: Add option
>>>>>>>>> to ignore unmatched state when restoring from savepoint
>>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-4894: Don't
>>>>>>>>> block on buffer request after broadcast event
>>>>>>>>> 
>>>>>>>>> Strictly speaking, (4) is not a bug fix. But given that it
>>>>>>>>> would only add an optional flag to savepoint restoring and should
>>>>>>>>> have been addressed for 1.1.0 already, I would like to get it in.
>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>> 
>>>
