Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Tzu-Li (Gordon) Tai Thu, 16 Mar 2017 23:14:50 -0700

Update for 1.1.5:
The last fixes for 1.1.5 are in! I will create the RC today and start the vote.


Cheers,
Gordon


On March 17, 2017 at 1:14:53 AM, Robert Metzger ([email protected]) wrote:

The Cassandra connector is probably not usable in Flink 1.2.0. I would like  
to include a fix in 1.2.1:  
https://issues.apache.org/jira/browse/FLINK-6084  

Please let me know if this fix becomes a blocker for the 1.2.1 release. If  
so, I can validate the fix myself to speed things up.  

On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <[email protected]> wrote:  

> @Tzu-Li (Gordon) Tai
>
> FLINK-5650 is fixed by [1]. Chesnay Schepler, could you please open a PR?
>  
> [1] https://github.com/zentol/flink/tree/5650_python_test_debug
>  
>  
> > On March 16, 2017, at 3:37 AM, Stephan Ewen <[email protected]> wrote:
> >  
> > Thanks for the update!  
> >  
> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled  
> > cancel-task from timer queue to prevent memory leaks  
> >  
> > The remaining issue list looks good, but I would say that (5) is optional.
> > It is not a critical production bug.
> >  
> >  
> >  
> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <[email protected]> wrote:
> >  
> >> Thanks a lot for the updates so far everyone!  
> >>  
> >> From the discussion so far, below are the still-unfixed pending issues
> >> for the 1.1.5 / 1.2.1 releases.
> >>  
> >> Since there’s only one backport for 1.1.5 left, I think having an RC for  
> >> 1.1.5 near the end of this week / early next week is very promising, as  
> >> basically everything is already in.  
> >> I’d be happy to volunteer to help manage the release for 1.1.5, and  
> >> prepare the RC when it’s ready :)  
> >>  
> >> For 1.2.1, we can leave the pending list here for tracking, and come back
> >> to update it in the near future.
> >>  
> >> If there’s anything I missed, please let me know!  
> >>  
> >>  
> >> =========== Still pending for Flink 1.1.5 ===========  
> >>  
> >> (1) https://issues.apache.org/jira/browse/FLINK-5701  
> >> Broken at-least-once Kafka producer.  
> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549
> >> Since it is a relatively self-contained change, I expect this to be a fast
> >> fix.
> >>  
> >>  
> >>  
> >> =========== Still pending for Flink 1.2.1 ===========  
> >>  
> >> (1) https://issues.apache.org/jira/browse/FLINK-5808  
> >> Fix missing verification for setParallelism and setMaxParallelism
> >> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
> >>  
> >> (2) https://issues.apache.org/jira/browse/FLINK-5713  
> >> Protect against NPE in WindowOperator window cleanup  
> >> Status: PR - https://github.com/apache/flink/pull/3535, review pending  
> >>  
> >> (3) https://issues.apache.org/jira/browse/FLINK-6044  
> >> TypeSerializerSerializationProxy.read() doesn't verify the read buffer  
> >> length  
> >> Status: Fixed for master, 1.2 backport pending  
> >>  
> >> (4) https://issues.apache.org/jira/browse/FLINK-5985  
> >> Flink treats every task as stateful (making topology changes impossible)  
> >> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
> >>  
> >> (5) https://issues.apache.org/jira/browse/FLINK-5650  
> >> Flink-python tests taking up too much time  
> >> Status: I think Chesnay has made some progress on this one; we can
> >> see if we want to make this a blocker.
> >>  
> >>  
> >> Cheers,  
> >> Gordon  
> >>  
> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([email protected]) wrote:
> >>  
> >> Can we fix this issue in 1.2.1?
> >>
> >> Flink-python tests take too long
> >> https://issues.apache.org/jira/browse/FLINK-5650
> >>  
> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <[email protected]> wrote:
> >>>  
> >>> I just tested it in my reproducer. It works.
> >>>  
> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[email protected]>:  
> >>>  
> >>>> I did in fact just open a PR for  
> >>>>> https://issues.apache.org/jira/browse/FLINK-6001  
> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and  
> >>>>> allowedLateness  
> >>>>  
> >>>>  
> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:  
> >>>>> Hi,  
> >>>>>  
> >>>>> I would also include the following (not yet resolved) issue in the 1.2.1
> >>>>> scope:
> >>>>>  
> >>>>> https://issues.apache.org/jira/browse/FLINK-6001  
> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and  
> >>>>> allowedLateness  
> >>>>>  
> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[email protected]>:  
> >>>>>  
> >>>>>> Big +1 Gordon!  
> >>>>>>  
> >>>>>> I think (10) is very critical to have in 1.2.1.  
> >>>>>>  
> >>>>>> – Ufuk  
> >>>>>>  
> >>>>>>  
> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter  
> >>>>>> <[email protected]> wrote:  
> >>>>>>> Hi,  
> >>>>>>>  
> >>>>>>> I would suggest also including the following in 1.2.1:
> >>>>>>>  
> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the intended
> >>>>>>> and correct InputStream#readFully(…)
> >>>>>>> Status: PR  
> >>>>>>>  
> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
> >>>>>>> Flink 1.2 was creating state handles for stateless tasks, which caused
> >>>>>>> trouble at restore time for users who wanted to make topology changes
> >>>>>>> that only involve stateless operators.
> >>>>>>> Status: PR  
> >>>>>>>  
> >>>>>>>  
> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <[email protected]> wrote:
> >>>>>>>>  
> >>>>>>>> Thanks for kicking off the discussion, Tzu-Li. I'd like to add the following
> >>>>>>>> issues, which have already been merged into the 1.2-release and 1.1-release
> >>>>>>>> branches:
> >>>>>>>>  
> >>>>>>>> 1.2.1:  
> >>>>>>>>  
> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942  
> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> >>>>>>>> Corrupted checkpoints will now be skipped.
> >>>>>>>> Status: Merged  
> >>>>>>>>  
> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940  
> >>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
> >>>>>>>> completed checkpoint from the metadata state handle retrieved from
> >>>>>>>> ZooKeeper. This can happen, for example, if the metadata is deleted.
> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> >>>>>>>> Status: Merged  
> >>>>>>>>  
> >>>>>>>> 1.1.5:  
> >>>>>>>>  
> >>>>>>>>  
> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942  
> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> >>>>>>>> Corrupted checkpoints will now be skipped.
> >>>>>>>> Status: Merged  
> >>>>>>>>  
> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940  
> >>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
> >>>>>>>> completed checkpoint from the metadata state handle retrieved from
> >>>>>>>> ZooKeeper. This can happen, for example, if the metadata is deleted.
> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> >>>>>>>> Status: Merged  
> >>>>>>>>  
> >>>>>>>> Cheers,  
> >>>>>>>> Till  
> >>>>>>>>  
> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <[email protected]> wrote:
> >>>>>>>>  
> >>>>>>>>> Hi all!  
> >>>>>>>>>  
> >>>>>>>>> I would like to start a discussion for the next bugfix release for 1.1.x
> >>>>>>>>> and 1.2.x.
> >>>>>>>>> There have been quite a few critical fixes for bugs in both releases
> >>>>>>>>> recently, and I think they deserve a bugfix release soon.
> >>>>>>>>> Most of the bugs were reported by users.
> >>>>>>>>> Most of the bugs were reported by users.  
> >>>>>>>>>  
> >>>>>>>>> I’m starting the discussion for both bugfix releases because most fixes
> >>>>>>>>> span both releases (almost identical).
> >>>>>>>>> Of course, the actual RC creation and RC votes don’t have to be
> >>>>>>>>> started together.
> >>>>>>>>>  
> >>>>>>>>> Here’s an overview of what’s been collected so far, for both bugfix
> >>>>>>>>> releases -
> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing stuff; please
> >>>>>>>>> append and bring to attention as necessary :-) )
> >>>>>>>>>  
> >>>>>>>>>  
> >>>>>>>>> For Flink 1.2.1:  
> >>>>>>>>>  
> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:  
> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> >>>>>>>>> Status: merged  
> >>>>>>>>>  
> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:  
> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos authentications. MapR
> >>>>>>>>> users are affected by this, and cannot submit Flink on YARN jobs on a
> >>>>>>>>> secured MapR cluster.
> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
> >>>>>>>>>  
> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:  
> >>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on
> >>>>>>>>> restore.
> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
> >>>>>>>>>  
> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:  
> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> >>>>>>>>> used.
> >>>>>>>>> Status: merged  
> >>>>>>>>>  
> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:  
> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.  
> >>>>>>>>> Status: merged  
> >>>>>>>>>  
> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:  
> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> >>>>>>>>> bug that causes HA recovery to fail.
> >>>>>>>>> Status: merged  
> >>>>>>>>>  
> >>>>>>>>>  
> >>>>>>>>>  
> >>>>>>>>> For Flink 1.1.5:  
> >>>>>>>>>  
> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:  
> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally like to
> >>>>>>>>> backport the fix for this to 1.1.5 also.
> >>>>>>>>>  
> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:  
> >>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on
> >>>>>>>>> restore.
> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
> >>>>>>>>>  
> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:  
> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> >>>>>>>>> used.
> >>>>>>>>> Status: merged  
> >>>>>>>>>  
> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:  
> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.  
> >>>>>>>>> Status: merged  
> >>>>>>>>>  
> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:  
> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> >>>>>>>>> bug that causes HA recovery to fail.
> >>>>>>>>> Status: merged  
> >>>>>>>>>  
> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:  
> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation
> >>>>>>>>> behavior.
> >>>>>>>>> Status: This fix was already released in 1.2.0, but never made it into the
> >>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> >>>>>>>>>  
> >>>>>>>>>  
> >>>>>>>>> What do you think? From the list so far, we pretty much already have
> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by the end of
> >>>>>>>>> this week.
> >>>>>>>>> Since both bugfix releases cover almost the same list of issues, I think
> >>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix releases around the
> >>>>>>>>> same time.
> >>>>>>>>>  
> >>>>>>>>> Also FYI, here are the lists of JIRA tickets tagged with “1.2.1” / “1.1.5”
> >>>>>>>>> as the Fix Version that are still open.
> >>>>>>>>> We should probably check if there’s anything on there that we
> >>>>>>>>> should block on for the releases:
> >>>>>>>>>  
> >>>>>>>>> For 1.2.1:  
> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> >>>>>>>>>  
> >>>>>>>>> For 1.1.5:  
> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
