@Tzu-Li (Gordon) Tai FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
[1] https://github.com/zentol/flink/tree/5650_python_test_debug

> On March 16, 2017, at 3:37 AM, Stephan Ewen <se...@apache.org> wrote:
>
> Thanks for the update!
>
> Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
> cancel-task from timer queue to prevent memory leaks
>
> The remaining issue list looks good, but I would say that (5) is optional.
> It is not a critical production bug.
>
> On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>
>> Thanks a lot for the updates so far everyone!
>>
>> From the discussion so far, below are the still-pending issues
>> for the 1.1.5 / 1.2.1 releases.
>>
>> Since there’s only one backport for 1.1.5 left, I think having an RC for
>> 1.1.5 near the end of this week / early next week is very promising, as
>> basically everything is already in.
>> I’d be happy to volunteer to help manage the release for 1.1.5, and
>> prepare the RC when it’s ready :)
>>
>> For 1.2.1, we can leave the pending list here for tracking, and come back
>> to update it in the near future.
>>
>> If there’s anything I missed, please let me know!
>>
>>
>> =========== Still pending for Flink 1.1.5 ===========
>>
>> (1) https://issues.apache.org/jira/browse/FLINK-5701
>> Broken at-least-once Kafka producer.
>> Status: backport PR pending - https://github.com/apache/flink/pull/3549.
>> Since it is a relatively self-contained change, I expect this to be a fast fix.
>>
>> =========== Still pending for Flink 1.2.1 ===========
>>
>> (1) https://issues.apache.org/jira/browse/FLINK-5808
>> Fix missing verification for setParallelism and setMaxParallelism
>> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
>>
>> (2) https://issues.apache.org/jira/browse/FLINK-5713
>> Protect against NPE in WindowOperator window cleanup
>> Status: PR - https://github.com/apache/flink/pull/3535, review pending
>>
>> (3) https://issues.apache.org/jira/browse/FLINK-6044
>> TypeSerializerSerializationProxy.read() doesn't verify the read buffer length
>> Status: Fixed for master, 1.2 backport pending
>>
>> (4) https://issues.apache.org/jira/browse/FLINK-5985
>> Flink treats every task as stateful (making topology changes impossible)
>> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
>>
>> (5) https://issues.apache.org/jira/browse/FLINK-5650
>> Flink-python tests taking up too much time
>> Status: I think Chesnay is currently making progress on this one; we can
>> see if we want to make this a blocker
>>
>>
>> Cheers,
>> Gordon
>>
>> On March 15, 2017 at 7:16:53 PM, Jinkui Shi (shijinkui...@163.com) wrote:
>>
>> Can we fix this issue in 1.2.1:
>>
>> Flink-python tests take too long
>> https://issues.apache.org/jira/browse/FLINK-5650
>>
>>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <vladislav.per...@gmail.com> wrote:
>>>
>>> I just tested it in my reproducer. It works.
>>>
>>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:
>>>
>>>> I did in fact just open a PR for
>>>>> https://issues.apache.org/jira/browse/FLINK-6001
>>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>>>>> allowedLateness
>>>>
>>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
>>>>> Hi,
>>>>>
>>>>> I would also include the following (not yet resolved) issue in the 1.2.1 scope:
>>>>>
>>>>> https://issues.apache.org/jira/browse/FLINK-6001
>>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>>>>> allowedLateness
>>>>>
>>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <u...@apache.org>:
>>>>>
>>>>>> Big +1 Gordon!
>>>>>>
>>>>>> I think (10) is very critical to have in 1.2.1.
>>>>>>
>>>>>> – Ufuk
>>>>>>
>>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
>>>>>> <s.rich...@data-artisans.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I would suggest also including in 1.2.1:
>>>>>>>
>>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
>>>>>>> Replaces unintentional calls to InputStream#read(…) with the intended
>>>>>>> and correct InputStream#readFully(…)
>>>>>>> Status: PR
>>>>>>>
>>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
>>>>>>> Flink 1.2 was creating state handles for stateless tasks, which caused trouble
>>>>>>> at restore time for users who wanted to make changes to their topology that
>>>>>>> only involve stateless operators.
>>>>>>> Status: PR
>>>>>>>
>>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Thanks for kicking off the discussion Tzu-Li.
>>>>>>>> I'd like to add the following issues which have already been merged
>>>>>>>> into the 1.2-release and 1.1-release branches:
>>>>>>>>
>>>>>>>> 1.2.1:
>>>>>>>>
>>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
>>>>>>>> Corrupted checkpoints will now be skipped.
>>>>>>>> Status: Merged
>>>>>>>>
>>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
>>>>>>>> completed checkpoint from the metadata state handle retrieved from
>>>>>>>> ZooKeeper. This can, for example, happen if the metadata is deleted.
>>>>>>>> Checkpoints with unretrievable state handles are skipped.
>>>>>>>> Status: Merged
>>>>>>>>
>>>>>>>> 1.1.5:
>>>>>>>>
>>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
>>>>>>>> Corrupted checkpoints will now be skipped.
>>>>>>>> Status: Merged
>>>>>>>>
>>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
>>>>>>>> completed checkpoint from the metadata state handle retrieved from
>>>>>>>> ZooKeeper. This can, for example, happen if the metadata is deleted.
>>>>>>>> Checkpoints with unretrievable state handles are skipped.
>>>>>>>> Status: Merged
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>>
>>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi all!
>>>>>>>>>
>>>>>>>>> I would like to start a discussion for the next bugfix release for
>>>>>>>>> 1.1.x and 1.2.x.
>>>>>>>>> There’s been quite a few critical fixes for bugs in both the releases
>>>>>>>>> recently, and I think they deserve a bugfix release soon.
>>>>>>>>> Most of the bugs were reported by users.
>>>>>>>>>
>>>>>>>>> I’m starting the discussion for both bugfix releases because most fixes
>>>>>>>>> span both releases (almost identical).
>>>>>>>>> Of course, the actual RC votes and RC creation process don’t have to be
>>>>>>>>> started together.
>>>>>>>>>
>>>>>>>>> Here’s an overview of what’s been collected so far, for both bugfix
>>>>>>>>> releases -
>>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing stuff; please
>>>>>>>>> append and bring to attention as necessary :-) )
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For Flink 1.2.1:
>>>>>>>>>
>>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
>>>>>>>>> This compromises the producer’s at-least-once guarantee.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
>>>>>>>>> Do not check Kerberos credentials for non-Kerberos authentications. MapR
>>>>>>>>> users are affected by this, and cannot submit Flink on YARN jobs on a
>>>>>>>>> secured MapR cluster.
>>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
>>>>>>>>>
>>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
>>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on
>>>>>>>>> restore.
>>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
>>>>>>>>>
>>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
>>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
>>>>>>>>> used.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
>>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
>>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.
>>>>>>>>> This fixes a bug that causes HA recovery to fail.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For Flink 1.1.5:
>>>>>>>>>
>>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
>>>>>>>>> This compromises the producer’s at-least-once guarantee.
>>>>>>>>> Status: This is already merged for 1.2.1. I would personally like to
>>>>>>>>> backport the fix for this to 1.1.5 also.
>>>>>>>>>
>>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
>>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on
>>>>>>>>> restore.
>>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
>>>>>>>>>
>>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
>>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
>>>>>>>>> used.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
>>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
>>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
>>>>>>>>> bug that causes HA recovery to fail.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
>>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation
>>>>>>>>> behavior.
>>>>>>>>> Status: This fix was already released in 1.2.0, but never made it into the
>>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What do you think? From the list so far, we pretty much already have
>>>>>>>>> everything in, so I think it would be nice to aim for RCs by the end of
>>>>>>>>> this week.
>>>>>>>>> Since both bugfix releases cover almost the same list of issues, I think
>>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix releases around the
>>>>>>>>> same time.
>>>>>>>>>
>>>>>>>>> Also FYI, here are the lists of JIRA tickets tagged with “1.2.1” / “1.1.5”
>>>>>>>>> as the Fix Version that are still open.
>>>>>>>>> We probably want to check if there’s anything on there that we
>>>>>>>>> should block on for the releases:
>>>>>>>>>
>>>>>>>>> For 1.2.1:
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
>>>>>>>>>
>>>>>>>>> For 1.1.5:
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
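
A note on FLINK-6044 from the thread above: its description contrasts InputStream#read(…) with readFully(…). As a minimal sketch (this is not Flink's actual code), the Java example below shows why the difference matters. A single read call may return fewer bytes than the buffer holds, while java.io.DataInputStream#readFully (in the JDK, readFully is defined on DataInput / DataInputStream) keeps reading until the buffer is full or throws EOFException. The class ReadFullyDemo, the ChunkedStream helper, and the 3-byte chunk size are hypothetical names and values made up for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyDemo {

    /**
     * Hypothetical stream that hands out at most 3 bytes per read call,
     * similar to what a network or compressed stream may do.
     */
    static class ChunkedStream extends ByteArrayInputStream {
        ChunkedStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            // Cap every bulk read at 3 bytes to simulate short reads.
            return super.read(b, off, Math.min(len, 3));
        }
    }

    /** One read(...) call: returns how many bytes were actually read. */
    static int readOnce(InputStream in, byte[] buffer) throws IOException {
        return in.read(buffer, 0, buffer.length);
    }

    /** readFully(...) loops internally until the buffer is completely filled. */
    static void readAll(InputStream in, byte[] buffer) throws IOException {
        new DataInputStream(in).readFully(buffer);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};

        byte[] partial = new byte[data.length];
        int n = readOnce(new ChunkedStream(data), partial);
        System.out.println("read() filled " + n + " of " + partial.length + " bytes");

        byte[] full = new byte[data.length];
        readAll(new ChunkedStream(data), full);
        System.out.println("readFully() buffer matches input: " + Arrays.equals(data, full));
    }
}
```

With the 8-byte input in main, the single read call fills only 3 bytes, whereas readFully fills all 8. Code that ignores the return value of read(...) would silently continue with a partly filled buffer, which is the kind of problem the issue title ("doesn't verify the read buffer length") points at.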