Thank you for testing the RC, Gyula! Regarding the other reported JIRAs:
These issues are resolved:
https://issues.apache.org/jira/browse/FLINK-5670 (merged)
https://issues.apache.org/jira/browse/FLINK-5667 (merged)
https://issues.apache.org/jira/browse/FLINK-5660 (merged)
https://issues.apache.org/jira/browse/FLINK-5665 (duplicate)
https://issues.apache.org/jira/browse/FLINK-5664 (wontfix)

Unresolved:
https://issues.apache.org/jira/browse/FLINK-5663 (pending PR merge)

I'll create RC3 once FLINK-5663 has been merged to the "release-1.2" branch, so that we can start testing and voting again on Monday morning.

On Fri, Jan 27, 2017 at 1:26 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> I think this issue that Ufuk opened is also a blocker:
> https://issues.apache.org/jira/browse/FLINK-5670
>
> As I commented in the issue, at least one bigger user of Flink has run into
> this problem on their cluster.
>
> On Fri, 27 Jan 2017 at 10:50 Ufuk Celebi <u...@apache.org> wrote:
>
> > Thanks Gyula!
> >
> > The current state of things is:
> > - Stefan is working on a fix for
> > https://issues.apache.org/jira/browse/FLINK-5663.
> > - Till is working on https://issues.apache.org/jira/browse/FLINK-5667.
> >
> > As far as I can tell, these will be fixed today and we are ready to go for
> > RC3.
> >
> > I resolved the other issues I created.
> >
> > – Ufuk
> >
> > On 26 January 2017 at 22:16:26, Gyula Fóra (gyf...@apache.org) wrote:
> > > Hi,
> > >
> > > Aside from the issues mentioned above I have some good news as well.
> > >
> > > I have finished porting and started testing one of our major production
> > > jobs (RBea) on 1.2 and everything seems to run well so far, with
> > > savepoints, rescaling, externalized checkpoints, metrics etc. on YARN.
> > >
> > > In this job I use windowing, RocksDB state, iterations, timers,
> > > broadcast states, repartitionable operator states etc. and everything
> > > seems to be working extremely well under normal circumstances.
> > >
> > > So far I mostly ran sunny day tests but I will continue testing with
> > > larger load and some failure scenarios. I will keep you posted.
> > >
> > > Great job!
> > > Gyula
> > >
> > > Robert Metzger wrote (on Thu, Jan 26, 2017, 21:28):
> > >
> > > Damn. I really hoped that this RC goes through.
> > >
> > > I propose to keep the RC2 open until we've fixed all issues mentioned here
> > > and to get some more testing feedback.
> > >
> > > On Thu, Jan 26, 2017 at 8:06 PM, Stephan Ewen wrote:
> > >
> > > > @Till - I think that FLINK-5667 is a blocker
> > > >
> > > > Good catch finding it!
> > > >
> > > > On Thu, Jan 26, 2017 at 7:51 PM, Till Rohrmann wrote:
> > > >
> > > > > I have found another problem: Under certain circumstances Flink can lose
> > > > > state data by completing an invalid checkpoint.
> > > > > https://issues.apache.org/jira/browse/FLINK-5667.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Jan 26, 2017 at 6:27 PM, Till Rohrmann wrote:
> > > > >
> > > > > > Robert also found an issue that pending checkpoint files are not properly
> > > > > > cleaned up: https://issues.apache.org/jira/browse/FLINK-5660. To my
> > > > > > surprise, the issue was already fixed in 1.1.4 so I guess I've forgotten
> > > > > > to forward port the fix. There is a pending PR to fix it. The fix could
> > > > > > also be part of a 1.2.1 release.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Thu, Jan 26, 2017 at 6:04 PM, Ufuk Celebi wrote:
> > > > > >
> > > > > >> I ran some tests and found the following issues:
> > > > > >>
> > > > > >> https://issues.apache.org/jira/browse/FLINK-5663: Checkpoint fails
> > > > > >> because of closed registry
> > > > > >> => This happened a couple of times for the first checkpoints after
> > > > > >> submitting a job. If it happened on every submission I would
> > > > > >> definitely make this a blocker, but I happen to run into it in like 3
> > > > > >> out of 10 job submissions. What do we make of this?
> > > > > >>
> > > > > >> https://issues.apache.org/jira/browse/FLINK-5665: When the failures
> > > > > >> happened, I also had some lingering 0-byte files.
> > > > > >>
> > > > > >> https://issues.apache.org/jira/browse/FLINK-5664: I also found the
> > > > > >> logging of the RocksDB backend a little noisy (for my local setup at
> > > > > >> least with many tasks per TM and low checkpointing interval.)
> > > > > >>
> > > > > >> All in all, I'm not sure if we want to make these a blocker or not.
> > > > > >> I'm fine both ways with a follow up 1.2.1 release.
> > > > > >>
> > > > > >> ===
> > > > > >>
> > > > > >> - Verified signatures and checksums
> > > > > >> - Checked out the Java quickstarts and ran the jobs
> > > > > >> - All poms point to 1.2.0
> > > > > >> - Migrated multiple jobs via savepoint from 1.1.4 to 1.2.0 with Kryo
> > > > > >> types, session windows (w/o lateness), operator and keyed state for
> > > > > >> all three backends
> > > > > >> - Rescaled the same jobs from 1.2.0 savepoints with all three backends
> > > > > >> - Verified the "migration namespace serializer" fix
> > > > > >> - Ran streaming state machine with Kafka source, RocksDB backend and
> > > > > >> master and worker failures (standalone cluster)
> > > > > >>
> > > > > >> On Wed, Jan 25, 2017 at 9:14 PM, Robert Metzger wrote:
> > > > > >> > Dear Flink community,
> > > > > >> >
> > > > > >> > Please vote on releasing the following candidate as Apache Flink
> > > > > >> > version 1.2.0.
> > > > > >> >
> > > > > >> > The commit to be voted on:
> > > > > >> > 8b5b6a8b (http://git-wip-us.apache.org/repos/asf/flink/commit/8b5b6a8b)
> > > > > >> >
> > > > > >> > Branch:
> > > > > >> > release-1.2.0-rc2
> > > > > >> > (https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.2.0-rc2)
> > > > > >> >
> > > > > >> > The release artifacts to be voted on can be found at:
> > > > > >> > http://people.apache.org/~rmetzger/flink-1.2.0-rc2/
> > > > > >> >
> > > > > >> > The release artifacts are signed with the key with fingerprint D9839159:
> > > > > >> > http://www.apache.org/dist/flink/KEYS
> > > > > >> >
> > > > > >> > The staging repository for this release can be found at:
> > > > > >> > https://repository.apache.org/content/repositories/orgapacheflink-1113
> > > > > >> >
> > > > > >> > -------------------------------------------------------------
> > > > > >> >
> > > > > >> > I would like to keep Friday as the target release time. Please let me
> > > > > >> > know if you need more time for testing and want me to move the
> > > > > >> > deadline to Monday.
> > > > > >> >
> > > > > >> > The vote ends on Friday, January 27, 2017, 6pm CET.
> > > > > >> >
> > > > > >> > Please test the release now rather than on Friday morning, so that we
> > > > > >> > can cancel it as early as possible if needed.
> > > > > >> > To make testing easier, I've created this document to track what has
> > > > > >> > already been tested and what needs to be tested:
> > > > > >> > https://docs.google.com/document/d/1MX-8l9RrLly3UmZMODHBnuZUrK_n-DGIBLjFKyCrTAs/edit?usp=sharing
> > > > > >> > Feel free to add more tests or change existing ones.
> > > > > >> >
> > > > > >> > [ ] +1 Release this package as Apache Flink 1.2.0
> > > > > >> > [ ] -1 Do not release this package, because ...