I have found another problem: under certain circumstances, Flink can lose state data by completing an invalid checkpoint (https://issues.apache.org/jira/browse/FLINK-5667).
Cheers,
Till

On Thu, Jan 26, 2017 at 6:27 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Robert also found an issue where pending checkpoint files are not properly
> cleaned up: https://issues.apache.org/jira/browse/FLINK-5660. To my
> surprise, the issue was already fixed in 1.1.4, so I guess I forgot to
> forward-port the fix. There is a pending PR to fix it. The fix could also
> be part of a 1.2.1 release.
>
> Cheers,
> Till
>
> On Thu, Jan 26, 2017 at 6:04 PM, Ufuk Celebi <u...@apache.org> wrote:
>
>> I ran some tests and found the following issues:
>>
>> https://issues.apache.org/jira/browse/FLINK-5663: Checkpoint fails
>> because of closed registry
>> => This happened a couple of times for the first checkpoints after
>> submitting a job. If it happened on every submission, I would
>> definitely make this a blocker, but I run into it in roughly 3
>> out of 10 job submissions. What do we make of this?
>>
>> https://issues.apache.org/jira/browse/FLINK-5665: When the failures
>> happened, I also had some lingering 0-byte files.
>>
>> https://issues.apache.org/jira/browse/FLINK-5664: I also found the
>> logging of the RocksDB backend a little noisy (at least for my local
>> setup, with many tasks per TM and a low checkpointing interval).
>>
>> All in all, I'm not sure whether we want to make these blockers or not.
>> I'm fine either way with a follow-up 1.2.1 release.
>>
>> ===
>>
>> - Verified signatures and checksums
>> - Checked out the Java quickstarts and ran the jobs
>> - All poms point to 1.2.0
>> - Migrated multiple jobs via savepoint from 1.1.4 to 1.2.0 with Kryo
>>   types, session windows (w/o lateness), operator and keyed state for
>>   all three backends
>> - Rescaled the same jobs from 1.2.0 savepoints with all three backends
>> - Verified the "migration namespace serializer" fix
>> - Ran the streaming state machine job with a Kafka source, the RocksDB
>>   backend, and master and worker failures (standalone cluster)
>>
>> On Wed, Jan 25, 2017 at 9:14 PM, Robert Metzger <rmetz...@apache.org>
>> wrote:
>> > Dear Flink community,
>> >
>> > Please vote on releasing the following candidate as Apache Flink version
>> > 1.2.0.
>> >
>> > The commit to be voted on:
>> > 8b5b6a8b (http://git-wip-us.apache.org/repos/asf/flink/commit/8b5b6a8b)
>> >
>> > Branch:
>> > release-1.2.0-rc2
>> > (https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.2.0-rc2)
>> >
>> > The release artifacts to be voted on can be found at:
>> > http://people.apache.org/~rmetzger/flink-1.2.0-rc2/
>> >
>> > The release artifacts are signed with the key with fingerprint D9839159:
>> > http://www.apache.org/dist/flink/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapacheflink-1113
>> >
>> > -------------------------------------------------------------
>> >
>> > I would like to keep Friday as the target release time. Please let me
>> > know if you need more time for testing and want me to move the
>> > deadline to Monday.
>> >
>> > The vote ends on Friday, January 27, 2017, 6pm CET.
>> >
>> > Please test the release now rather than on Friday morning, so that we
>> > can cancel it as early as possible if needed.
>> > To make testing easier, I've created this document to track what has
>> > already been tested and what still needs to be tested:
>> > https://docs.google.com/document/d/1MX-8l9RrLly3UmZMODHBnuZUrK_n-DGIBLjFKyCrTAs/edit?usp=sharing
>> > Feel free to add more tests or change existing ones.
>> >
>> > [ ] +1 Release this package as Apache Flink 1.2.0
>> > [ ] -1 Do not release this package, because ...