Given that there might still be users who want to upgrade and are therefore reading the release notes for the 1.12.0 release, I would also update the 1.12.0 release notes.

Moreover, I consider the risk of losing state because unaligned checkpoints might break recovery to be quite a serious problem. Imagine a user running a production job that, by some lucky coincidence, does not need to recover for some time, and then all of a sudden Flink fails fatally at the first job failure. I would even argue that such a problem warrants a documentation update in which we add a warning box to [1] stating the current limitations.

Of course, this only holds under the assumption that this is indeed a real problem and not a test instability.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#unaligned-checkpoints

Cheers,
Till

On Mon, Dec 28, 2020 at 12:29 PM Xintong Song <tonysong...@gmail.com> wrote:

> Adding it as a warning of known issues to 1.12.1 release notes makes sense to me. (If it doesn't get fixed in this release. I'm canceling 1.12.1-rc1 for another blocker.)
>
> I'm not entirely sure about adding a warning to 1.12.0 release notes. Is this what we usually do, adding warnings to release notes for bugs found after the release?
> Putting it another way, should we modify the release notes silently after they are posted/published?
>
> @Piotr,
> Do we understand in which cases the recovery of unaligned checkpoints can lead to a corrupted data stream? Or shall we advise users never to use unaligned checkpoints for this version? Maybe you or Roman is the better person to draft this warning?
>
> Thank you~
>
> Xintong Song
>
> On Mon, Dec 28, 2020 at 6:10 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
> > Alright, thanks for the clarification. Should we issue a warning to not use unaligned checkpoints for the time being because it can lead to corrupted data streams on recovery? I can envision that some of our users might be surprised about it. Maybe adding it to the 1.12.0 and 1.12.1 release notes?
> >
> > Cheers,
> > Till
> >
> > On Mon, Dec 28, 2020 at 10:50 AM Piotr Nowojski <pnowoj...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > Yes, as Xintong wrote above, I wrote to him offline:
> > >
> > > > I’m going to remove release blocker status from FLINK-20654. After all we already have released it, at least in 1.12.0, and maybe even sooner. There is no point in blocking a release (which has probably some important bug fixes) in that case. It’s not a new bug.
> > >
> > > By "it's not a new bug", I meant that it has already been released in 1.12.0.
> > > Also, after ignoring the test for the time being, this bug should not be causing build failures anymore.
> > >
> > > Piotrek
> > >
> > > On Mon, Dec 28, 2020 at 10:29, Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > > Hi Till,
> > > >
> > > > @Piotr and @Roman mentioned offline that FLINK-20648 is not a new bug and they don't think we should block a release on it.
> > > >
> > > > I guess we should have made the conversation publicly visible. Sorry for the confusion.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > On Mon, Dec 28, 2020 at 4:59 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > >
> > > > > Hi Xintong,
> > > > >
> > > > > quick question, what about FLINK-20654? Previously it was listed as a release blocker but has not been fixed yet.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Dec 25, 2020 at 10:52 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >
> > > > > > Hi devs,
> > > > > >
> > > > > > I'm very glad to announce that all known blocker issues for release-1.12.1 have been resolved.
> > > > > >
> > > > > > I'm creating our first release candidate now and will start a separate voting thread as soon as RC1 is created.
> > > > > >
> > > > > > Thanks everyone, and Merry Christmas.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > On Wed, Dec 23, 2020 at 6:07 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > Updates on the progress of release.
> > > > > > >
> > > > > > > In the past week, more than 20 issues were resolved for release 1.12.1. Thanks for the efforts.
> > > > > > >
> > > > > > > We still have 3 unresolved release blockers at the moment.
> > > > > > >
> > > > > > >    - [FLINK-20648] Unable to restore from savepoints with Kubernetes HA.
> > > > > > >    Consensus has been reached on the solution. @Yang Wang is working on a PR.
> > > > > > >    - [FLINK-20654] Unaligned checkpoint recovery may lead to corrupted data stream.
> > > > > > >    @Roman Khachatryan is still investigating the problem.
> > > > > > >    - [FLINK-20664] Support setting service account for TaskManager pod.
> > > > > > >    Boris Lublinsky has opened a PR, which is already reviewed and close to mergeable.
> > > > > > >
> > > > > > > Since we are targeting a swift release, I do not intend to further delay the release for other non-blocker issues, unless there's a good reason.
> > > > > > > If there's anything that you believe is absolutely necessary for release 1.12.1, please reach out to me.
> > > > > > > Otherwise, the voting process will be started as soon as the above blockers are addressed.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > > On Mon, Dec 21, 2020 at 10:05 AM Xingbo Huang <hxbks...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Hi Xintong,
> > > > > > >>
> > > > > > >> Thanks a lot for driving this.
> > > > > > >>
> > > > > > >> I'd like to bring one more issue to your attention:
> > > > > > >> https://issues.apache.org/jira/browse/FLINK-20389.
> > > > > > >> This issue occurs quite frequently. Arvid and Kezhu have done some investigations of this issue and it may indicate a bug in the new Source API. It would be great to figure out the root cause of this issue.
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Xingbo
> > > > > > >>
> > > > > > >> Xintong Song <tonysong...@gmail.com> wrote on Fri, Dec 18, 2020 at 7:49 PM:
> > > > > > >>
> > > > > > >> > Thanks for the replies so far.
> > > > > > >> >
> > > > > > >> > I've been reaching out to the owners of the reported issues. It seems most of the blockers are likely to be resolved in the next few days.
> > > > > > >> >
> > > > > > >> > Since some of the issues are quite critical, I'd like to aim for a *feature freeze on Dec. 23rd*, and start the release voting process by the end of this week.
> > > > > > >> >
> > > > > > >> > If there's anything you might need more time for, please reach out to me.
> > > > > > >> >
> > > > > > >> > Thank you~
> > > > > > >> >
> > > > > > >> > Xintong Song
> > > > > > >> >
> > > > > > >> > On Fri, Dec 18, 2020 at 3:19 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> > > > > > >> >
> > > > > > >> > > Thanks Xintong for driving this.
> > > > > > >> > >
> > > > > > >> > > I'd like to make two more issues related to the Kinesis connector changes in 1.12.0 a blocker for 1.12.1:
> > > > > > >> > > https://issues.apache.org/jira/browse/FLINK-20630
> > > > > > >> > > https://issues.apache.org/jira/browse/FLINK-20629
> > > > > > >> > >
> > > > > > >> > > There are already PRs for these issues from @Cranmer, Danny <cranm...@amazon.com>, will try to merge these very soon.
> > > > > > >> > >
> > > > > > >> > > Cheers,
> > > > > > >> > > Gordon
> > > > > > >> > >
> > > > > > >> > > On Fri, Dec 18, 2020 at 1:19 PM Guowei Ma <guowei....@gmail.com> wrote:
> > > > > > >> > >
> > > > > > >> > >> Thanks for driving this release Xintong.
> > > > > > >> > >> I think https://issues.apache.org/jira/browse/FLINK-20652 should be addressed.
> > > > > > >> > >>
> > > > > > >> > >> Best,
> > > > > > >> > >> Guowei
> > > > > > >> > >>
> > > > > > >> > >> On Fri, Dec 18, 2020 at 11:53 AM Jingsong Li <jingsongl...@gmail.com> wrote:
> > > > > > >> > >>
> > > > > > >> > >> > Thanks for volunteering as our release manager Xintong. +1 for releasing Flink 1.12.1 soon.
> > > > > > >> > >> >
> > > > > > >> > >> > I think https://issues.apache.org/jira/browse/FLINK-20665 should be addressed, I marked it as a Blocker.
> > > > > > >> > >> >
> > > > > > >> > >> > Best,
> > > > > > >> > >> > Jingsong
> > > > > > >> > >> >
> > > > > > >> > >> > On Fri, Dec 18, 2020 at 11:16 AM Yang Wang <danrtsey...@gmail.com> wrote:
> > > > > > >> > >> >
> > > > > > >> > >> > > Hi David,
> > > > > > >> > >> > >
> > > > > > >> > >> > > I will take a look at this ticket FLINK-20648 and try to get it resolved in this release cycle.
> > > > > > >> > >> > >
> > > > > > >> > >> > > @Xintong Song <tonysong...@gmail.com>
> > > > > > >> > >> > > One more Kubernetes HA related issue. We need to support setting service account for TaskManager pod[1]. Even though we have a workaround for this issue, it is not acceptable to always grant the default service account enough permissions.
> > > > > > >> > >> > >
> > > > > > >> > >> > > [1].
> > > > > > >> > >> > > https://issues.apache.org/jira/browse/FLINK-20664
> > > > > > >> > >> > >
> > > > > > >> > >> > > Best,
> > > > > > >> > >> > > Yang
> > > > > > >> > >> > >
> > > > > > >> > >> > > David Morávek <david.mora...@gmail.com> wrote on Fri, Dec 18, 2020 at 12:47 AM:
> > > > > > >> > >> > >
> > > > > > >> > >> > > > Hi, I think https://issues.apache.org/jira/browse/FLINK-20648 should be addressed, as Kubernetes HA was one of the main selling points of this release. WDYT?
> > > > > > >> > >> > > >
> > > > > > >> > >> > > > D.
> > > > > > >> > >> > > >
> > > > > > >> > >> > > > Sent from my iPhone
> > > > > > >> > >> > > >
> > > > > > >> > >> > > > > On 17. 12. 2020, at 13:54, Yun Tang <myas...@live.com> wrote:
> > > > > > >> > >> > > > >
> > > > > > >> > >> > > > > Thanks for driving this quick-fix release.
> > > > > > >> > >> > > > > +1 for fixing the bug of RocksDB state-backend with reduce operators.
> > > > > > >> > >> > > > >
> > > > > > >> > >> > > > > Best
> > > > > > >> > >> > > > > Yun Tang
> > > > > > >> > >> > > > > ________________________________
> > > > > > >> > >> > > > > From: Till Rohrmann <trohrm...@apache.org>
> > > > > > >> > >> > > > > Sent: Thursday, December 17, 2020 20:51
> > > > > > >> > >> > > > > To: dev <dev@flink.apache.org>
> > > > > > >> > >> > > > > Subject: Re: [DISCUSS] Releasing Apache Flink 1.12.1
> > > > > > >> > >> > > > >
> > > > > > >> > >> > > > > Thanks for volunteering as our release manager Xintong. +1 for a swift bug fix release.
> > > > > > >> > >> > > > >
> > > > > > >> > >> > > > > Cheers,
> > > > > > >> > >> > > > > Till
> > > > > > >> > >> > > > >
> > > > > > >> > >> > > > >> On Thu, Dec 17, 2020 at 1:20 PM Xintong Song <xts...@apache.org> wrote:
> > > > > > >> > >> > > > >>
> > > > > > >> > >> > > > >> Hi devs,
> > > > > > >> > >> > > > >>
> > > > > > >> > >> > > > >> It's been one week since we announced Apache Flink 1.12.0, and there are already many issues reported, some of which are quite critical. Thus, I would like to start a discussion on releasing Flink 1.12.1 soon.
> > > > > > >> > >> > > > >>
> > > > > > >> > >> > > > >> I would like to volunteer for managing this release.
> > > > > > >> > >> > > > >>
> > > > > > >> > >> > > > >> I've noticed the following issues that need to be included in the new bugfix release.
> > > > > > >> > >> > > > >>
> > > > > > >> > >> > > > >>    - The entrypoint script for the official docker image does not meet the standards of docker-library/official-images repo. [1]
> > > > > > >> > >> > > > >>    - Streaming jobs with window-less reduce operation do not work with RocksDB state backend. [2]
> > > > > > >> > >> > > > >>    - @Stephan mentioned some Kafka fixes ([3] and maybe more) that he would try to make into this release.
> > > > > > >> > >> > > > >>    - @Kurt mentioned a batch workload instability related to managed memory being released slowly, which his team is currently investigating and would try to fix in this release.
> > > > > > >> > >> > > > >>
> > > > > > >> > >> > > > >> Apart from the issues above, please let us know in this thread if there are any other fixes that we should try to include. I'll try to communicate with the issue owners and come up with a time estimation early next week.
> > > > > > >> > >> > > > >>
> > > > > > >> > >> > > > >> Thanks,
> > > > > > >> > >> > > > >> Xintong
> > > > > > >> > >> > > > >>
> > > > > > >> > >> > > > >> [1] https://issues.apache.org/jira/browse/FLINK-20650
> > > > > > >> > >> > > > >> [2] https://issues.apache.org/jira/browse/FLINK-20646
> > > > > > >> > >> > > > >> [3] https://issues.apache.org/jira/browse/FLINK-20379
> > > > > > >> > >> >
> > > > > > >> > >> > --
> > > > > > >> > >> > Best, Jingsong Lee