@Yu discovered this issue, which IMO is probably a blocker for the release: https://issues.apache.org/jira/browse/FLINK-12064. The bug is a regression caused by previous state backend refactorings, which can result in incorrect representation of the schema of serialized keys in state because the wrong key serializer instance is being snapshotted.
There is already a PR to fix this, I'll try to review and merge it over the weekend.

On Fri, Mar 29, 2019 at 7:13 AM Richard Deurwaarder <rich...@xeli.eu> wrote:

> -1 (non-binding)
>
> - Ran integration tests locally (1000+) of our flink job, all succeeded.
> - Attempted to run job on hadoop, failed. It failed because we have a
>   firewall in place and we cannot set the rest port to a specific port/port
>   range.
> Unless I am mistaken, it seems like FLINK-11081 broke the possibility of
> setting a REST port when running on yarn (
> https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102
> )
> Code-wise it seems rather straightforward to fix but I am unsure about the
> reason why this is hard-coded to 0 and what the impact would be.
>
> It would benefit us greatly if a fix for this could make it to 1.8.0.
>
> Regards,
>
> Richard
>
> On Thu, Mar 28, 2019 at 9:54 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
> > +1 (binding)
> >
> > Functional checks:
> >
> > - Built Flink from source (`mvn clean verify`) locally, with success
> > - Ran end-to-end tests locally for 5 times in a loop, no attempts failed
> >   (Hadoop 2.8.4, Scala 2.12)
> > - Manually tested state schema evolution for POJO. Besides the tests that
> >   @Congxian already did, additionally tested evolution cases with POJO
> >   subclasses + non-registered POJOs.
> > - Manually tested migration of Scala stateful jobs that use case classes /
> >   Scala collections as state types, performing the migration across Scala
> >   2.11 to Scala 2.12.
> > - Reviewed release announcement PR
> >
> > Misc / legal checks:
> >
> > - checked checksums and signatures
> > - No binaries in source distribution
> > - Staging area does not seem to have any missing artifacts
> >
> > Cheers,
> > Gordon
> >
> > On Thu, Mar 28, 2019 at 4:52 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> > wrote:
> >
> > > @Shaoxuan
> > >
> > > The drop in the serializerAvro benchmark, as explained in the voting
> > > threads of earlier RCs, was due to a slower job initialization phase
> > > caused by slower deserialization of the AvroSerializer.
> > > Piotr also pointed out that after the number of records was increased in
> > > the serializer benchmarks, this drop was no longer observable before /
> > > after the changes in mid February.
> > > IMO, this is not critical as it does not affect the per-record
> > > performance / throughput, and therefore should not block this release.
> > >
> > > On Thu, Mar 28, 2019 at 1:08 AM Aljoscha Krettek <aljos...@fastmail.com>
> > > wrote:
> > >
> > >> By now, I'm reasonably sure that the test instabilities on the end-to-end
> > >> test are only instabilities. I pushed changes to increase timeouts to make
> > >> the tests more stable. As in any project, there will always be bugs but I
> > >> think we could release this RC4 and be reasonably sure that it works well.
> > >>
> > >> Now, we only need to have the required number of PMC votes.
> > >>
> > >> On Wed, Mar 27, 2019, at 07:22, Congxian Qiu wrote:
> > >> > +1 (non-binding)
> > >> >
> > >> > • checked signature and checksum ok
> > >> > • mvn clean package -DskipTests ok
> > >> > • Run job on yarn ok
> > >> > • Test state migration with POJO type (both heap and rocksdb) ok
> > >> >     - 1.6 -> 1.8
> > >> >     - 1.7 -> 1.8
> > >> >     - 1.8 -> 1.8
> > >> >
> > >> >
> > >> > Best, Congxian
> > >> > On Mar 27, 2019, 10:26 +0800, vino yang <yanghua1...@gmail.com>, wrote:
> > >> > > +1 (non-binding)
> > >> > >
> > >> > > - checked JIRA release note
> > >> > > - ran "mvn package -DskipTests"
> > >> > > - checked signature and checksum
> > >> > > - started a cluster locally and ran some examples in binary
> > >> > > - checked web site announcement's PR
> > >> > >
> > >> > > Best,
> > >> > > Vino
> > >> > >
> > >> > >
> > >> > > Xiaowei Jiang <xiaow...@gmail.com> wrote on Tue, Mar 26, 2019 at 8:20 PM:
> > >> > >
> > >> > > > +1 (non-binding)
> > >> > > >
> > >> > > > - checked checksums and GPG files
> > >> > > > - build from source successfully
> > >> > > > - run end-to-end precommit tests successfully
> > >> > > > - run end-to-end nightly tests successfully
> > >> > > > Xiaowei
> > >> > > > On Tuesday, March 26, 2019, 8:09:19 PM GMT+8, Yu Li <car...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > +1 (non-binding)
> > >> > > >
> > >> > > > - Checked release notes: OK
> > >> > > > - Checked sums and signatures: OK
> > >> > > > - Source release
> > >> > > >   - contains no binaries: OK
> > >> > > >   - contains no 1.8-SNAPSHOT references: OK
> > >> > > >   - build from source: OK (8u101)
> > >> > > >   - mvn clean verify: OK (8u101)
> > >> > > > - Binary release
> > >> > > >   - no examples appear to be missing
> > >> > > >   - started a cluster; WebUI reachable, example ran successfully
> > >> > > >   - end-to-end test (all but K8S and docker ones): OK (8u101)
> > >> > > > - Repository appears to contain all expected artifacts
> > >> > > >
> > >> > > > Best Regards,
> > >> > > > Yu
> > >> > > >
> > >> > > >
> > >> > > > On Tue, 26 Mar 2019 at 14:28, Kurt Young <ykt...@gmail.com> wrote:
> > >> > > >
> > >> > > > > +1 (non-binding)
> > >> > > > >
> > >> > > > > Checked items:
> > >> > > > > - checked checksums and GPG files
> > >> > > > > - verified that the source archives do not contain any binaries
> > >> > > > > - checked that all POM files point to the same version
> > >> > > > > - build from source successfully
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Kurt
> > >> > > > >
> > >> > > > >
> > >> > > > > On Tue, Mar 26, 2019 at 10:57 AM Shaoxuan Wang <wshaox...@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > +1 (non-binding)
> > >> > > > > >
> > >> > > > > > I tested RC4 with the following items:
> > >> > > > > > - Maven Central Repository contains all artifacts
> > >> > > > > > - Built the source with Maven (ensured all source files have Apache
> > >> > > > > >   headers), and executed built-in tests via "mvn clean verify"
> > >> > > > > > - Manually executed the tests in IntelliJ IDE
> > >> > > > > > - Verified that the quickstarts for Scala and Java are working with
> > >> > > > > >   the staging repository in IntelliJ
> > >> > > > > > - Checked the benchmark results. The perf regressions of
> > >> > > > > >   tuple-key-by/statebackend/tumblingWindow are gone, but the
> > >> > > > > >   regression on serializer still exists.
> > >> > > > > >
> > >> > > > > > Regards,
> > >> > > > > > Shaoxuan
> > >> > > > > >
> > >> > > > > > On Tue, Mar 26, 2019 at 8:06 AM jincheng sun <sunjincheng...@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hi Aljoscha, I think you are right, increasing the timeout config
> > >> > > > > > > will fix this issue. This depends on the resources of Travis.
> > >> > > > > > > I would like to share some observations from my testing (not a Flink
> > >> > > > > > > problem), as follows: :-)
> > >> > > > > > >
> > >> > > > > > > During my testing, `mvn clean verify` and `nightly end-to-end test`
> > >> > > > > > > both consume a lot of machine resources (especially memory/network),
> > >> > > > > > > and the network bandwidth requirements of `nightly end-to-end test`
> > >> > > > > > > are also very high. In China, I need to use VPN acceleration
> > >> > > > > > > (100~200Kb before acceleration, 3~4Mb after acceleration). I have
> > >> > > > > > > encountered: ['Avro Confluent Schema Registry nightly end-to-end test'
> > >> > > > > > > failed after 18 minutes and 15 seconds! Test exited with exit code 1].
> > >> > > > > > > It took more than 18 minutes and the download failed because the
> > >> > > > > > > network bandwidth was not enough; it runs smoothly when using VPN
> > >> > > > > > > acceleration. The overall end-to-end run passed twice. The Docker
> > >> > > > > > > resource configuration: (CPUs: 7, Mem: 28.7G, Swap: 3.5G). See the
> > >> > > > > > > detailed log here
> > >> > > > > > > <https://docs.google.com/document/d/1CcyTCyZmMmP57pkKv4drjSuxW61_u78HR3q1fJJODMw/edit?usp=sharing>.
> > >> > > > > > >
> > >> > > > > > > Just now, I checked Travis for your last commit (Increase startup
> > >> > > > > > > timeout in end-to-end tests); apart from the Cleanup phase, the other
> > >> > > > > > > phases are successful.
> > >> > > > > > > here <https://travis-ci.org/apache/flink/builds/511071777>
> > >> > > > > > >
> > >> > > > > > > In order to verify that our speculation is accurate, I can help by
> > >> > > > > > > running with 10 and 20 second timeout configs on my repo to see
> > >> > > > > > > whether the timeout problem recurs 100% of the time. It is already
> > >> > > > > > > running; we are waiting for the result.
> > >> > > > > > > 10 seconds <https://travis-ci.org/sunjincheng121/flink/builds/511235749>
> > >> > > > > > > 20 seconds <https://travis-ci.org/sunjincheng121/flink/builds/511235598>
> > >> > > > > > >
> > >> > > > > > > Best,
> > >> > > > > > > Jincheng
> > >> > > > > > >
> > >> > > > > > > Aljoscha Krettek <aljos...@apache.org> wrote on Tue, Mar 26, 2019 at 1:04 AM:
> > >> > > > > > >
> > >> > > > > > > > Thanks for the testing done so far!
> > >> > > > > > > >
> > >> > > > > > > > There has been quite some flakiness on Travis lately, see here:
> > >> > > > > > > > https://travis-ci.org/apache/flink/branches. I'm a bit hesitant to
> > >> > > > > > > > release in this state. Looking at the tests you can see that all of
> > >> > > > > > > > the end-to-end tests fail because waiting for the dispatcher to come
> > >> > > > > > > > up times out. I also noticed that this usually takes about 5-8
> > >> > > > > > > > seconds on Travis, so a 10 second timeout might be a bit low. I
> > >> > > > > > > > pushed commits to increase that to 20 secs. Let's see what will
> > >> > > > > > > > happen.
> > >> > > > > > > >
> > >> > > > > > > > I'll keep you posted!
> > >> > > > > > > > Aljoscha
> > >> > > > > > > >
> > >> > > > > > > > > On 25.
> > >> > > > > > > > > Mar 2019, at 13:13, jincheng sun <sunjincheng...@gmail.com> wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > Great thanks for preparing the RC4 of Flink 1.8.0, Aljoscha!
> > >> > > > > > > > >
> > >> > > > > > > > > +1 (non-binding)
> > >> > > > > > > > >
> > >> > > > > > > > > I checked the functional things as follows (without performance
> > >> > > > > > > > > verification):
> > >> > > > > > > > >
> > >> > > > > > > > > 1. Checking Artifacts:
> > >> > > > > > > > >
> > >> > > > > > > > >    1). Download the release source code - SUCCESS
> > >> > > > > > > > >    2). Check Source release flink-1.8.0-src.tgz.sha512 - SUCCESS
> > >> > > > > > > > >    3). Download the released JAR - SUCCESS
> > >> > > > > > > > >    4). Check if checksums and GPG files match the corresponding
> > >> > > > > > > > >        release files - SUCCESS
> > >> > > > > > > > >    5). Verify that the source archives do not contain any binaries -
> > >> > > > > > > > >        SUCCESS
> > >> > > > > > > > >    6). Build the source with `mvn clean verify -DskipTests` to ensure
> > >> > > > > > > > >        all source files have Apache headers - SUCCESS
> > >> > > > > > > > >    7). Check that all POM files point to the same version - SUCCESS
> > >> > > > > > > > >    8). Read the `README.md` file to ensure there is nothing
> > >> > > > > > > > >        unexpected - SUCCESS
> > >> > > > > > > > >
> > >> > > > > > > > > 2. Testing Larger Setups
> > >> > > > > > > > >
> > >> > > > > > > > >    Cluster Environment: 7 nodes, jm 1024m, tm 4096m
> > >> > > > > > > > >    Testing Jobs: WordCount (Batch & Streaming),
> > >> > > > > > > > >    DataStreamAllroundTestProgram
> > >> > > > > > > > >
> > >> > > > > > > > >    1). Use local & hdfs file systems for checkpoints - SUCCESS
> > >> > > > > > > > >    2). Use hdfs file systems for input/output - SUCCESS
> > >> > > > > > > > >    3). Run examples on YARN (with or without session) - SUCCESS
> > >> > > > > > > > >    4). Test failover and recovery - SUCCESS
> > >> > > > > > > > >    5). Test incremental & non-incremental checkpoint - SUCCESS
> > >> > > > > > > > >    6). Test connector - kafka - SUCCESS
> > >> > > > > > > > >
> > >> > > > > > > > > 3. Testing Functionality
> > >> > > > > > > > >
> > >> > > > > > > > >    1). Built-in tests (linux & mac os)
> > >> > > > > > > > >        - `mvn clean verify` (some test timeout errors and a test case
> > >> > > > > > > > >          bug, see FLINK-12001
> > >> > > > > > > > >          <https://issues.apache.org/jira/browse/FLINK-12001>; none of
> > >> > > > > > > > >          them is a blocker)
> > >> > > > > > > > >        - build for scala 2.11 (mvn clean install -P scala-2.11
> > >> > > > > > > > >          -DskipTests) - SUCCESS
> > >> > > > > > > > >        - Run the scripted nightly end-to-end test - SUCCESS
> > >> > > > > > > > >
> > >> > > > > > > > >    2). Quickstarts
> > >> > > > > > > > >        - Verify that the quickstarts for Scala work with the staging
> > >> > > > > > > > >          repository in IntelliJ - SUCCESS
> > >> > > > > > > > >        - Verify that the quickstarts for Java work with the staging
> > >> > > > > > > > >          repository in IntelliJ - SUCCESS
> > >> > > > > > > > >
> > >> > > > > > > > >    3). Simple Starter Experience and Use Cases
> > >> > > > > > > > >
> > >> > > > > > > > >        - run all examples from IntelliJ IDE - SUCCESS
> > >> > > > > > > > >        - Start a local cluster and verify the processes - SUCCESS
> > >> > > > > > > > >          a. Examine the *.out files (should be empty) and the log
> > >> > > > > > > > >             files (should contain no exceptions)
> > >> > > > > > > > >          b. Test for Linux, MacOS
> > >> > > > > > > > >          c. Shutdown and verify there are no exceptions in the log
> > >> > > > > > > > >             output (after shutdown)
> > >> > > > > > > > >
> > >> > > > > > > > >        - Verify that the examples are running from both ./bin/flink
> > >> > > > > > > > >          and from the web-based job submission tool (following
> > >> > > > > > > > >          items) - SUCCESS
> > >> > > > > > > > >          a. Start multiple task managers in the local cluster
> > >> > > > > > > > >          b. Change the flink-conf.yaml to define more than one task
> > >> > > > > > > > >             slot (2)
> > >> > > > > > > > >          c. Run the examples with a parallelism > 1
> > >> > > > > > > > >          d. Examine the log output - no error messages should be
> > >> > > > > > > > >             encountered
> > >> > > > > > > > >
> > >> > > > > > > > > 4. Review the PR
> > >> > > > > > > > >    - [Add 1.8 Release Blog Post] - Just a reminder: update the
> > >> > > > > > > > >      release date to the correct date before merging.
> > >> > > > > > > > >
> > >> > > > > > > > > Cheers,
> > >> > > > > > > > > Jincheng
> > >> > > > > > > > >
> > >> > > > > > > > > Piotr Nowojski <pi...@ververica.com> wrote on Mon, Mar 25, 2019 at 4:11 PM:
> > >> > > > > > > > >
> > >> > > > > > > > > > +1 from my side. Previously spotted performance regression seems
> > >> > > > > > > > > > to be gone, or mostly gone.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Piotrek
> > >> > > > > > > > > >
> > >> > > > > > > > > > > On 21 Mar 2019, at 17:52, Aljoscha Krettek <aljos...@apache.org> wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Hi everyone,
> > >> > > > > > > > > > > Please review and vote on the release candidate 4 for Flink
> > >> > > > > > > > > > > 1.8.0, as follows:
> > >> > > > > > > > > > > [ ] +1, Approve the release
> > >> > > > > > > > > > > [ ] -1, Do not approve the release (please provide specific
> > >> > > > > > > > > > > comments)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > The complete staging area is available for your review, which
> > >> > > > > > > > > > > includes:
> > >> > > > > > > > > > > * JIRA release notes [1],
> > >> > > > > > > > > > > * the official Apache source release and binary convenience
> > >> > > > > > > > > > >   releases to be deployed to dist.apache.org [2], which are
> > >> > > > > > > > > > >   signed with the key with fingerprint
> > >> > > > > > > > > > >   F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
> > >> > > > > > > > > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > >> > > > > > > > > > > * source code tag "release-1.8.0-rc4" [5],
> > >> > > > > > > > > > > * website pull request listing the new release [6]
> > >> > > > > > > > > > > * website pull request adding announcement blog post [7].
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > The vote will be open for at least 72 hours. It is adopted by
> > >> > > > > > > > > > > majority approval, with at least 3 PMC affirmative votes.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Thanks,
> > >> > > > > > > > > > > Aljoscha
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> > >> > > > > > > > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc4/
> > >> > > > > > > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >> > > > > > > > > > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1215
> > >> > > > > > > > > > > [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c650befc10c8bb6cc4b007ae250b7b2173046145
> > >> > > > > > > > > > > [6] https://github.com/apache/flink-web/pull/180
> > >> > > > > > > > > > > [7] https://github.com/apache/flink-web/pull/179
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > P.S. The difference to the previous RCs is small; you can fetch
> > >> > > > > > > > > > > the tags and do a "git log release-1.8.0-rc1..release-1.8.0-rc4"
> > >> > > > > > > > > > > to see the difference in commits. It's fixes for the issues that
> > >> > > > > > > > > > > led to the cancellation of the previous RCs plus smaller fixes.
> > >> > > > > > > > > > > Most verification/testing that was carried out should apply as
> > >> > > > > > > > > > > is to this RC.
> > >> > > > > > > > > > > Any functional verification that you did on previous RCs should
> > >> > > > > > > > > > > therefore easily carry over to this one.
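
For readers who want to repeat the "checked checksums" step that several voters report above, here is a minimal sketch of a SHA-512 comparison. It is only an illustration under stated assumptions: the helper names `sha512_of` and `checksum_matches` are hypothetical, and the `.sha512` file is assumed to carry the hex digest as its first whitespace-separated token (the layout `sha512sum` writes). It does not cover the GPG signature check.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large release archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare an artifact against the digest stored in a checksum file.

    Assumption (not confirmed by the thread): the first whitespace-separated
    token in `checksum_file` is the expected hex digest.
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

With a downloaded source release this could be called as `checksum_matches("flink-1.8.0-src.tgz", "flink-1.8.0-src.tgz.sha512")`, assuming both files sit in the current directory.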