Thanks, everyone, for discovering these! @Richard, could you open a Jira issue for this and mark it as blocking?
> On 29. Mar 2019, at 09:22, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>
> @Yu discovered this issue, which IMO is probably a blocker for the
> release: https://issues.apache.org/jira/browse/FLINK-12064.
> The bug is a regression caused by previous state backend refactorings,
> which can result in incorrect representation of the schema of serialized
> keys in state because the wrong key serializer instance is being
> snapshotted.
>
> There is already a PR to fix this, I'll try to review and merge it over the
> weekend.
>
> On Fri, Mar 29, 2019 at 7:13 AM Richard Deurwaarder <rich...@xeli.eu> wrote:
>
>> -1 (non-binding)
>>
>> - Ran integration tests locally (1000+) of our flink job, all succeeded.
>> - Attempted to run job on hadoop, failed. It failed because we have a
>>   firewall in place and we cannot set the rest port to a specific port/port
>>   range.
>> Unless I am mistaken, it seems like FLINK-11081 broke the possibility of
>> setting a REST port when running on yarn (
>> https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102
>> )
>> Code-wise it seems rather straightforward to fix but I am unsure about the
>> reason why this is hard-coded to 0 and what the impact would be.
>>
>> It would benefit us greatly if a fix for this could make it to 1.8.0.
>>
>> Regards,
>>
>> Richard
>>
>> On Thu, Mar 28, 2019 at 9:54 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Functional checks:
>>>
>>> - Built Flink from source (`mvn clean verify`) locally, with success
>>> - Ran end-to-end tests locally for 5 times in a loop, no attempts failed
>>>   (Hadoop 2.8.4, Scala 2.12)
>>> - Manually tested state schema evolution for POJO. Besides the tests that
>>>   @Congxian already did, additionally tested evolution cases with POJO
>>>   subclasses + non-registered POJOs.
>>> - Manually tested migration of Scala stateful jobs that use case classes /
>>>   Scala collections as state types, performing the migration across Scala
>>>   2.11 to Scala 2.12.
>>> - Reviewed release announcement PR
>>>
>>> Misc / legal checks:
>>>
>>> - checked checksums and signatures
>>> - No binaries in source distribution
>>> - Staging area does not seem to have any missing artifacts
>>>
>>> Cheers,
>>> Gordon
>>>
>>> On Thu, Mar 28, 2019 at 4:52 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
>>> wrote:
>>>
>>>> @Shaoxuan
>>>>
>>>> The drop in the serializerAvro benchmark, as explained earlier in previous
>>>> voting threads of earlier RCs, was due to a slower job initialization phase
>>>> caused by slower deserialization of the AvroSerializer.
>>>> Piotr also pointed out that after the number of records was increased in
>>>> the serializer benchmarks, this drop was no longer observable before /
>>>> after the changes in mid February.
>>>> IMO, this is not critical as it does not affect the per-record performance
>>>> / throughput, and therefore should not block this release.
>>>>
>>>> On Thu, Mar 28, 2019 at 1:08 AM Aljoscha Krettek <aljos...@fastmail.com>
>>>> wrote:
>>>>
>>>>> By now, I'm reasonably sure that the test instabilities on the end-to-end
>>>>> test are only instabilities. I pushed changes to increase timeouts to make
>>>>> the tests more stable. As in any project, there will always be bugs but I
>>>>> think we could release this RC4 and be reasonably sure that it works well.
>>>>>
>>>>> Now, we only need to have the required number of PMC votes.
>>>>>
>>>>> On Wed, Mar 27, 2019, at 07:22, Congxian Qiu wrote:
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> • checked signature and checksum ok
>>>>>> • mvn clean package -DskipTests ok
>>>>>> • Run job on yarn ok
>>>>>> • Test state migration with POJO type (both heap and rocksdb) ok
>>>>>>   - 1.6 -> 1.8
>>>>>>   - 1.7 -> 1.8
>>>>>>   - 1.8 -> 1.8
>>>>>>
>>>>>> Best, Congxian
>>>>>> On Mar 27, 2019, 10:26 +0800, vino yang <yanghua1...@gmail.com>, wrote:
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> - checked JIRA release note
>>>>>>> - ran "mvn package -DskipTests"
>>>>>>> - checked signature and checksum
>>>>>>> - started a cluster locally and ran some examples in binary
>>>>>>> - checked web site announcement's PR
>>>>>>>
>>>>>>> Best,
>>>>>>> Vino
>>>>>>>
>>>>>>> On Tue, Mar 26, 2019 at 8:20 PM, Xiaowei Jiang <xiaow...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> - checked checksums and GPG files
>>>>>>>> - build from source successfully
>>>>>>>> - run end-to-end precommit tests successfully
>>>>>>>> - run end-to-end nightly tests successfully
>>>>>>>>
>>>>>>>> Xiaowei
>>>>>>>>
>>>>>>>> On Tuesday, March 26, 2019, 8:09:19 PM GMT+8, Yu Li <car...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> - Checked release notes: OK
>>>>>>>> - Checked sums and signatures: OK
>>>>>>>> - Source release
>>>>>>>>   - contains no binaries: OK
>>>>>>>>   - contains no 1.8-SNAPSHOT references: OK
>>>>>>>>   - build from source: OK (8u101)
>>>>>>>>   - mvn clean verify: OK (8u101)
>>>>>>>> - Binary release
>>>>>>>>   - no examples appear to be missing
>>>>>>>>   - started a cluster; WebUI reachable, example ran successfully
>>>>>>>>   - end-to-end test (all but K8S and docker ones): OK (8u101)
>>>>>>>> - Repository appears to contain all expected artifacts
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Yu
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 26 Mar 2019 at 14:28, Kurt Young <ykt...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Checked items:
>>>>>>>>> - checked checksums and GPG files
>>>>>>>>> - verified that the source archives do not contain any binaries
>>>>>>>>> - checked that all POM files point to the same version
>>>>>>>>> - build from source successfully
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kurt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 26, 2019 at 10:57 AM Shaoxuan Wang <wshaox...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>
>>>>>>>>>> I tested RC4 with the following items:
>>>>>>>>>> - Maven Central Repository contains all artifacts
>>>>>>>>>> - Built the source with Maven (ensured all source files have Apache
>>>>>>>>>>   headers), and executed built-in tests via "mvn clean verify"
>>>>>>>>>> - Manually executed the tests in IntelliJ IDE
>>>>>>>>>> - Verify that the quickstarts for Scala and Java are working with the
>>>>>>>>>>   staging repository in IntelliJ
>>>>>>>>>> - Checked the benchmark results. The perf regressions of
>>>>>>>>>>   tuple-key-by/statebackend/tumblingWindow are gone, but the regression
>>>>>>>>>>   on serializer still exists.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Shaoxuan
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 26, 2019 at 8:06 AM jincheng sun <sunjincheng...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Aljoscha, I think you are right, increasing the timeout config will
>>>>>>>>>>> fix this issue. This depends on the resources of Travis. I would like
>>>>>>>>>>> to share some observations from my test (not a flink problem) as
>>>>>>>>>>> follows: :-)
>>>>>>>>>>>
>>>>>>>>>>> During my testing, `mvn clean verify` and `nightly end-to-end test`
>>>>>>>>>>> both consume a lot of machine resources (especially memory/network),
>>>>>>>>>>> and the network bandwidth requirements of `nightly end-to-end test`
>>>>>>>>>>> are also very high. In China, I need to use VPN acceleration
>>>>>>>>>>> (100~200Kb before acceleration, 3~4Mb after acceleration). I have
>>>>>>>>>>> encountered: ['Avro Confluent Schema Registry nightly end-to-end test'
>>>>>>>>>>> failed after 18 minutes and 15 seconds! Test exited with exit code 1].
>>>>>>>>>>> It takes more than 18 minutes; the download failed because the network
>>>>>>>>>>> bandwidth is not enough, and it runs smoothly when using VPN
>>>>>>>>>>> acceleration. The overall end-to-end run passed twice. The Docker
>>>>>>>>>>> resource configuration: CPUs 7, Mem: 28.7G, Swap: 3.5G. See the
>>>>>>>>>>> detailed log here
>>>>>>>>>>> <https://docs.google.com/document/d/1CcyTCyZmMmP57pkKv4drjSuxW61_u78HR3q1fJJODMw/edit?usp=sharing>.
>>>>>>>>>>>
>>>>>>>>>>> Just now, I checked Travis for your last commit (Increase startup
>>>>>>>>>>> timeout in end-to-end tests); in addition to the Cleanup phase, other
>>>>>>>>>>> phases are successful. See here:
>>>>>>>>>>> <https://travis-ci.org/apache/flink/builds/511071777>
>>>>>>>>>>>
>>>>>>>>>>> In order to verify that our speculation is accurate, I can help by
>>>>>>>>>>> running the 10 and 20 second timeout configs on my repo to see whether
>>>>>>>>>>> the timeout problem recurs 100% of the time. It is already running, we
>>>>>>>>>>> are waiting for the result.
>>>>>>>>>>> 10 seconds: <https://travis-ci.org/sunjincheng121/flink/builds/511235749>
>>>>>>>>>>> 20 seconds: <https://travis-ci.org/sunjincheng121/flink/builds/511235598>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jincheng
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 26, 2019 at 1:04 AM, Aljoscha Krettek <aljos...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the testing done so far!
>>>>>>>>>>>>
>>>>>>>>>>>> There has been quite some flakiness on Travis lately, see here:
>>>>>>>>>>>> https://travis-ci.org/apache/flink/branches. I’m a bit hesitant to
>>>>>>>>>>>> release in this state. Looking at the tests you can see that all of
>>>>>>>>>>>> the end-to-end tests fail because waiting for the dispatcher to come
>>>>>>>>>>>> up times out. I also noticed that this usually takes about 5-8
>>>>>>>>>>>> seconds on Travis, so a 10 second timeout might be a bit low. I
>>>>>>>>>>>> pushed commits to increase that to 20 secs. Let’s see what will
>>>>>>>>>>>> happen.
>>>>>>>>>>>>
>>>>>>>>>>>> I’ll keep you posted!
>>>>>>>>>>>> Aljoscha
>>>>>>>>>>>>
>>>>>>>>>>>>> On 25. Mar 2019, at 13:13, jincheng sun <sunjincheng...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Great thanks for preparing the RC4 of Flink 1.8.0, Aljoscha!
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I checked the functional things as follows (without performance
>>>>>>>>>>>>> verification):
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Checking Artifacts:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   1). Download the release source code - SUCCESS
>>>>>>>>>>>>>   2). Check Source release flink-1.8.0-src.tgz.sha512 - SUCCESS
>>>>>>>>>>>>>   3). Download the released JAR - SUCCESS
>>>>>>>>>>>>>   4). Check if checksums and GPG files match the corresponding
>>>>>>>>>>>>>       release files - SUCCESS
>>>>>>>>>>>>>   5). Verify that the source archives do not contain any binaries -
>>>>>>>>>>>>>       SUCCESS
>>>>>>>>>>>>>   6). Build the source with `mvn clean verify -DskipTests` to ensure
>>>>>>>>>>>>>       all source files have Apache headers - SUCCESS
>>>>>>>>>>>>>   7). Check that all POM files point to the same version - SUCCESS
>>>>>>>>>>>>>   8). Read the `README.md` file to ensure there is nothing
>>>>>>>>>>>>>       unexpected - SUCCESS
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Testing Larger Setups
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Cluster Environment: 7 nodes, jm 1024m, tm 4096m
>>>>>>>>>>>>>   Testing Jobs: WordCount (Batch & Streaming),
>>>>>>>>>>>>>   DataStreamAllroundTestProgram
>>>>>>>>>>>>>
>>>>>>>>>>>>>   1). Use local & hdfs file systems for checkpoints - SUCCESS
>>>>>>>>>>>>>   2). Use hdfs file systems for input/output - SUCCESS
>>>>>>>>>>>>>   3). Run examples on YARN (with or without session) - SUCCESS
>>>>>>>>>>>>>   4). Test failover and recovery - SUCCESS
>>>>>>>>>>>>>   5). Test incremental & non-incremental checkpoint - SUCCESS
>>>>>>>>>>>>>   6). Test connector - kafka - SUCCESS
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3. Testing Functionality
>>>>>>>>>>>>>
>>>>>>>>>>>>>   1). Built-in tests (linux & mac os)
>>>>>>>>>>>>>     - `mvn clean verify` (some test timeout errors and a test case
>>>>>>>>>>>>>       bug, see FLINK-12001
>>>>>>>>>>>>>       <https://issues.apache.org/jira/browse/FLINK-12001>; none of
>>>>>>>>>>>>>       them is a blocker)
>>>>>>>>>>>>>     - build for scala 2.11 (mvn clean install -P scala-2.11
>>>>>>>>>>>>>       -DskipTests) - SUCCESS
>>>>>>>>>>>>>     - Run the scripted nightly end-to-end test - SUCCESS
>>>>>>>>>>>>>
>>>>>>>>>>>>>   2). Quickstarts
>>>>>>>>>>>>>     - Verify that the quickstarts for Scala work with the staging
>>>>>>>>>>>>>       repository in IntelliJ - SUCCESS
>>>>>>>>>>>>>     - Verify that the quickstarts for Java work with the staging
>>>>>>>>>>>>>       repository in IntelliJ - SUCCESS
>>>>>>>>>>>>>
>>>>>>>>>>>>>   3). Simple Starter Experience and Use Cases
>>>>>>>>>>>>>
>>>>>>>>>>>>>     - run all examples from IntelliJ IDE - SUCCESS
>>>>>>>>>>>>>     - Start a local cluster and verify that the processes - SUCCESS
>>>>>>>>>>>>>       a. Examine the *.out files (should be empty) and the log
>>>>>>>>>>>>>          files (should contain no exceptions)
>>>>>>>>>>>>>       b. Test for Linux, MacOS
>>>>>>>>>>>>>       c. Shutdown and verify there are no exceptions in the log
>>>>>>>>>>>>>          output (after shutdown)
>>>>>>>>>>>>>
>>>>>>>>>>>>>     - Verify that the examples are running from both ./bin/flink
>>>>>>>>>>>>>       and from the web-based job submission tool (following items) -
>>>>>>>>>>>>>       SUCCESS
>>>>>>>>>>>>>       a. Start multiple task managers in the local cluster
>>>>>>>>>>>>>       b. Change the flink-conf.yaml to define more than one task
>>>>>>>>>>>>>          slot (2)
>>>>>>>>>>>>>       c. Run the examples with a parallelism > 1
>>>>>>>>>>>>>       d. Examine the log output - no error messages should be
>>>>>>>>>>>>>          encountered
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4. Review the PR
>>>>>>>>>>>>>   - [Add 1.8 Release Blog Post] - Just a reminder, update the
>>>>>>>>>>>>>     release date to the correct date before merging.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 25, 2019 at 4:11 PM, Piotr Nowojski <pi...@ververica.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 from my side. Previously spotted performance regression seems to
>>>>>>>>>>>>>> be gone, or mostly gone.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Piotrek
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 21 Mar 2019, at 17:52, Aljoscha Krettek <aljos...@apache.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>> Please review and vote on the release candidate 4 for Flink 1.8.0,
>>>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>>>> [ ] +1, Approve the release
>>>>>>>>>>>>>>> [ ] -1, Do not approve the release (please provide specific
>>>>>>>>>>>>>>> comments)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The complete staging area is available for your review, which
>>>>>>>>>>>>>>> includes:
>>>>>>>>>>>>>>> * JIRA release notes [1],
>>>>>>>>>>>>>>> * the official Apache source release and binary convenience
>>>>>>>>>>>>>>>   releases to be deployed to dist.apache.org [2], which are signed
>>>>>>>>>>>>>>>   with the key with fingerprint
>>>>>>>>>>>>>>>   F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
>>>>>>>>>>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>>>>>>>>>>> * source code tag "release-1.8.0-rc4" [5],
>>>>>>>>>>>>>>> * website pull request listing the new release [6]
>>>>>>>>>>>>>>> * website pull request adding announcement blog post [7].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>>>>>>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Aljoscha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
>>>>>>>>>>>>>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc4/
>>>>>>>>>>>>>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>>>>>>>>>>>>>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1215
>>>>>>>>>>>>>>> [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c650befc10c8bb6cc4b007ae250b7b2173046145
>>>>>>>>>>>>>>> [6] https://github.com/apache/flink-web/pull/180
>>>>>>>>>>>>>>> [7] https://github.com/apache/flink-web/pull/179
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> P.S. The difference to the previous RCs is small, you can fetch the
>>>>>>>>>>>>>>> tags and do a "git log release-1.8.0-rc1..release-1.8.0-rc4" to see
>>>>>>>>>>>>>>> the difference in commits. It's fixes for the issues that led to
>>>>>>>>>>>>>>> the cancellation of the previous RCs plus smaller fixes. Most
>>>>>>>>>>>>>>> verification/testing that was carried out should apply as is to
>>>>>>>>>>>>>>> this RC. Any functional verification that you did on previous RCs
>>>>>>>>>>>>>>> should therefore easily carry over to this one.
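
Note on the checksum checks that several voters report in this thread: the thread does not say which tools were used, so the following is only a minimal sketch of one way to compare a downloaded artifact against its `.sha512` file. The function names are hypothetical, and it assumes the checksum file starts with the hex digest (optionally followed by the file name, as `sha512sum` writes it). It does not cover the GPG signature check.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large release archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(artifact: Path, checksum_file: Path) -> bool:
    """Compare an artifact against a checksum file.

    Assumption: the first whitespace-separated token of the checksum file
    is the expected hex digest.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

For example, `matches_checksum_file(Path("flink-1.8.0-src.tgz"), Path("flink-1.8.0-src.tgz.sha512"))` would return True only if the local archive hashes to the digest recorded in the checksum file.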