-1 (non-binding)

- Ran the integration tests (1000+) of our Flink job locally; all succeeded.
- Attempted to run the job on Hadoop; it failed.

It failed because we have a firewall in place and we cannot set the REST port to a specific port or port range. Unless I am mistaken, it seems that FLINK-11081 broke the ability to set a REST port when running on YARN ( https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102 ).

Code-wise it seems rather straightforward to fix, but I am unsure why this is hard-coded to 0 and what the impact of changing it would be.
It would benefit us greatly if a fix for this could make it into 1.8.0.

Regards,
Richard

On Thu, Mar 28, 2019 at 9:54 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:

> +1 (binding)
>
> Functional checks:
>
> - Built Flink from source (`mvn clean verify`) locally, with success
> - Ran end-to-end tests locally 5 times in a loop; no attempts failed
>   (Hadoop 2.8.4, Scala 2.12)
> - Manually tested state schema evolution for POJOs. Besides the tests that
>   @Congxian already did, additionally tested evolution cases with POJO
>   subclasses + non-registered POJOs.
> - Manually tested migration of Scala stateful jobs that use case classes /
>   Scala collections as state types, performing the migration from Scala
>   2.11 to Scala 2.12.
> - Reviewed release announcement PR
>
> Misc / legal checks:
>
> - Checked checksums and signatures
> - No binaries in source distribution
> - Staging area does not seem to have any missing artifacts
>
> Cheers,
> Gordon
>
> On Thu, Mar 28, 2019 at 4:52 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
> > @Shaoxuan
> >
> > The drop in the serializerAvro benchmark, as explained in the voting
> > threads of earlier RCs, was due to a slower job initialization phase
> > caused by slower deserialization of the AvroSerializer.
> > Piotr also pointed out that after the number of records was increased in
> > the serializer benchmarks, this drop was no longer observable before /
> > after the changes in mid February.
> > IMO, this is not critical as it does not affect the per-record
> > performance / throughput, and therefore should not block this release.
> >
> > On Thu, Mar 28, 2019 at 1:08 AM Aljoscha Krettek <aljos...@fastmail.com>
> > wrote:
> >
> >> By now, I'm reasonably sure that the test instabilities in the
> >> end-to-end tests are only instabilities. I pushed changes to increase
> >> timeouts to make the tests more stable.
> >> As in any project, there will always be bugs, but I think we could
> >> release this RC4 and be reasonably sure that it works well.
> >>
> >> Now, we only need to have the required number of PMC votes.
> >>
> >> On Wed, Mar 27, 2019, at 07:22, Congxian Qiu wrote:
> >> > +1 (non-binding)
> >> >
> >> > • checked signature and checksum ok
> >> > • mvn clean package -DskipTests ok
> >> > • Run job on yarn ok
> >> > • Test state migration with POJO type (both heap and rocksdb) ok
> >> >   - 1.6 -> 1.8
> >> >   - 1.7 -> 1.8
> >> >   - 1.8 -> 1.8
> >> >
> >> > Best, Congxian
> >> > On Mar 27, 2019, 10:26 +0800, vino yang <yanghua1...@gmail.com>, wrote:
> >> > > +1 (non-binding)
> >> > >
> >> > > - checked JIRA release note
> >> > > - ran "mvn package -DskipTests"
> >> > > - checked signature and checksum
> >> > > - started a cluster locally and ran some examples in binary
> >> > > - checked web site announcement's PR
> >> > >
> >> > > Best,
> >> > > Vino
> >> > >
> >> > > Xiaowei Jiang <xiaow...@gmail.com> wrote on Tue, Mar 26, 2019 at 8:20 PM:
> >> > >
> >> > > > +1 (non-binding)
> >> > > >
> >> > > > - checked checksums and GPG files
> >> > > > - build from source successfully
> >> > > > - run end-to-end precommit tests successfully
> >> > > > - run end-to-end nightly tests successfully
> >> > > > Xiaowei
> >> > > > On Tuesday, March 26, 2019, 8:09:19 PM GMT+8, Yu Li <car...@gmail.com> wrote:
> >> > > >
> >> > > > +1 (non-binding)
> >> > > >
> >> > > > - Checked release notes: OK
> >> > > > - Checked sums and signatures: OK
> >> > > > - Source release
> >> > > >   - contains no binaries: OK
> >> > > >   - contains no 1.8-SNAPSHOT references: OK
> >> > > >   - build from source: OK (8u101)
> >> > > >   - mvn clean verify: OK (8u101)
> >> > > > - Binary release
> >> > > >   - no examples appear to be missing
> >> > > >   - started a cluster; WebUI reachable, example ran successfully
> >> > > > - end-to-end test (all but K8S and docker ones): OK (8u101)
> >> > > > -
> >> > > >   Repository appears to contain all expected artifacts
> >> > > >
> >> > > > Best Regards,
> >> > > > Yu
> >> > > >
> >> > > >
> >> > > > On Tue, 26 Mar 2019 at 14:28, Kurt Young <ykt...@gmail.com> wrote:
> >> > > >
> >> > > > > +1 (non-binding)
> >> > > > >
> >> > > > > Checked items:
> >> > > > > - checked checksums and GPG files
> >> > > > > - verified that the source archives do not contain any binaries
> >> > > > > - checked that all POM files point to the same version
> >> > > > > - build from source successfully
> >> > > > >
> >> > > > > Best,
> >> > > > > Kurt
> >> > > > >
> >> > > > >
> >> > > > > On Tue, Mar 26, 2019 at 10:57 AM Shaoxuan Wang <wshaox...@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > +1 (non-binding)
> >> > > > > >
> >> > > > > > I tested RC4 with the following items:
> >> > > > > > - Maven Central Repository contains all artifacts
> >> > > > > > - Built the source with Maven (ensured all source files have Apache
> >> > > > > >   headers), and executed built-in tests via "mvn clean verify"
> >> > > > > > - Manually executed the tests in IntelliJ IDE
> >> > > > > > - Verified that the quickstarts for Scala and Java are working with
> >> > > > > >   the staging repository in IntelliJ
> >> > > > > > - Checked the benchmark results. The perf regressions of
> >> > > > > >   tuple-key-by/statebackend/tumblingWindow are gone, but the
> >> > > > > >   regression on serializer still exists.
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Shaoxuan
> >> > > > > >
> >> > > > > > On Tue, Mar 26, 2019 at 8:06 AM jincheng sun <sunjincheng...@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi Aljoscha, I think you are right: increasing the timeout config
> >> > > > > > > will fix this issue. This depends on the resources of Travis.
> >> > > > > > > I would like to share some observations from my testing (not Flink
> >> > > > > > > problems), as follows: :-)
> >> > > > > > >
> >> > > > > > > During my testing, `mvn clean verify` and the nightly end-to-end
> >> > > > > > > tests both consumed a lot of machine resources (especially
> >> > > > > > > memory/network), and the network bandwidth requirements of the
> >> > > > > > > nightly end-to-end tests are also very high. In China, I need to use
> >> > > > > > > VPN acceleration (100~200Kb before acceleration, 3~4Mb after
> >> > > > > > > acceleration). Without it, I encountered: ['Avro Confluent Schema
> >> > > > > > > Registry nightly end-to-end test' failed after 18 minutes and 15
> >> > > > > > > seconds! Test exited with exit code 1]. The test took more than 18
> >> > > > > > > minutes because the download failed for lack of network bandwidth;
> >> > > > > > > it runs smoothly when using VPN acceleration. The overall end-to-end
> >> > > > > > > run passed twice. The Docker resource configuration was: CPUs: 7,
> >> > > > > > > Mem: 28.7G, Swap: 3.5G. See the detailed log here:
> >> > > > > > > <https://docs.google.com/document/d/1CcyTCyZmMmP57pkKv4drjSuxW61_u78HR3q1fJJODMw/edit?usp=sharing>
> >> > > > > > >
> >> > > > > > > Just now, I checked Travis for your last commit (Increase startup
> >> > > > > > > timeout in end-to-end tests); apart from the Cleanup phase, the
> >> > > > > > > other phases are successful.
> >> > > > > > > See the build here:
> >> > > > > > > <https://travis-ci.org/apache/flink/builds/511071777>
> >> > > > > > >
> >> > > > > > > To verify that our speculation is accurate, I am running
> >> > > > > > > verification builds on my repo with 10-second and 20-second timeout
> >> > > > > > > configs, to see whether the timeout problem recurs 100% of the
> >> > > > > > > time. They are already running; we are waiting for the results.
> >> > > > > > > 10 seconds: <https://travis-ci.org/sunjincheng121/flink/builds/511235749>
> >> > > > > > > 20 seconds: <https://travis-ci.org/sunjincheng121/flink/builds/511235598>
> >> > > > > > >
> >> > > > > > > Best,
> >> > > > > > > Jincheng
> >> > > > > > >
> >> > > > > > > Aljoscha Krettek <aljos...@apache.org> wrote on Tue, Mar 26, 2019 at 1:04 AM:
> >> > > > > > >
> >> > > > > > > > Thanks for the testing done so far!
> >> > > > > > > >
> >> > > > > > > > There has been quite some flakiness on Travis lately, see here:
> >> > > > > > > > https://travis-ci.org/apache/flink/branches. I’m a bit hesitant
> >> > > > > > > > to release in this state. Looking at the tests, you can see that
> >> > > > > > > > all of the end-to-end tests fail because waiting for the
> >> > > > > > > > dispatcher to come up times out. I also noticed that this usually
> >> > > > > > > > takes about 5-8 seconds on Travis, so a 10 second timeout might
> >> > > > > > > > be a bit low. I pushed commits to increase that to 20 secs. Let’s
> >> > > > > > > > see what will happen.
> >> > > > > > > >
> >> > > > > > > > I’ll keep you posted!
> >> > > > > > > > Aljoscha
> >> > > > > > > >
> >> > > > > > > > > On 25.
> >> > > > > > > > > Mar 2019, at 13:13, jincheng sun <sunjincheng...@gmail.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > Great thanks for preparing the RC4 of Flink 1.8.0, Aljoscha!
> >> > > > > > > > >
> >> > > > > > > > > +1 (non-binding)
> >> > > > > > > > >
> >> > > > > > > > > I checked the functional things as follows (without performance
> >> > > > > > > > > verification):
> >> > > > > > > > >
> >> > > > > > > > > 1. Checking Artifacts:
> >> > > > > > > > >
> >> > > > > > > > >   1). Download the release source code - SUCCESS
> >> > > > > > > > >   2). Check Source release flink-1.8.0-src.tgz.sha512 - SUCCESS
> >> > > > > > > > >   3). Download the released JAR - SUCCESS
> >> > > > > > > > >   4). Check if checksums and GPG files match the corresponding
> >> > > > > > > > >       release files - SUCCESS
> >> > > > > > > > >   5). Verify that the source archives do not contain any
> >> > > > > > > > >       binaries - SUCCESS
> >> > > > > > > > >   6). Build the source with `mvn clean verify -DskipTests` to
> >> > > > > > > > >       ensure all source files have Apache headers - SUCCESS
> >> > > > > > > > >   7). Check that all POM files point to the same version - SUCCESS
> >> > > > > > > > >   8). Read the `README.md` file to ensure there is nothing
> >> > > > > > > > >       unexpected - SUCCESS
> >> > > > > > > > >
> >> > > > > > > > > 2. Testing Larger Setups
> >> > > > > > > > >
> >> > > > > > > > >   Cluster Environment: 7 nodes, jm 1024m, tm 4096m
> >> > > > > > > > >   Testing Jobs: WordCount (Batch & Streaming),
> >> > > > > > > > >   DataStreamAllroundTestProgram
> >> > > > > > > > >
> >> > > > > > > > >   1). Use local & hdfs file systems for checkpoints - SUCCESS
> >> > > > > > > > >   2). Use hdfs file systems for input/output - SUCCESS
> >> > > > > > > > >   3). Run examples on YARN (with or without session) - SUCCESS
> >> > > > > > > > >   4).
> >> > > > > > > > >       Test failover and recovery - SUCCESS
> >> > > > > > > > >   5). Test incremental & non-incremental checkpoint - SUCCESS
> >> > > > > > > > >   6). Test connector - kafka - SUCCESS
> >> > > > > > > > >
> >> > > > > > > > > 3. Testing Functionality
> >> > > > > > > > >
> >> > > > > > > > >   1). Built-in tests (linux & mac os)
> >> > > > > > > > >     - `mvn clean verify` (some test timeout errors and a test
> >> > > > > > > > >       case bug, see FLINK-12001
> >> > > > > > > > >       <https://issues.apache.org/jira/browse/FLINK-12001>; none
> >> > > > > > > > >       of them is a blocker)
> >> > > > > > > > >     - build for scala 2.11 (mvn clean install -P scala-2.11
> >> > > > > > > > >       -DskipTests) - SUCCESS
> >> > > > > > > > >     - Run the scripted nightly end-to-end test - SUCCESS
> >> > > > > > > > >
> >> > > > > > > > >   2). Quickstarts
> >> > > > > > > > >     - Verify that the quickstarts for Scala work with the
> >> > > > > > > > >       staging repository in IntelliJ - SUCCESS
> >> > > > > > > > >     - Verify that the quickstarts for Java work with the staging
> >> > > > > > > > >       repository in IntelliJ - SUCCESS
> >> > > > > > > > >
> >> > > > > > > > >   3). Simple Starter Experience and Use Cases
> >> > > > > > > > >
> >> > > > > > > > >     - run all examples from IntelliJ IDE - SUCCESS
> >> > > > > > > > >     - Start a local cluster and verify that the processes - SUCCESS
> >> > > > > > > > >       a. Examine the *.out files (should be empty) and the log
> >> > > > > > > > >          files (should contain no exceptions)
> >> > > > > > > > >       b. Test for Linux, MacOS
> >> > > > > > > > >       c.
> >> > > > > > > > >          Shutdown and verify there are no exceptions in the log
> >> > > > > > > > >          output (after shutdown)
> >> > > > > > > > >
> >> > > > > > > > >     - Verify that the examples are running from both ./bin/flink
> >> > > > > > > > >       and from the web-based job submission tool (following
> >> > > > > > > > >       items) - SUCCESS
> >> > > > > > > > >       a. Start multiple task managers in the local cluster
> >> > > > > > > > >       b. Change the flink-conf.yaml to define more than one task
> >> > > > > > > > >          slot (2)
> >> > > > > > > > >       c. Run the examples with a parallelism > 1
> >> > > > > > > > >       d. Examine the log output - no error messages should be
> >> > > > > > > > >          encountered
> >> > > > > > > > >
> >> > > > > > > > > 4. Review the PR
> >> > > > > > > > >     - [Add 1.8 Release Blog Post] - Just a reminder: update the
> >> > > > > > > > >       release date to the correct date before merging.
> >> > > > > > > > >
> >> > > > > > > > > Cheers,
> >> > > > > > > > > Jincheng
> >> > > > > > > > >
> >> > > > > > > > > Piotr Nowojski <pi...@ververica.com> wrote on Mon, Mar 25, 2019 at 4:11 PM:
> >> > > > > > > > >
> >> > > > > > > > > > +1 from my side. The previously spotted performance regression
> >> > > > > > > > > > seems to be gone, or mostly gone.
> >> > > > > > > > > >
> >> > > > > > > > > > Piotrek
> >> > > > > > > > > >
> >> > > > > > > > > > > On 21 Mar 2019, at 17:52, Aljoscha Krettek <aljos...@apache.org> wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > Hi everyone,
> >> > > > > > > > > > > Please review and vote on the release candidate 4 for Flink
> >> > > > > > > > > > > 1.8.0, as follows:
> >> > > > > > > > > > > [ ] +1, Approve the release
> >> > > > > > > > > > > [ ] -1, Do not approve the release (please provide specific
> >> > > > > > > > > > > comments)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > The complete staging area is available for your review,
> >> > > > > > > > > > > which includes:
> >> > > > > > > > > > > * JIRA release notes [1],
> >> > > > > > > > > > > * the official Apache source release and binary convenience
> >> > > > > > > > > > >   releases to be deployed to dist.apache.org [2], which are
> >> > > > > > > > > > >   signed with the key with fingerprint
> >> > > > > > > > > > >   F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
> >> > > > > > > > > > > * all artifacts to be deployed to the Maven Central
> >> > > > > > > > > > >   Repository [4],
> >> > > > > > > > > > > * source code tag "release-1.8.0-rc4" [5],
> >> > > > > > > > > > > * website pull request listing the new release [6]
> >> > > > > > > > > > > * website pull request adding announcement blog post [7].
> >> > > > > > > > > > >
> >> > > > > > > > > > > The vote will be open for at least 72 hours. It is adopted
> >> > > > > > > > > > > by majority approval, with at least 3 PMC affirmative votes.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thanks,
> >> > > > > > > > > > > Aljoscha
> >> > > > > > > > > > >
> >> > > > > > > > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> >> > > > > > > > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc4/
> >> > > > > > > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> > > > > > > > > > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1215
> >> > > > > > > > > > > [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c650befc10c8bb6cc4b007ae250b7b2173046145
> >> > > > > > > > > > > [6] https://github.com/apache/flink-web/pull/180
> >> > > > > > > > > > > [7] https://github.com/apache/flink-web/pull/179
> >> > > > > > > > > > >
> >> > > > > > > > > > > P.S. The difference to the previous RCs is small; you can
> >> > > > > > > > > > > fetch the tags and do a
> >> > > > > > > > > > > "git log release-1.8.0-rc1..release-1.8.0-rc4" to see the
> >> > > > > > > > > > > difference in commits. It consists of fixes for the issues
> >> > > > > > > > > > > that led to the cancellation of the previous RCs plus
> >> > > > > > > > > > > smaller fixes. Most verification/testing that was carried
> >> > > > > > > > > > > out should apply as is to this RC. Any functional
> >> > > > > > > > > > > verification that you did on previous RCs should therefore
> >> > > > > > > > > > > easily carry over to this one.