Hi Aljoscha,

I think you are right: increasing the timeout should fix this issue, since it depends on the resources available on Travis. I would also like to share some observations from my own testing (these are not Flink problems) :-)

During my testing, both `mvn clean verify` and the nightly end-to-end tests consume a lot of machine resources (especially memory and network), and the nightly end-to-end tests also need a lot of network bandwidth. In China I have to use a VPN to speed up downloads (100~200 Kb/s without it, 3~4 Mb/s with it). Without the VPN I ran into this failure: ['Avro Confluent Schema Registry nightly end-to-end test' failed after 18 minutes and 15 seconds! Test exited with exit Code 1]. The test ran for more than 18 minutes and then failed because a download could not complete over the limited bandwidth. With the VPN it runs smoothly. The full end-to-end run passed twice. The Docker resource configuration was CPUs: 7, Mem: 28.7G, Swap: 3.5G. See the detailed log here: <https://docs.google.com/document/d/1CcyTCyZmMmP57pkKv4drjSuxW61_u78HR3q1fJJODMw/edit?usp=sharing>

Just now I checked the Travis build for your last commit (Increase startup timeout in end-to-end tests): apart from the Cleanup phase, all other phases succeeded. See here: <https://travis-ci.org/apache/flink/builds/511071777>

To verify that our speculation is correct, I am running builds with 10-second and 20-second timeout configs on my own repo, to see whether the timeout problem recurs 100% of the time. They are already running and we are waiting for the results:
10 seconds: <https://travis-ci.org/sunjincheng121/flink/builds/511235749>
20 seconds: <https://travis-ci.org/sunjincheng121/flink/builds/511235598>

Best,
Jincheng

On Tue, Mar 26, 2019 at 1:04 AM Aljoscha Krettek <aljos...@apache.org> wrote:
> Thanks for the testing done so far!
>
> There has been quite some flakiness on Travis lately, see here:
> https://travis-ci.org/apache/flink/branches. I’m a bit hesitant to
> release in this state. Looking at the tests, you can see that all of the
> end-to-end tests fail because waiting for the dispatcher to come up times
> out.
> I also noticed that this usually takes about 5-8 seconds on Travis, so
> a 10 second timeout might be a bit low. I pushed commits to increase that
> to 20 secs. Let’s see what will happen.
>
> I’ll keep you posted!
> Aljoscha
>
> > On 25. Mar 2019, at 13:13, jincheng sun <sunjincheng...@gmail.com> wrote:
> >
> > Many thanks for preparing the RC4 of Flink 1.8.0, Aljoscha!
> >
> > +1 (non-binding)
> >
> > I checked the following functional aspects (without performance
> > verification):
> >
> > 1. Checking Artifacts:
> >
> > 1). Download the release source code - SUCCESS
> > 2). Check the source release flink-1.8.0-src.tgz.sha512 - SUCCESS
> > 3). Download the released JAR - SUCCESS
> > 4). Check that the checksums and GPG files match the corresponding release
> > files - SUCCESS
> > 5). Verify that the source archives do not contain any binaries - SUCCESS
> > 6). Build the source with `mvn clean verify -DskipTests` to ensure all
> > source files have Apache headers - SUCCESS
> > 7). Check that all POM files point to the same version - SUCCESS
> > 8). Read the `README.md` file to ensure there is nothing unexpected -
> > SUCCESS
> >
> > 2. Testing Larger Setups
> >
> > Cluster environment: 7 nodes, JM 1024m, TM 4096m
> > Testing jobs: WordCount (Batch & Streaming), DataStreamAllroundTestProgram
> >
> > 1). Use local & HDFS file systems for checkpoints - SUCCESS
> > 2). Use HDFS file systems for input/output - SUCCESS
> > 3). Run examples on YARN (with or without session) - SUCCESS
> > 4). Test failover and recovery - SUCCESS
> > 5). Test incremental & non-incremental checkpoints - SUCCESS
> > 6). Test connector - Kafka - SUCCESS
> >
> > 3. Testing Functionality
> >
> > 1). Built-in tests (Linux & macOS)
> > - `mvn clean verify` (some test timeout errors and a test case bug, see
> > FLINK-12001 <https://issues.apache.org/jira/browse/FLINK-12001>; none of
> > them are blockers)
> > - Build for Scala 2.11 (mvn clean install -P scala-2.11 -DskipTests) - SUCCESS
> > - Run the scripted nightly end-to-end test - SUCCESS
> >
> > 2). Quickstarts
> > - Verify that the quickstarts for Scala work with the staging repository
> > in IntelliJ - SUCCESS
> > - Verify that the quickstarts for Java work with the staging repository
> > in IntelliJ - SUCCESS
> >
> > 3). Simple Starter Experience and Use Cases
> >
> > - Run all examples from the IntelliJ IDE - SUCCESS
> > - Start a local cluster and verify the processes - SUCCESS
> > a. Examine the *.out files (should be empty) and the log files
> > (should contain no exceptions)
> > b. Test for Linux, macOS
> > c. Shut down and verify there are no exceptions in the log output
> > (after shutdown)
> >
> > - Verify that the examples run both from ./bin/flink and from
> > the web-based job submission tool (following items) - SUCCESS
> > a. Start multiple task managers in the local cluster
> > b. Change flink-conf.yaml to define more than one task slot (2)
> > c. Run the examples with a parallelism > 1
> > d. Examine the log output - no error messages should be encountered
> >
> > 4. Review the PR
> > - [Add 1.8 Release Blog Post] - Just a reminder: update the release
> > date to the correct date before merging.
> >
> > Cheers,
> > Jincheng
> >
> > On Mon, Mar 25, 2019 at 4:11 PM Piotr Nowojski <pi...@ververica.com> wrote:
> >
> >> +1 from my side. The previously spotted performance regression seems to be
> >> gone, or mostly gone.
> >>
> >> Piotrek
> >>
> >>> On 21 Mar 2019, at 17:52, Aljoscha Krettek <aljos...@apache.org> wrote:
> >>>
> >>> Hi everyone,
> >>> Please review and vote on the release candidate 4 for Flink 1.8.0, as follows:
> >>> [ ] +1, Approve the release
> >>> [ ] -1, Do not approve the release (please provide specific comments)
> >>>
> >>>
> >>> The complete staging area is available for your review, which includes:
> >>> * JIRA release notes [1],
> >>> * the official Apache source release and binary convenience releases to
> >>> be deployed to dist.apache.org [2], which are signed with the key with
> >>> fingerprint F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
> >>> * all artifacts to be deployed to the Maven Central Repository [4],
> >>> * source code tag "release-1.8.0-rc4" [5],
> >>> * website pull request listing the new release [6],
> >>> * website pull request adding announcement blog post [7].
> >>>
> >>> The vote will be open for at least 72 hours. It is adopted by majority
> >>> approval, with at least 3 PMC affirmative votes.
> >>>
> >>> Thanks,
> >>> Aljoscha
> >>>
> >>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc4/
> >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1215
> >>> [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c650befc10c8bb6cc4b007ae250b7b2173046145
> >>> [6] https://github.com/apache/flink-web/pull/180
> >>> [7] https://github.com/apache/flink-web/pull/179
> >>>
> >>> P.S. The difference to the previous RCs is small; you can fetch the tags
> >>> and run "git log release-1.8.0-rc1..release-1.8.0-rc4" to see the
> >>> difference in commits. It contains fixes for the issues that led to the
> >>> cancellation of the previous RCs, plus smaller fixes. Most of the
> >>> verification/testing that was carried out should apply as is to this RC.
> >>> Any functional verification that you did on previous RCs should therefore
> >>> easily carry over to this one.
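
The artifact check "Check Source release flink-1.8.0-src.tgz.sha512" in the vote above comes down to recomputing the archive's SHA-512 digest and comparing it with the published value. The following is a minimal Python sketch of that step. It assumes the `.sha512` file begins with the hex digest, which is how `sha512sum` and `shasum -a 512` write it. Some Apache projects use other layouts, such as GnuPG `--print-md` output. The helper names `sha512_of`, `expected_sha512` and `verify_sha512` are hypothetical, and the sketch does not cover the GPG signature check:

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha512(checksum_file: Path) -> str:
    """Assumption: the first whitespace-separated token is the hex digest."""
    return checksum_file.read_text().split()[0].lower()


def verify_sha512(archive: Path, checksum_file: Path) -> bool:
    """True if the archive's digest matches the published checksum."""
    return sha512_of(archive) == expected_sha512(checksum_file)
```

For example, `verify_sha512(Path("flink-1.8.0-src.tgz"), Path("flink-1.8.0-src.tgz.sha512"))` should return `True` for an intact download, provided the checksum file follows the assumed layout. Reading in 1 MiB chunks keeps memory use flat even for large binary archives.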