Hi @Aljoscha,

Based on the previous commit [1] that added the random port selection code, it seems the important part is to unset whatever 'rest.port' setting was made previously. I don't think the current way of setting BIND_PORT actually overrides an existing PORT setting. However, I wasn't able to find any related test; maybe @Till can provide more insight here?
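
To make the concern above concrete, here is a minimal sketch using a plain dictionary as a stand-in for the configuration. It is not Flink's Configuration class, and the key name "rest.bind-port" as well as both helper functions are assumptions made for illustration only. It just shows that writing the bind-port key leaves an existing 'rest.port' entry untouched unless that entry is removed explicitly.

```python
# Hypothetical stand-in for a flat key/value configuration; this is NOT
# Flink's Configuration class. The key names are assumptions for illustration.
REST_PORT = "rest.port"
REST_BIND_PORT = "rest.bind-port"


def force_random_bind_port(config: dict) -> dict:
    """Only set the bind port to 0; an existing rest.port entry is kept."""
    updated = dict(config)
    updated[REST_BIND_PORT] = "0"
    return updated


def reset_port_and_bind_random(config: dict) -> dict:
    """Remove any existing rest.port entry first, then set the bind port to 0."""
    updated = dict(config)
    updated.pop(REST_PORT, None)
    updated[REST_BIND_PORT] = "0"
    return updated


user_config = {REST_PORT: "8081"}
only_bind = force_random_bind_port(user_config)
reset_both = reset_port_and_bind_random(user_config)
print(only_bind)
print(reset_both)
```

Whether the real code path resolves the port from one key or the other is exactly the open question here, so this only illustrates the two behaviours being compared.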
Maybe @Richard can provide more detail on the YARN run command used to reproduce the problem?

Thanks,
Rong

[1] https://github.com/apache/flink/commit/dbe0e8286d76a5facdb49589b638b87dbde80178#diff-487838863ab693af7008f04cb3359be3R117

On Sat, Mar 30, 2019 at 5:51 AM Aljoscha Krettek <aljos...@apache.org> wrote:

> @Richard Did this work for you previously? From the change, it seems that
> the port was always set to 0 on YARN even before.
>
> > On 28. Mar 2019, at 16:13, Richard Deurwaarder <rich...@xeli.eu> wrote:
> >
> > -1 (non-binding)
> >
> > - Ran integration tests locally (1000+) of our flink job, all succeeded.
> > - Attempted to run job on hadoop, failed. It failed because we have a
> > firewall in place and we cannot set the rest port to a specific port/port
> > range.
> > Unless I am mistaken, it seems like FLINK-11081 broke the possibility of
> > setting a REST port when running on yarn (
> > https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102
> > )
> > Code-wise it seems rather straightforward to fix but I am unsure about the
> > reason why this is hard-coded to 0 and what the impact would be.
> >
> > It would benefit us greatly if a fix for this could make it to 1.8.0.
> >
> > Regards,
> >
> > Richard
> >
> > On Thu, Mar 28, 2019 at 9:54 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> > wrote:
> >
> >> +1 (binding)
> >>
> >> Functional checks:
> >>
> >> - Built Flink from source (`mvn clean verify`) locally, with success
> >> - Ran end-to-end tests locally for 5 times in a loop, no attempts failed
> >> (Hadoop 2.8.4, Scala 2.12)
> >> - Manually tested state schema evolution for POJO. Besides the tests that
> >> @Congxian already did, additionally tested evolution cases with POJO
> >> subclasses + non-registered POJOs.
> >> - Manually tested migration of Scala stateful jobs that use case classes /
> >> Scala collections as state types, performing the migration across Scala
> >> 2.11 to Scala 2.12.
> >> - Reviewed release announcement PR
> >>
> >> Misc / legal checks:
> >>
> >> - checked checksums and signatures
> >> - No binaries in source distribution
> >> - Staging area does not seem to have any missing artifacts
> >>
> >> Cheers,
> >> Gordon
> >>
> >> On Thu, Mar 28, 2019 at 4:52 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> >> wrote:
> >>
> >>> @Shaoxuan
> >>>
> >>> The drop in the serializerAvro benchmark, as explained earlier in previous
> >>> voting threads of earlier RCs, was due to a slower job initialization phase
> >>> caused by slower deserialization of the AvroSerializer.
> >>> Piotr also pointed out that after the number of records was increased in
> >>> the serializer benchmarks, this drop was no longer observable before /
> >>> after the changes in mid February.
> >>> IMO, this is not critical as it does not affect the per-record performance
> >>> / throughput, and therefore should not block this release.
> >>>
> >>> On Thu, Mar 28, 2019 at 1:08 AM Aljoscha Krettek <aljos...@fastmail.com>
> >>> wrote:
> >>>
> >>>> By now, I'm reasonably sure that the test instabilities on the end-to-end
> >>>> test are only instabilities. I pushed changes to increase timeouts to make
> >>>> the tests more stable. As in any project, there will always be bugs but I
> >>>> think we could release this RC4 and be reasonably sure that it works well.
> >>>>
> >>>> Now, we only need to have the required number of PMC votes.
> >>>>
> >>>> On Wed, Mar 27, 2019, at 07:22, Congxian Qiu wrote:
> >>>>> +1 (non-binding)
> >>>>>
> >>>>> • checked signature and checksum ok
> >>>>> • mvn clean package -DskipTests ok
> >>>>> • Run job on yarn ok
> >>>>> • Test state migration with POJO type (both heap and rocksdb) ok
> >>>>> • - 1.6 -> 1.8
> >>>>> • - 1.7 -> 1.8
> >>>>> • - 1.8 -> 1.8
> >>>>>
> >>>>> Best, Congxian
> >>>>> On Mar 27, 2019, 10:26 +0800, vino yang <yanghua1...@gmail.com>, wrote:
> >>>>>> +1 (non-binding)
> >>>>>>
> >>>>>> - checked JIRA release note
> >>>>>> - ran "mvn package -DskipTests"
> >>>>>> - checked signature and checksum
> >>>>>> - started a cluster locally and ran some examples in binary
> >>>>>> - checked web site announcement's PR
> >>>>>>
> >>>>>> Best,
> >>>>>> Vino
> >>>>>>
> >>>>>> Xiaowei Jiang <xiaow...@gmail.com> wrote on Tue, Mar 26, 2019 at 8:20 PM:
> >>>>>>
> >>>>>>> +1 (non-binding)
> >>>>>>>
> >>>>>>> - checked checksums and GPG files
> >>>>>>> - build from source successfully
> >>>>>>> - run end-to-end precommit tests successfully
> >>>>>>> - run end-to-end nightly tests successfully
> >>>>>>> Xiaowei
> >>>>>>> On Tuesday, March 26, 2019, 8:09:19 PM GMT+8, Yu Li <car...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> +1 (non-binding)
> >>>>>>>
> >>>>>>> - Checked release notes: OK
> >>>>>>> - Checked sums and signatures: OK
> >>>>>>> - Source release
> >>>>>>>   - contains no binaries: OK
> >>>>>>>   - contains no 1.8-SNAPSHOT references: OK
> >>>>>>>   - build from source: OK (8u101)
> >>>>>>>   - mvn clean verify: OK (8u101)
> >>>>>>> - Binary release
> >>>>>>>   - no examples appear to be missing
> >>>>>>>   - started a cluster; WebUI reachable, example ran successfully
> >>>>>>>   - end-to-end test (all but K8S and docker ones): OK (8u101)
> >>>>>>> - Repository appears to contain all expected artifacts
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Yu
> >>>>>>>
> >>>>>>> On Tue, 26 Mar 2019 at 14:28, Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> +1 (non-binding)
> >>>>>>>>
> >>>>>>>> Checked items:
> >>>>>>>> - checked checksums and GPG files
> >>>>>>>> - verified that the source archives do not contain any binaries
> >>>>>>>> - checked that all POM files point to the same version
> >>>>>>>> - build from source successfully
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Kurt
> >>>>>>>>
> >>>>>>>> On Tue, Mar 26, 2019 at 10:57 AM Shaoxuan Wang <wshaox...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> +1 (non-binding)
> >>>>>>>>>
> >>>>>>>>> I tested RC4 with the following items:
> >>>>>>>>> - Maven Central Repository contains all artifacts
> >>>>>>>>> - Built the source with Maven (ensured all source files have Apache
> >>>>>>>>> headers), and executed built-in tests via "mvn clean verify"
> >>>>>>>>> - Manually executed the tests in IntelliJ IDE
> >>>>>>>>> - Verify that the quickstarts for Scala and Java are working with the
> >>>>>>>>> staging repository in IntelliJ
> >>>>>>>>> - Checked the benchmark results. The perf regressions of
> >>>>>>>>> tuple-key-by/statebackend/tumblingWindow are gone, but the regression on
> >>>>>>>>> serializer still exists.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Shaoxuan
> >>>>>>>>>
> >>>>>>>>> On Tue, Mar 26, 2019 at 8:06 AM jincheng sun <sunjincheng...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Aljoscha, I think you are right, increasing the timeout config will fix
> >>>>>>>>>> this issue. This depends on the resources of Travis.
> >>>>>>>>>> I would like to share some observations from my testing (not a Flink
> >>>>>>>>>> problem) as follows: :-)
> >>>>>>>>>>
> >>>>>>>>>> During my testing, `mvn clean verify` and `nightly end-to-end test` both
> >>>>>>>>>> consume a lot of machine resources (especially memory/network), and the
> >>>>>>>>>> network bandwidth requirements of `nightly end-to-end test` are also very
> >>>>>>>>>> high. In China, I need to use VPN acceleration (100~200Kb before
> >>>>>>>>>> acceleration, 3~4Mb after acceleration). I encountered: [Avro
> >>>>>>>>>> Confluent Schema Registry nightly end-to-end test' failed after 18 minutes
> >>>>>>>>>> and 15 seconds! Test exited with exit Code 1]. It took more than 18 minutes
> >>>>>>>>>> and the download failed because the network bandwidth was not enough, and it
> >>>>>>>>>> runs smoothly when using VPN acceleration. The overall end-to-end run
> >>>>>>>>>> passed twice. The Docker resource configuration: CPUs: 7, Mem: 28.7G, Swap:
> >>>>>>>>>> 3.5G. See the detailed log here
> >>>>>>>>>> <https://docs.google.com/document/d/1CcyTCyZmMmP57pkKv4drjSuxW61_u78HR3q1fJJODMw/edit?usp=sharing>.
> >>>>>>>>>>
> >>>>>>>>>> Just now, I checked Travis for your last commit (Increase startup
> >>>>>>>>>> timeout in end-to-end tests); apart from the Cleanup phase, the other
> >>>>>>>>>> phases are successful. See here
> >>>>>>>>>> <https://travis-ci.org/apache/flink/builds/511071777>
> >>>>>>>>>>
> >>>>>>>>>> In order to verify that our speculation is accurate, I can help by running
> >>>>>>>>>> the 10 and 20 second timeout configs on my repo to see whether the timeout
> >>>>>>>>>> problem recurs 100% of the time.
> >>>>>>>>>> It is already running, we are waiting for the result.
> >>>>>>>>>> 10seconds <https://travis-ci.org/sunjincheng121/flink/builds/511235749>
> >>>>>>>>>> 20seconds <https://travis-ci.org/sunjincheng121/flink/builds/511235598>
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jincheng
> >>>>>>>>>>
> >>>>>>>>>> Aljoscha Krettek <aljos...@apache.org> wrote on Tue, Mar 26, 2019 at 1:04 AM:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks for the testing done so far!
> >>>>>>>>>>>
> >>>>>>>>>>> There has been quite some flakiness on Travis lately, see here:
> >>>>>>>>>>> https://travis-ci.org/apache/flink/branches. I’m a bit hesitant to
> >>>>>>>>>>> release in this state. Looking at the tests you can see that all of the
> >>>>>>>>>>> end-to-end tests fail because waiting for the dispatcher to come up times
> >>>>>>>>>>> out. I also noticed that this usually takes about 5-8 seconds on Travis, so
> >>>>>>>>>>> a 10 second timeout might be a bit low. I pushed commits to increase that
> >>>>>>>>>>> to 20 secs. Let’s see what will happen.
> >>>>>>>>>>>
> >>>>>>>>>>> I’ll keep you posted!
> >>>>>>>>>>> Aljoscha
> >>>>>>>>>>>
> >>>>>>>>>>>> On 25. Mar 2019, at 13:13, jincheng sun <sunjincheng...@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Great thanks for preparing the RC4 of Flink 1.8.0, Aljoscha!
> >>>>>>>>>>>>
> >>>>>>>>>>>> +1 (non-binding)
> >>>>>>>>>>>>
> >>>>>>>>>>>> I checked the functional things as follows (without performance
> >>>>>>>>>>>> verification):
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. Checking Artifacts:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1). Download the release source code - SUCCESS
> >>>>>>>>>>>> 2). Check Source release flink-1.8.0-src.tgz.sha512 - SUCCESS
> >>>>>>>>>>>> 3).
Download the released JAR - SUCCESS
> >>>>>>>>>>>> 4). Check if checksums and GPG files match the corresponding release
> >>>>>>>>>>>> files - SUCCESS.
> >>>>>>>>>>>> 5). Verify that the source archives do not contain any binaries -
> >>>>>>>>>>>> SUCCESS.
> >>>>>>>>>>>> 6). Build the source with `mvn clean verify -DskipTests` to ensure all
> >>>>>>>>>>>> source files have Apache headers - SUCCESS
> >>>>>>>>>>>> 7). Check that all POM files point to the same version - SUCCESS
> >>>>>>>>>>>> 8). Read the `README.md` file to ensure there is nothing unexpected -
> >>>>>>>>>>>> SUCCESS
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Testing Larger Setups
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cluster Environment: 7 nodes, jm 1024m, tm 4096m
> >>>>>>>>>>>> Testing Jobs: WordCount(Batch&Streaming), DataStreamAllroundTestProgram
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1). Use local&hdfs file systems for checkpoints - SUCCESS
> >>>>>>>>>>>> 2). Use hdfs file systems for input/output - SUCCESS
> >>>>>>>>>>>> 3). Run examples on YARN (with or without session) - SUCCESS
> >>>>>>>>>>>> 4). Test failover and recovery. - SUCCESS
> >>>>>>>>>>>> 5). Test incremental&non-incremental checkpoint - SUCCESS
> >>>>>>>>>>>> 6). Test connector - kafka - SUCCESS
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3. Testing Functionality
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1). Built-in tests (linux&mac os)
> >>>>>>>>>>>> - `mvn clean verify` (some test timeout errors and a test case bug, see
> >>>>>>>>>>>> FLINK-12001 <https://issues.apache.org/jira/browse/FLINK-12001>; none of
> >>>>>>>>>>>> them is a blocker)
> >>>>>>>>>>>> - build for scala 2.11 (mvn clean install -P scala-2.11 -DskipTests)
> >>>>>>>>>>>> - SUCCESS
> >>>>>>>>>>>> - Run the scripted nightly end-to-end test - SUCCESS
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2).
Quickstarts
> >>>>>>>>>>>> - Verify that the quickstarts for Scala with the staging repository
> >>>>>>>>>>>> in IntelliJ - SUCCESS
> >>>>>>>>>>>> - Verify that the quickstarts for Java with the staging repository in
> >>>>>>>>>>>> IntelliJ - SUCCESS
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3). Simple Starter Experience and Use Cases
> >>>>>>>>>>>>
> >>>>>>>>>>>> - run all examples from IntelliJ IDE - SUCCESS
> >>>>>>>>>>>> - Start a local cluster and verify that the processes - SUCCESS
> >>>>>>>>>>>> a. Examine the *.out files (should be empty) and the log files
> >>>>>>>>>>>> (should contain no exceptions)
> >>>>>>>>>>>> b. Test for Linux, MacOS
> >>>>>>>>>>>> c. Shutdown and verify there are no exceptions in the log output
> >>>>>>>>>>>> (after shutdown)
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Verify that the examples are running from both ./bin/flink and from
> >>>>>>>>>>>> the web-based job submission tool (following items) - SUCCESS
> >>>>>>>>>>>> a. Start multiple task managers in the local cluster
> >>>>>>>>>>>> b. Change the flink-conf.yaml to define more than one task slot (2)
> >>>>>>>>>>>> c. Run the examples with a parallelism > 1
> >>>>>>>>>>>> d. Examine the log output - no error messages should be encountered
> >>>>>>>>>>>>
> >>>>>>>>>>>> 4. Review the PR
> >>>>>>>>>>>> - [Add 1.8 Release Blog Post] - Just a reminder, update the release
> >>>>>>>>>>>> date to the correct date before merging.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> Jincheng
> >>>>>>>>>>>>
> >>>>>>>>>>>> Piotr Nowojski <pi...@ververica.com> wrote on Mon, Mar 25, 2019 at 4:11 PM:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> +1 from my side. Previously spotted performance regression seems to be
> >>>>>>>>>>>>> gone, or mostly gone.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Piotrek
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 21 Mar 2019, at 17:52, Aljoscha Krettek <aljos...@apache.org> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>> Please review and vote on the release candidate 4 for Flink 1.8.0, as
> >>>>>>>>>>>>>> follows:
> >>>>>>>>>>>>>> [ ] +1, Approve the release
> >>>>>>>>>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The complete staging area is available for your review, which includes:
> >>>>>>>>>>>>>> * JIRA release notes [1],
> >>>>>>>>>>>>>> * the official Apache source release and binary convenience releases to
> >>>>>>>>>>>>>> be deployed to dist.apache.org [2], which are signed with the key with
> >>>>>>>>>>>>>> fingerprint F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
> >>>>>>>>>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
> >>>>>>>>>>>>>> * source code tag "release-1.8.0-rc4" [5],
> >>>>>>>>>>>>>> * website pull request listing the new release [6]
> >>>>>>>>>>>>>> * website pull request adding announcement blog post [7].
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The vote will be open for at least 72 hours. It is adopted by majority
> >>>>>>>>>>>>>> approval, with at least 3 PMC affirmative votes.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Aljoscha
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> >>>>>>>>>>>>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc4/
> >>>>>>>>>>>>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >>>>>>>>>>>>>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1215
> >>>>>>>>>>>>>> [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c650befc10c8bb6cc4b007ae250b7b2173046145
> >>>>>>>>>>>>>> [6] https://github.com/apache/flink-web/pull/180
> >>>>>>>>>>>>>> [7] https://github.com/apache/flink-web/pull/179
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> P.S. The difference to the previous RCs is small, you can fetch the tags
> >>>>>>>>>>>>>> and do a "git log release-1.8.0-rc1..release-1.8.0-rc4" to see the
> >>>>>>>>>>>>>> difference in commits. It's fixes for the issues that led to the
> >>>>>>>>>>>>>> cancellation of the previous RCs plus smaller fixes. Most
> >>>>>>>>>>>>>> verification/testing that was carried out should apply as is to this RC.
> >>>>>>>>>>>>>> Any functional verification that you did on previous RCs should therefore
> >>>>>>>>>>>>>> easily carry over to this one.