Hello Gary,

Thank you for your response.

I'd like to use the new mode, but it does not work for me: it seems I am
running into a firewall issue.

The rest.port is chosen randomly when running on YARN [1]. The machine I use
to deploy the job can, in fact, start the Flink cluster, but it cannot
submit the job on the randomly chosen port because our firewall blocks it.
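One way to confirm that the firewall, rather than Flink itself, is the cause is to attempt a plain TCP connection from the submitting machine to the REST address the client was trying to reach. The following is a minimal, hypothetical Java sketch (the class PortProbe and its defaults are illustrative and not part of Flink). The default host and port are taken from the stack trace quoted further down; in practice you would pass the address YARN assigned for that particular run. A firewall that silently drops packets typically surfaces as a timeout, which is the same symptom as the Netty ConnectTimeoutException in the client log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class PortProbe {

    /** Outcome of a single TCP connect attempt. */
    enum Result { OPEN, TIMED_OUT, REFUSED_OR_UNREACHABLE }

    /**
     * Tries to open a plain TCP connection to host:port within timeoutMillis.
     * A firewall that silently drops packets typically shows up as TIMED_OUT;
     * a closed port or a rejecting firewall usually shows up as refused.
     */
    static Result probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Result.OPEN;
        } catch (SocketTimeoutException e) {
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // Connection refused, unknown host, no route to host, ...
            return Result.REFUSED_OR_UNREACHABLE;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults copied from the stack trace; override via arguments.
        String host = args.length > 0 ? args[0] : "shd-hdp-b-slave-017.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 46500;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 5000));
    }
}
```

If the probe reports TIMED_OUT from the deploy machine but OPEN from a node inside the cluster, that points at the firewall between the two rather than at the Flink configuration.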

Do you know whether this is still the case in 1.7, and whether there is any way to
work around it?

Richard

[1]
https://stackoverflow.com/questions/54000276/flink-web-port-can-not-be-configured-correctly-in-yarn-mode

On Thu, Feb 21, 2019 at 3:41 PM Gary Yao <g...@ververica.com> wrote:

> Hi,
>
> Beginning with Flink 1.7, you cannot use the legacy mode anymore [1][2]. I
> am currently working on removing references to the legacy mode in the
> documentation [3]. Is there any reason you cannot use the "new mode"?
>
> Best,
> Gary
>
> [1] https://flink.apache.org/news/2018/11/30/release-1.7.0.html
> [2] https://issues.apache.org/jira/browse/FLINK-10392
> [3] https://issues.apache.org/jira/browse/FLINK-11713
>
> On Mon, Feb 18, 2019 at 12:00 PM Richard Deurwaarder <rich...@xeli.eu>
> wrote:
>
>> Hello,
>>
>> I am trying to upgrade our job from Flink 1.4.2 to 1.7.1, but I keep
>> running into timeouts after submitting the job.
>>
>> The Flink job runs on our Hadoop cluster and is started on YARN.
>>
>> Relevant config options seem to be:
>>
>> jobmanager.rpc.port: 55501
>> recovery.jobmanager.port: 55502
>> yarn.application-master.port: 55503
>> blob.server.port: 55504
>>
>> I've seen the following behavior:
>>   - Using the same flink-conf.yaml as in 1.4.2, versions 1.5.6, 1.6.3 and
>> 1.7.1 all time out, while 1.4.2 works.
>>   - Using 1.5.6 with "mode: legacy" (to switch off FLIP-6) works.
>>   - Using 1.7.1 with "mode: legacy" also times out (I assume this option
>> was removed but the documentation is outdated?
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#legacy
>> )
>>
>> When the timeout happens I get the following stacktrace:
>>
>> INFO class java.time.Instant does not contain a getter for field seconds
>> 2019-02-18T10:16:56.815+01:00
>> INFO class com.bol.fin_hdp.cm1.domain.Cm1Transportable does not contain
>> a getter for field globalId 2019-02-18T10:16:56.815+01:00
>> INFO Submitting job 5af931bcef395a78b5af2b97e92dcffe (detached: false).
>> 2019-02-18T10:16:57.182+01:00
>> INFO ------------------------------------------------------------
>> 2019-02-18T10:29:27.527+01:00
>> INFO The program finished with the following exception:
>> 2019-02-18T10:29:27.564+01:00
>> INFO org.apache.flink.client.program.ProgramInvocationException: The
>> main method caused an error. 2019-02-18T10:29:27.601+01:00
>> INFO at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
>> 2019-02-18T10:29:27.638+01:00
>> INFO at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
>> 2019-02-18T10:29:27.675+01:00
>> INFO at
>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
>> 2019-02-18T10:29:27.711+01:00
>> INFO at
>> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:798)
>> 2019-02-18T10:29:27.747+01:00
>> INFO at
>> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:289)
>> 2019-02-18T10:29:27.784+01:00
>> INFO at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
>> 2019-02-18T10:29:27.820+01:00
>> INFO at
>> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1035)
>> 2019-02-18T10:29:27.857+01:00
>> INFO at
>> org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1111)
>> 2019-02-18T10:29:27.893+01:00
>> INFO at java.security.AccessController.doPrivileged(Native Method)
>> 2019-02-18T10:29:27.929+01:00
>> INFO at javax.security.auth.Subject.doAs(Subject.java:422)
>> 2019-02-18T10:29:27.968+01:00
>> INFO at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>> 2019-02-18T10:29:28.004+01:00
>> INFO at
>> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>> 2019-02-18T10:29:28.040+01:00
>> INFO at
>> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1111)
>> 2019-02-18T10:29:28.075+01:00
>> INFO Caused by: java.lang.RuntimeException:
>> org.apache.flink.client.program.ProgramInvocationException: Could not
>> retrieve the execution result. 2019-02-18T10:29:28.110+01:00
>> INFO at
>> com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:43)
>> 2019-02-18T10:29:28.146+01:00
>> INFO at
>> com.bol.fin_hdp.job.starter.IntervalJobStarter.startJobWithConfig(IntervalJobStarter.java:32)
>> 2019-02-18T10:29:28.182+01:00
>> INFO at com.bol.fin_hdp.Main.main(Main.java:8)
>> 2019-02-18T10:29:28.217+01:00
>> INFO at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 2019-02-18T10:29:28.253+01:00
>> INFO at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> 2019-02-18T10:29:28.289+01:00
>> INFO at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 2019-02-18T10:29:28.325+01:00
>> INFO at java.lang.reflect.Method.invoke(Method.java:498)
>> 2019-02-18T10:29:28.363+01:00
>> INFO at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>> 2019-02-18T10:29:28.400+01:00
>> INFO ... 12 more 2019-02-18T10:29:28.436+01:00
>> INFO Caused by:
>> org.apache.flink.client.program.ProgramInvocationException: Could not
>> retrieve the execution result. 2019-02-18T10:29:28.473+01:00
>> INFO at
>> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)
>> 2019-02-18T10:29:28.509+01:00
>> INFO at
>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
>> 2019-02-18T10:29:28.544+01:00
>> INFO at
>> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>> 2019-02-18T10:29:28.581+01:00
>> INFO at com.bol.fin_hdp.cm1.job.Job.execute(Job.java:54)
>> 2019-02-18T10:29:28.617+01:00
>> INFO at
>> com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:41)
>> 2019-02-18T10:29:28.654+01:00
>> INFO ... 19 more 2019-02-18T10:29:28.693+01:00
>> INFO Caused by: org.apache.flink.runtime.client.JobSubmissionException:
>> Failed to submit JobGraph. 2019-02-18T10:29:28.730+01:00
>> INFO at
>> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:371)
>> 2019-02-18T10:29:28.766+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>> 2019-02-18T10:29:28.803+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>> 2019-02-18T10:29:28.839+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>> 2019-02-18T10:29:28.876+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>> 2019-02-18T10:29:28.912+01:00
>> INFO at
>> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:216)
>> 2019-02-18T10:29:28.948+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>> 2019-02-18T10:29:28.986+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>> 2019-02-18T10:29:29.023+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>> 2019-02-18T10:29:29.060+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>> 2019-02-18T10:29:29.096+01:00
>> INFO at
>> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:301)
>> 2019-02-18T10:29:29.133+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>> 2019-02-18T10:29:29.169+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
>> 2019-02-18T10:29:29.206+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
>> 2019-02-18T10:29:29.242+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>> 2019-02-18T10:29:29.278+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:214)
>> 2019-02-18T10:29:29.315+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
>> 2019-02-18T10:29:29.352+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
>> 2019-02-18T10:29:29.388+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>> 2019-02-18T10:29:29.424+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>> 2019-02-18T10:29:29.460+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>> 2019-02-18T10:29:29.496+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>> 2019-02-18T10:29:29.532+01:00
>> INFO at java.lang.Thread.run(Thread.java:748)
>> 2019-02-18T10:29:29.569+01:00
>> INFO Caused by:
>> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not
>> complete the operation. Number of retries has been exhausted.
>> 2019-02-18T10:29:29.606+01:00
>> INFO at
>> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
>> 2019-02-18T10:29:29.643+01:00
>> INFO ... 17 more 2019-02-18T10:29:29.680+01:00
>> INFO Caused by: java.util.concurrent.CompletionException:
>> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException:
>> connection timed out: shd-hdp-b-slave-01... 2019-02-18T10:29:29.717+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>> 2019-02-18T10:29:29.753+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>> 2019-02-18T10:29:29.789+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>> 2019-02-18T10:29:29.826+01:00
>> INFO at
>> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>> 2019-02-18T10:29:29.862+01:00
>> INFO ... 15 more 2019-02-18T10:29:29.898+01:00
>> INFO Caused by:
>> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException:
>> connection timed out:
>> shd-hdp-b-slave-017.example.com/some.ip.address:46500
>> 2019-02-18T10:29:29.934+01:00
>> INFO at
>> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:212)
>> 2019-02-18T10:29:29.970+01:00
>> INFO ... 7 more
>>
>> Does anyone have tips on how to debug this, or on which configuration
>> changes I need to make?
>>
>