Hello,

The issue you're encountering is related to a new heartbeat mechanism
between the client and the job, introduced in Flink 1.17. If the job does
not receive a heartbeat from the client within the configured timeout, it
cancels itself to avoid hanging indefinitely.

To address this, you have two options:
1. Run your job in detached mode by adding the -d option to your command
   line.
2. Increase the client heartbeat timeout ('client.heartbeat.timeout') to a
   larger value. The default is 180 seconds.
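
As a rough sketch, the two options might look like this when applied to the command from the original message. The script only builds and prints the command lines (a dry run), so it can be tried without a cluster. The variable names are made up for illustration, and the value 600000 is an arbitrary example that assumes 'client.heartbeat.timeout' is given in milliseconds; please check the configuration reference for your Flink version.

```shell
#!/bin/sh
# Dry run: builds and prints both command lines without contacting YARN.
# To submit for real, run the printed line from the Flink home directory.

# Arguments taken unchanged from the original command.
YARN_ARGS="-t yarn-per-job -ys 1 -yjm 1G -ytm 3G -yqu default -p 1"
JOB_ARGS="-c org.apache.flink.streaming.examples.socket.SocketWindowWordCount ./examples/streaming/SocketWindowWordCount.jar -hostname 192.168.2.111 -port 7777"

# Option 1: detached mode. -d replaces -sae, which only applies to attached runs.
DETACHED_CMD="./bin/flink run -d $YARN_ARGS $JOB_ARGS"

# Option 2: stay attached with -sae, but raise the client heartbeat timeout.
# 600000 is an illustrative value, assumed to be in milliseconds (10 minutes).
HEARTBEAT_TIMEOUT_MS=600000
ATTACHED_CMD="./bin/flink run $YARN_ARGS -sae -Dclient.heartbeat.timeout=$HEARTBEAT_TIMEOUT_MS $JOB_ARGS"

echo "$DETACHED_CMD"
echo "$ATTACHED_CMD"
```

Note that if the heartbeat requests themselves are being rejected, as the ERROR lines in the quoted log suggest, a larger timeout would presumably only postpone the cancellation, so detached mode is likely the safer choice in that case.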

Best,
Junrui

On Wed, Mar 6, 2024 at 09:53, 程意 <chengyi8...@gmail.com> wrote:

> In versions 1.17.1 and 1.18.1, I used the yarn per-job mode to submit a
> job, and it ends after about 4 minutes. I tried the same thing on Flink
> 1.13.1, 1.15.2, and 1.16.3, and all of them ran normally.
> Command line on version 1.17.1:
> ```
> ./bin/flink run -t yarn-per-job -ys 1 -yjm 1G -ytm 3G -yqu default -p 1 \
>   -sae -c org.apache.flink.streaming.examples.socket.SocketWindowWordCount \
>   ./examples/streaming/SocketWindowWordCount.jar -hostname 192.168.2.111 \
>   -port 7777
> ```
>
> The logs printed on version 1.17.1 are as follows:
> ```
> 2024-03-05 14:43:08,144 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - TumblingProcessingTimeWindows -> Sink: Print to Std. Out (1/1) (a23ddaf520a680f213db5726192b7dc4_90bea66de1c231edf33913ecd54406c1_0_0) switched from INITIALIZING to RUNNING.
> 2024-03-05 14:43:29,232 ERROR org.apache.flink.runtime.rest.handler.job.JobClientHeartbeatHandler [] - Exception occurred in REST handler: Request did not match expected format JobClientHeartbeatRequestBody.
> 2024-03-05 14:43:59,222 ERROR org.apache.flink.runtime.rest.handler.job.JobClientHeartbeatHandler [] - Exception occurred in REST handler: Request did not match expected format JobClientHeartbeatRequestBody.
> 2024-03-05 14:44:29,226 ERROR org.apache.flink.runtime.rest.handler.job.JobClientHeartbeatHandler [] - Exception occurred in REST handler: Request did not match expected format JobClientHeartbeatRequestBody.
> 2024-03-05 14:44:59,218 ERROR org.apache.flink.runtime.rest.handler.job.JobClientHeartbeatHandler [] - Exception occurred in REST handler: Request did not match expected format JobClientHeartbeatRequestBody.
> 2024-03-05 14:45:29,216 ERROR org.apache.flink.runtime.rest.handler.job.JobClientHeartbeatHandler [] - Exception occurred in REST handler: Request did not match expected format JobClientHeartbeatRequestBody.
> 2024-03-05 14:45:59,217 ERROR org.apache.flink.runtime.rest.handler.job.JobClientHeartbeatHandler [] - Exception occurred in REST handler: Request did not match expected format JobClientHeartbeatRequestBody.
> 2024-03-05 14:46:29,217 ERROR org.apache.flink.runtime.rest.handler.job.JobClientHeartbeatHandler [] - Exception occurred in REST handler: Request did not match expected format JobClientHeartbeatRequestBody.
> 2024-03-05 14:46:58,363 WARN  org.apache.flink.runtime.dispatcher.MiniDispatcher           [] - The heartbeat from the job client is timeout and cancel the job cd6e02e2d60ea07a21e2809000e078cb. You can adjust the heartbeat interval by 'client.heartbeat.interval' and the timeout by 'client.heartbeat.timeout'
> ```
>
> I use Hadoop version 3.1.1.
>
>
