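Hi Son,

Based on your description, this may be caused by the retention policy of the /tmp file system, which deletes temporary files periodically. For example, Flink's start/stop scripts keep their PID files in /tmp by default (the env.pid.dir option). If those files are removed, *stop-cluster* can no longer find the running processes, which matches what you saw.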
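To check this on your host, you could use a small helper like the minimal Python sketch below. It lists top-level entries in /tmp whose names match some Flink-related patterns and whose modification time is older than a threshold, i.e. likely targets for a periodic cleaner. The name patterns (flink-*, blobStore-*) and the 10-day threshold are assumptions for illustration only. The real rules depend on your distribution's cleanup tool (for example systemd-tmpfiles or tmpwatch) and its configuration, and those tools may also consider access and change times, not just modification time.

```python
import fnmatch
import os
import time
from typing import List, Optional

# Hypothetical name patterns for Flink's temporary files (PID files, blob
# store, web dirs); verify them against what actually appears in /tmp.
FLINK_PATTERNS = ("flink-*", "blobStore-*")


def stale_flink_entries(tmp_dir: str = "/tmp",
                        max_age_days: float = 10.0,
                        now: Optional[float] = None) -> List[str]:
    """Return entries in tmp_dir matching FLINK_PATTERNS whose mtime is older
    than max_age_days. Simplification: only mtime is checked here."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for name in sorted(os.listdir(tmp_dir)):
        if not any(fnmatch.fnmatch(name, p) for p in FLINK_PATTERNS):
            continue
        path = os.path.join(tmp_dir, name)
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            continue  # entry vanished between listdir() and stat()
        if mtime < cutoff:
            stale.append(path)
    return stale


if __name__ == "__main__":
    for entry in stale_flink_entries():
        print(entry)
```

If it reports Flink files that are close to the cleaner's age limit, the /tmp cleanup explanation becomes more plausible.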
Son Mai <hongson1...@gmail.com> wrote on Wed, Feb 27, 2019 at 10:27 AM:

> Hi,
>
> I have a question regarding Flink.
> I'm running Flink in stand-alone mode on 1 host (JobManager and TaskManager on the same host). At first, I was able to submit and cancel jobs normally; the jobs showed up in the web UI and ran.
> However, after ~1 month, when I canceled the old job and submitted a new one, I got *org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.*
> At this point, I was able to run *flink list* to list current jobs and *flink cancel* to cancel the job, but *flink run* failed. An exception was thrown and the job was not shown in the web UI.
> When I tried to stop the current stand-alone cluster using *stop-cluster*, it said 'no cluster was found'. Then I had to find the PIDs of the Flink processes and stop them manually. After running *start-cluster* to create a new stand-alone cluster, I was able to submit jobs normally.
> The shortened stack trace (full stack trace at google docs link <https://docs.google.com/document/d/1v07A4Jp45worykjgMyQTVR-BAoPXwL-O9qGxxhNjXyE/edit?usp=sharing>):
>
> org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result. (JobID: 7ef1cbddb744cd5769297f4059f7c531)
>     at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:261)
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.ConnectionClosedException: Channel became inactive.
> Caused by: org.apache.flink.runtime.rest.ConnectionClosedException: Channel became inactive.
>     ... 37 more
>
> The error is consistent.
> It always happens after I let Flink run for a while, usually more than 1 month. Why am I not able to submit jobs to Flink after a while? What happened here?
>
> Regards,
>
> Son

--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn