Hey Ken,

Regarding Rufus, I know he can be a bit eager to change lines ;) If you want to ignore his changes in git blame, please take a look here [1].
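
As far as I remember, the linked page boils down to a single git setting. This is only a sketch: it assumes you run it at the root of your Flink checkout and that the list of refactoring commits is the file named `.git-blame-ignore-revs` there, so please double-check the file name against the docs page.

```shell
# Assumption: run from the root of a Flink checkout that contains
# a .git-blame-ignore-revs file listing the big refactoring commits.
# Requires git 2.23 or newer (blame.ignoreRevsFile was added there).
git config blame.ignoreRevsFile .git-blame-ignore-revs
```

After that, a plain `git blame` on a file such as ContextEnvironment.java should skip the listed commits and show the earlier authors instead.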

For the main issue, do you mind creating a ticket? I hope someone will be able to pick it up.

Best,

Dawid

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/flinkdev/ide_setup/#ignoring-refactoring-commits

On 01/10/2021 02:10, Ken Krugler wrote:
> Hi all,
>
> We’ve upgraded from Flink 1.11 to 1.13, and our workflows are now
> sometimes failing with an exception, even though the job has succeeded.
>
> The stack trace for this bit of the exception is:
>
> java.util.concurrent.ExecutionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>     at org.apache.flink.client.program.ContextEnvironment.getJobExecutionResult(ContextEnvironment.java:117)
>     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:74)
>     at my.program.execute.workflow...
>
> The root cause is “java.net.ConnectException: Connection refused”,
> returned from the YARN node where the Job Manager is (was) running.
>
> ContextEnvironment.java line 117 is:
>
> jobExecutionResult = jobExecutionResultFuture.get();
>
> This looks like a race condition, where YARN is terminating the Job
> Manager, and this sometimes completes before the main program has
> retrieved all of the job status information.
>
> I’m wondering if this is a side effect of recent changes to make
> execution async/non-blocking.
>
> Is this a known issue? Anything we can do to work around it?
>
> Thanks,
>
> — Ken
>
> PS - The last two people working on this area code were Aljoscha and
> Robert (really wish git blame didn’t show most lines as being modified
> by “Rufus Refactor”…sigh)
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch