Dan Hill created FLINK-19721:
--------------------------------

             Summary: Speed up the frequency of checks in RpcGatewayRetriever
                 Key: FLINK-19721
                 URL: https://issues.apache.org/jira/browse/FLINK-19721
             Project: Flink
          Issue Type: Improvement
          Components: Test Infrastructure
    Affects Versions: 1.11.2, 1.11.1, 1.12.0
            Reporter: Dan Hill


When writing Flink tests, I could reduce the latency of my 'waitForDone' calls 
by writing my own retry-and-sleep loop rather than relying on 
`TableResult.getJobClient().get().getJobExecutionResult(...)`.  This is because 
`[MiniCluster|https://github.com/apache/flink/blob/47ca19a74e11c72842124852875262959477c459/flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java#L338]`
 uses 
[RpcGatewayRetriever|https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/retriever/impl/RpcGatewayRetriever.java]
 which retries at a fixed 20ms interval.

For a complex test, this can save 50ms-100ms per test run.

An easy fix is to change this to a retry with exponential backoff.  This 
reduces the impact 
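
To illustrate the idea, here is a minimal sketch of a retry loop with capped exponential backoff. It is not Flink code and not a proposed patch: the class `BackoffPoller`, its method names, and the 1ms starting delay are all assumptions made for the example. Only the 20ms figure comes from the fixed retry interval described above, and it is used here as the cap.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Flink.
final class BackoffPoller {

    private BackoffPoller() {}

    /**
     * Delay before retry number {@code attempt} (0-based):
     * initialMillis * 2^attempt, capped at maxMillis.
     */
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        // Stop doubling once the cap is reached so the value cannot overflow.
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /**
     * Polls {@code done} until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interrupt.
     */
    static boolean pollUntil(
            BooleanSupplier done, long initialMillis, long maxMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        for (int attempt = 0; ; attempt++) {
            if (done.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(delayMillis(attempt, initialMillis, maxMillis), remainingMillis));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With a 1ms start and a 20ms cap, the waits are 1, 2, 4, 8, 16, 20, 20, ... ms, so a condition that becomes true quickly is noticed after a few milliseconds, while long waits poll no more often than the current fixed interval.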



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
