[ https://issues.apache.org/jira/browse/FLINK-24580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martijn Visser reassigned FLINK-24580:
--------------------------------------

    Assignee: John Karp

> Kinesis connect time out error is not handled as recoverable
> ------------------------------------------------------------
>
>                 Key: FLINK-24580
>                 URL: https://issues.apache.org/jira/browse/FLINK-24580
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kinesis
>    Affects Versions: 1.13.2
>            Reporter: John Karp
>            Assignee: John Karp
>            Priority: Major
>              Labels: pull-request-available
>
> Several times a day, transient Kinesis errors cause our Flink job to fail:
> {noformat}
> org.apache.flink.kinesis.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to kinesis.us-east-1.amazonaws.com:443 [kinesis.us-east-1.amazonaws.com/3.91.171.253] failed: connect timed out
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1207)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1153)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2893)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2860)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2849)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.executeGetRecords(AmazonKinesisClient.java:1319)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.getRecords(AmazonKinesisClient.java:1288)
>     at org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.getRecords(KinesisProxy.java:292)
>     at org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.getRecords(PollingRecordPublisher.java:168)
>     at org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:113)
>     at org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:102)
>     at org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.run(ShardConsumer.java:114)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.flink.kinesis.shaded.org.apache.http.conn.ConnectTimeoutException: Connect to kinesis.us-east-1.amazonaws.com:443 [kinesis.us-east-1.amazonaws.com/3.91.171.253] failed: connect timed out
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
>     at jdk.internal.reflect.GeneratedMethodAccessor168.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.conn.$Proxy47.connect(Unknown Source)
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1331)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>     ... 22 more
> Caused by: java.net.SocketTimeoutException: connect timed out
>     at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
>     at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
>     at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
>     at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.base/java.net.Socket.connect(Socket.java:609)
>     at org.apache.flink.kinesis.shaded.org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
>     at org.apache.flink.kinesis.shaded.com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:142)
>     at org.apache.flink.kinesis.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
>     ... 37 more
> {noformat}
> This creates operational noise for us, and having to lose all progress since
> the last checkpoint is inefficient for a simple transient issue.
> I think this could be solved if
> KinesisProxy.isRecoverableSdkClientException(e) recognized connect errors as
> being recoverable?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
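
For illustration only, the kind of check the reporter suggests could be sketched as a walk over the exception's cause chain, looking for a connection-level failure such as the java.net.SocketTimeoutException at the bottom of the trace above. The class ConnectErrorClassifier and the method isConnectFailure are hypothetical names that do not exist in Flink or the AWS SDK, and this is not the change made for this issue. The sketch uses only JDK exception types so that it runs without the shaded SDK classes.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

/** Hypothetical helper for illustration; not part of Flink or the AWS SDK. */
final class ConnectErrorClassifier {

    // Assumed upper bound on chain length, so a cyclic cause chain cannot loop forever.
    private static final int MAX_DEPTH = 32;

    private ConnectErrorClassifier() {}

    /**
     * Returns true if any exception in the cause chain is a JDK
     * connection-level failure. Note that SocketTimeoutException also
     * covers read timeouts, so this check is broader than "connect only".
     */
    static boolean isConnectFailure(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            if (t instanceof SocketTimeoutException || t instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mimics the shape of the reported trace: an outer wrapper
        // whose root cause is a socket connect timeout.
        Throwable wrapped = new RuntimeException(
                "Unable to execute HTTP request",
                new SocketTimeoutException("connect timed out"));

        System.out.println(isConnectFailure(wrapped));                            // prints true
        System.out.println(isConnectFailure(new IllegalStateException("other"))); // prints false
    }
}
```

Whether such errors should be retried in place rather than failing the job depends on the connector's retry and backoff settings, which this sketch does not model.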