[ https://issues.apache.org/jira/browse/FLINK-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zichen Liu updated FLINK-25924:
-------------------------------
    Description:
Intermittent failures introduced as part of merge (PR#18314: [FLINK-24228[connectors/firehose] - Unified Async Sink for Kinesis Firehose|https://github.com/apache/flink/pull/18314]):
# Failures are intermittent and affect c. 1 in 7 builds on {{flink-ci.flink}} and {{flink-ci.flink-master-mirror}}.
# The issue looks identical to the KinesaliteContainer startup issue (Appendix 1).
# I have managed to reproduce the issue locally: if I start some parallel containers, keep them running, and then run {{KinesisFirehoseSinkITCase}}, c. 1 in 6 runs gives the error.
# The errors have a slightly different appearance on {{flink-ci.flink-master-mirror}} vs {{flink-ci.flink}}, which has the same appearance as local. I only hope it is a difference in logging/killing environment variables (and that there aren’t 2 distinct issues).

Appendix 1:

org.testcontainers.containers.ContainerLaunchException: Container startup failed
    at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336)
    at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317)
    at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066)
    at
    ... 11 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
    at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
    at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331)
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
    ... 12 more
Caused by: org.rnorth.ducttape.TimeoutException: Timeout waiting for result with exception
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54)
    at

  was:
intermittent failures introduced as part of our merge yesterday (PR#18314: [[FLINK-24228][connectors/firehose] - Unified Async Sink for Kinesis Firehose|https://github.com/apache/flink/pull/18314]):
# Failures are intermittent and affect c. 1 in 7 builds on {{flink-ci.flink}} and {{flink-ci.flink-master-mirror}}.
# The issue looks identical to the KinesaliteContainer startup issue (Appendix 1).
# I have managed to reproduce the issue locally: if I start some parallel containers, keep them running, and then run {{KinesisFirehoseSinkITCase}}, c. 1 in 6 runs gives the error.
# The errors have a slightly different appearance on {{flink-ci.flink-master-mirror}} vs {{flink-ci.flink}}, which has the same appearance as local. I only hope it is a difference in logging/killing environment variables (and that there aren’t 2 distinct issues).

Appendix 1:

org.testcontainers.containers.ContainerLaunchException: Container startup failed
    at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336)
    at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317)
    at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066)
    at
    ... 11 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
    at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
    at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331)
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
    ... 12 more
Caused by: org.rnorth.ducttape.TimeoutException: Timeout waiting for result with exception
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54)
    at


> KDF Integration tests intermittently fails
> ------------------------------------------
>
>                 Key: FLINK-25924
>                 URL: https://issues.apache.org/jira/browse/FLINK-25924
>             Project: Flink
>          Issue Type: New Feature
>          Components: Connectors / Kinesis
>            Reporter: Zichen Liu
>            Assignee: Ahmed Hamdy
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> Intermittent failures introduced as part of merge (PR#18314: [FLINK-24228[connectors/firehose] - Unified Async Sink for Kinesis Firehose|https://github.com/apache/flink/pull/18314]):
> # Failures are intermittent and affect c. 1 in 7 builds on {{flink-ci.flink}} and {{flink-ci.flink-master-mirror}}.
> # The issue looks identical to the KinesaliteContainer startup issue (Appendix 1).
> # I have managed to reproduce the issue locally: if I start some parallel containers, keep them running, and then run {{KinesisFirehoseSinkITCase}}, c. 1 in 6 runs gives the error.
> # The errors have a slightly different appearance on {{flink-ci.flink-master-mirror}} vs {{flink-ci.flink}}, which has the same appearance as local. I only hope it is a difference in logging/killing environment variables (and that there aren’t 2 distinct issues).
>
> Appendix 1:
>
> org.testcontainers.containers.ContainerLaunchException: Container startup failed
>     at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336)
>     at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317)
>     at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066)
>     at
>     ... 11 more
> Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
>     at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
>     at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331)
>     at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
>     ... 12 more
> Caused by: org.rnorth.ducttape.TimeoutException: Timeout waiting for result with exception
>     at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54)
>     at



--
This message was sent by Atlassian Jira
(v8.20.1#820001)