Hi Dominik,

Sounds like it could be this? https://issues.apache.org/jira/browse/FLINK-28060
It doesn't mention transactions, but I'd guess it could be the same mechanism.

Regards,
Alexis

On Mon, 18 Dec 2023, 07:51 Dominik Wosiński, <wos...@gmail.com> wrote:

> Hey,
>
> I've got a question regarding transaction failures in an EXACTLY_ONCE
> flow with Flink 1.15.3 and Confluent Cloud Kafka.
>
> The case is that there is a FlinkKafkaProducer in an EXACTLY_ONCE setup with
> the default *transaction.timeout.ms* of 15 min.
>
> During processing, the job had some issues that caused a checkpoint to
> time out. That in turn caused transaction issues, and the transaction
> failed with the following logs:
>
> Unable to commit transaction
> (org.apache.flink.streaming.runtime.operators.sink.committables.CommitRequestImpl@5d0d5082)
> because its producer is already fenced. This means that you either have a
> different producer with the same 'transactional.id' (this is unlikely
> with the 'KafkaSink' as all generated ids are unique and shouldn't be
> reused) or recovery took longer than 'transaction.timeout.ms' (900000ms).
> In both cases this most likely signals data loss, please consult the Flink
> documentation for more details.
>
> Up to this point everything is pretty clear. After that, however, the job
> continued to work normally, but every single transaction was failing with:
>
> Unable to commit transaction
> (org.apache.flink.streaming.runtime.operators.sink.committables.CommitRequestImpl@5a924600)
> because it's in an invalid state. Most likely the transaction has been
> aborted for some reason. Please check the Kafka logs for more details.
>
> This effectively stalls all downstream processing, because no transaction
> would ever be committed.
>
> I've read through the docs and understand that this is a known issue,
> because Kafka doesn't fully support 2PC. But why doesn't it cause the
> whole job to fail and restart? Currently, the job keeps processing
> everything normally and hides the issue until it has become catastrophic.
>
> Thanks in advance,
> Cheers.
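The first error says the transaction outlived 'transaction.timeout.ms' (900000ms). When that happens, Kafka's transaction coordinator aborts the transaction before Flink gets to commit it. Whether a given checkpoint configuration can hit that limit can be estimated with a back-of-the-envelope check.

The class below is a hypothetical helper, not part of Flink or Kafka. It rests on an assumed, simplified model:

- A transaction opened at the start of a checkpoint interval is committed only after a later checkpoint completes.
- Each timed-out checkpoint therefore adds roughly one interval plus one full checkpoint timeout.
- A restart adds the recovery time on top.

All numbers in main are illustrative.

```java
/**
 * Hypothetical back-of-the-envelope check (not a Flink or Kafka API).
 * Assumed model: a Kafka transaction opened at the start of a checkpoint
 * interval is committed only after a later checkpoint completes.
 */
public final class TransactionTimeoutBudget {

    private TransactionTimeoutBudget() {}

    /**
     * Rough upper-bound lifetime of one transaction in milliseconds.
     * Every checkpoint attempt (the failed ones plus the one that finally
     * succeeds) is assumed to cost one interval plus a full checkpoint
     * timeout; a restart adds the recovery time on top.
     */
    public static long worstCaseTransactionMs(long checkpointIntervalMs,
                                              long checkpointTimeoutMs,
                                              int failedCheckpointsInARow,
                                              long maxRecoveryMs) {
        long perAttemptMs = checkpointIntervalMs + checkpointTimeoutMs;
        return (failedCheckpointsInARow + 1L) * perAttemptMs + maxRecoveryMs;
    }

    /** True if the estimated lifetime stays strictly below transaction.timeout.ms. */
    public static boolean fitsWithin(long transactionTimeoutMs, long worstCaseMs) {
        return worstCaseMs < transactionTimeoutMs;
    }

    public static void main(String[] args) {
        long transactionTimeoutMs = 900_000L; // 15 min, the value in the log message
        // Illustrative numbers only: 1 min interval, 10 min checkpoint timeout,
        // one timed-out checkpoint before a successful one, 2 min recovery.
        long worstMs = worstCaseTransactionMs(60_000L, 600_000L, 1, 120_000L);
        System.out.println("worst case = " + worstMs + " ms, fits = "
                + fitsWithin(transactionTimeoutMs, worstMs));
        // prints "worst case = 1440000 ms, fits = false"
    }
}
```

Compare the two cases with these illustrative numbers. With no failed checkpoint, the estimate is 780000 ms, which fits under the 15-minute timeout. A single timed-out checkpoint pushes the estimate to 1440000 ms, well past it. The model ignores settings such as the minimum pause between checkpoints, so treat it as a rough sanity check rather than a guarantee.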