When I enable exactly_once on my Kafka sink, I see extremely long initialization times (over 1 hour), especially after restoring from a savepoint. In the logs, the job appears to be constantly initializing thousands of Kafka producers, like this:

2023-01-31 14:39:58,150 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-mutable-enriched-events-5-1, transactionalId=common-event-enrichment-mutable-enriched-events-5-1] ProducerId set to 847642 with epoch 14
2023-01-31 14:39:58,150 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-mutable-enriched-events-5-1, transactionalId=common-event-enrichment-mutable-enriched-events-5-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-01-31 14:39:58,151 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-common-enriched-events-5-1, transactionalId=common-event-enrichment-common-enriched-events-5-1] ProducerId set to 2496758 with epoch 25
2023-01-31 14:39:58,151 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-immutable-enriched-events-5-1, transactionalId=common-event-enrichment-immutable-enriched-events-5-1] ProducerId set to 886210 with epoch 16
2023-01-31 14:39:58,151 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-mutable-enriched-events-5-1, transactionalId=common-event-enrichment-mutable-enriched-events-5-1] Discovered transaction coordinator kafka-broker-2:9092 (id: 2 rack: null)
2023-01-31 14:39:58,151 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-immutable-enriched-events-5-1, transactionalId=common-event-enrichment-immutable-enriched-events-5-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-01-31 14:39:58,151 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-common-enriched-events-5-1, transactionalId=common-event-enrichment-common-enriched-events-5-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-01-31 14:39:58,152 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-immutable-enriched-events-5-1, transactionalId=common-event-enrichment-immutable-enriched-events-5-1] Discovered transaction coordinator kafka-broker-0:9092 (id: 0 rack: null)
2023-01-31 14:39:58,152 INFO org.apache.kafka.clients.producer.internals.TransactionManager [] - [Producer clientId=producer-common-event-enrichment-common-enriched-events-5-1, transactionalId=common-event-enrichment-common-enriched-events-5-1] Discovered transaction coordinator kafka-broker-2:9092 (id: 2 rack: null)

Does transaction timeout impact the startup time? How can I optimize the initialization time? Without exactly_once the job starts up very quickly.
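
To put a number on how many producers are being initialized, a small script along these lines could count the "Invoking InitProducerId" entries per transactionalId. This is only a sketch: it assumes the log format shown above (a timestamp prefix and a transactionalId=... field), and the function name count_init_producer_ids is made up for illustration.

```python
import re
from collections import Counter

# Matches the transactionalId field in Kafka TransactionManager log lines.
TXN_ID = re.compile(r"transactionalId=([^\]\s,]+)")

# Zero-width split point in front of each "YYYY-MM-DD HH:MM:SS,mmm " timestamp,
# so entries that were run together on one line are still separated.
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} )")


def count_init_producer_ids(log_text: str) -> Counter:
    """Count 'Invoking InitProducerId' log entries per transactionalId."""
    counts = Counter()
    for entry in ENTRY_START.split(log_text):
        if "Invoking InitProducerId" not in entry:
            continue
        match = TXN_ID.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling count_init_producer_ids on the contents of a TaskManager log and taking len() of the result gives the number of distinct transactional IDs that were initialized; sum(counts.values()) gives the total number of InitProducerId calls.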