Yordan Pavlov created FLINK-31304:
-------------------------------------

Summary: Very slow job start if topic has been used before
Key: FLINK-31304
URL: https://issues.apache.org/jira/browse/FLINK-31304
Project: Flink
Issue Type: Improvement
Components: Connectors / Kafka
Affects Versions: 1.15.2
Reporter: Yordan Pavlov
We have the following use case. We use KafkaSink with exactly-once semantics. From time to time we restart the job clean: we delete and re-create the output topic and also delete any Flink checkpoints. In that situation it takes close to an hour for Flink to start. While the job is idling, we see log entries like the following in the TaskManager:

{code:java}
2023-03-02 16:33:42.004 [Source: Kafka source blocks -> Deduplicate blocks -> Map -> Parse blocks -> Map -> Kafka sink volume: Writer -> Kafka sink volume: Committer (2/5)#0] INFO o.apache.kafka.clients.producer.internals.TransactionManager - [Producer clientId=producer-state.clickhouse-0-1-1, transactionalId=state.clickhouse-0-1-1] Invoking InitProducerId for the first time in order to acquire a producer ID
2023-03-02 16:33:42.005 [kafka-producer-network-thread | producer-state.clickhouse-0-2-1] INFO o.apache.kafka.clients.producer.internals.TransactionManager - [Producer clientId=producer-state.clickhouse-0-2-1, transactionalId=state.clickhouse-0-2-1] ProducerId set to 31719488 with epoch 8
{code}

Such log entries go on and on, seemingly forever. If we use a brand-new output topic name, the job starts straight away. Could you advise whether this can be improved?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
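One possible explanation, offered here as an assumption and not a confirmed diagnosis: on a clean start, an exactly-once Kafka sink may walk through transactional ids of the form `<prefix>-<subtaskId>-<checkpointId>` one after another, calling InitProducerId on each to fence any lingering transaction, and stop only at the first id the broker has never seen (epoch 0). Kafka keeps transactional state per transactional id, independently of the output topic, so if the id prefix stays the same across restarts, deleting and re-creating the topic would not shorten that walk. The sketch below models the loop in memory. The id layout, the starting checkpoint of 1, and all class and method names are hypothetical, and the `Map` stands in for the broker; it is not the Kafka client API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of transactional-id probing on a clean start.
 * The "broker" is a plain map from transactional id to producer epoch.
 */
class TransactionalIdProbeSketch {

    // Assumed id layout: <prefix>-<subtaskId>-<checkpointId>
    static String transactionalId(String prefix, int subtaskId, long checkpointId) {
        return prefix + "-" + subtaskId + "-" + checkpointId;
    }

    // Models InitProducerId: an unknown id starts at epoch 0,
    // a known id gets its epoch bumped (fencing the previous producer).
    static int initProducerId(Map<String, Integer> brokerEpochs, String id) {
        int epoch = brokerEpochs.containsKey(id) ? brokerEpochs.get(id) + 1 : 0;
        brokerEpochs.put(id, epoch);
        return epoch;
    }

    // Probes ids for one subtask, starting at checkpoint 1 (assumed for a clean start),
    // until an id that was never used (epoch 0) is found. Returns the number of calls.
    static int probe(Map<String, Integer> brokerEpochs, String prefix, int subtaskId) {
        int initCalls = 0;
        for (long checkpointId = 1; ; checkpointId++) {
            initCalls++;
            String id = transactionalId(prefix, subtaskId, checkpointId);
            if (initProducerId(brokerEpochs, id) == 0) {
                return initCalls;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> broker = new HashMap<>();
        // Pretend a previous run of subtask 1 used checkpoint ids 1..1000 with "old-prefix"
        // (the stored epoch 7 is arbitrary).
        for (long cp = 1; cp <= 1000; cp++) {
            broker.put(transactionalId("old-prefix", 1, cp), 7);
        }
        System.out.println("reused prefix: " + probe(broker, "old-prefix", 1) + " InitProducerId calls");
        System.out.println("fresh prefix: " + probe(broker, "new-prefix", 1) + " InitProducerId calls");
    }
}
```

Running `main` prints 1001 calls for the reused prefix and 1 for the fresh one: under these assumptions, the number of InitProducerId round trips grows with how many checkpoint ids the previous run used under the same prefix.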