Yordan Pavlov created FLINK-31304:
-------------------------------------

             Summary: Very slow job start if topic has been used before
                 Key: FLINK-31304
                 URL: https://issues.apache.org/jira/browse/FLINK-31304
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Kafka
    Affects Versions: 1.15.2
            Reporter: Yordan Pavlov


Our use case is as follows. We use KafkaSink with exactly-once semantics. From 
time to time we restart the job from a clean state: we delete and re-create the 
output topic and also delete any Flink checkpoints. In that situation it takes 
close to an hour for Flink to start. While the job is idling, we see the 
following log in the TaskManager:


{code:java}
2023-03-02 16:33:42.004 [Source: Kafka source blocks -> Deduplicate blocks -> 
Map -> Parse blocks -> Map -> Kafka sink volume: Writer -> Kafka sink volume: 
Committer (2/5)#0] INFO  
o.apache.kafka.clients.producer.internals.TransactionManager  - [Producer 
clientId=producer-state.clickhouse-0-1-1, 
transactionalId=state.clickhouse-0-1-1] Invoking InitProducerId for the first 
time in order to acquire a producer ID
2023-03-02 16:33:42.005 [kafka-producer-network-thread | 
producer-state.clickhouse-0-2-1] INFO  
o.apache.kafka.clients.producer.internals.TransactionManager  - [Producer 
clientId=producer-state.clickhouse-0-2-1, 
transactionalId=state.clickhouse-0-2-1] ProducerId set to 31719488 with epoch 
8{code}

Log lines like these keep repeating for what seems like forever.

If we use a brand-new output topic name, the job starts straight away. 
Could you advise whether this can be improved?
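
To put a number on how many transactional IDs the sink cycles through during 
the slow start, the TaskManager log could be tallied with a small helper. This 
is only a sketch: the class name TransactionalIdTally, the method name and the 
regular expression are hypothetical and based solely on the two log lines 
quoted above, not on any Flink or Kafka API.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of Flink or Kafka.
class TransactionalIdTally {

    // Assumption: each log line carries a "transactionalId=<id>]" fragment,
    // as in the TransactionManager lines quoted in this report.
    private static final Pattern TX_ID =
            Pattern.compile("transactionalId=([^\\]\\s,]+)\\]");

    /** Returns the distinct transactional IDs found in the log text, in order of first appearance. */
    static Set<String> distinctTransactionalIds(String logText) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = TX_ID.matcher(logText);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Shortened versions of the two log lines from this report.
        String sample =
                "[Producer clientId=producer-state.clickhouse-0-1-1, "
                        + "transactionalId=state.clickhouse-0-1-1] Invoking InitProducerId\n"
                        + "[Producer clientId=producer-state.clickhouse-0-2-1, "
                        + "transactionalId=state.clickhouse-0-2-1] ProducerId set to 31719488 with epoch 8\n";
        Set<String> ids = distinctTransactionalIds(sample);
        System.out.println(ids.size() + " distinct transactional IDs: " + ids);
    }
}
```

Running this over the full TaskManager log of a slow start and of a start with 
a brand-new topic name would show whether the number of distinct IDs differs 
between the two cases.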



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
