Hi Imcom Jin,

Thanks for your question!

It is expected behavior that Connect's internal topics are read completely
from the beginning each time the worker starts, regardless of the
auto.offset.reset configuration [1].
This is because they are compacted topics in which the newest record for
each key is the current state, so even the first message in the topic may
still be needed for correctness. For example, if a worker only read from
the latest offset of the status topic, it would not know the status of
long-running stable tasks whose last status update was written long ago.
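
As a rough illustration of why the full read matters, here is a toy model
in Python. It is not Connect's actual code: the replay() helper, the task
names, and the tuple layout are all made up for the example.

```python
# Toy model of a compacted status topic: each record is (offset, key, value).
# This only illustrates the idea; it is not how KafkaBasedLog is implemented.

def replay(log, start_offset=0):
    """Rebuild the key -> latest value view from records at or after start_offset."""
    view = {}
    for offset, key, value in log:
        if offset >= start_offset:
            view[key] = value  # a later record for the same key wins
    return view

status_log = [
    (0, "task-0", "RUNNING"),  # long-running stable task, never updated again
    (1, "task-1", "RUNNING"),
    (2, "task-1", "FAILED"),
    (3, "task-1", "RUNNING"),
]

# Reading from the beginning recovers the state of every task.
from_beginning = replay(status_log)

# Starting at the end of the log (what a "latest" reset would do) sees nothing
# until new records arrive, so task-0's status would be unknown.
from_latest = replay(status_log, start_offset=len(status_log))

print("from beginning:", from_beginning)
print("from latest:   ", from_latest)
```

In this sketch the record at offset 0 is the only place task-0's status
lives, which is the situation a worker has to handle on startup.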

If you want to reduce the startup time, I suggest lowering the segment
rolling settings (segment.bytes and segment.ms) [2,3] for the internal
topics. Kafka does not compact the active segment, so rolling segments
sooner permits it to compact away the duplicate status messages sooner,
preventing them from being read on a future startup. This was previously
reported [4], but we have not yet changed the default.
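
One way to apply this to existing topics is with kafka-configs.sh. The
Python sketch below only builds and prints the commands; it does not run
them. The bootstrap address, the status topic name, and the segment values
are assumptions for illustration, so pick values that suit your cluster.

```python
import shlex

def alter_segment_cmd(topic, bootstrap="localhost:9092",
                      segment_ms=3_600_000, segment_bytes=16 * 1024 * 1024):
    """Build (not run) a kafka-configs.sh command that lowers segment roll settings.

    The default values here are placeholders, not recommendations.
    """
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap,
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config",
        f"segment.ms={segment_ms},segment.bytes={segment_bytes}",
    ]

# "my-connect-offsets" comes from the question; "my-connect-status" is a
# hypothetical name for the status topic.
for topic in ["my-connect-offsets", "my-connect-status"]:
    print(shlex.join(alter_segment_cmd(topic)))
```

Printing the commands first makes it easy to review them before running
anything against a production cluster.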

I hope this helps,
Greg

[1]
https://github.com/apache/kafka/blob/c4fb1008c4856c8cf9594269c86323753e6860ce/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L274-L278
[2] https://kafka.apache.org/documentation/#topicconfigs_segment.bytes
[3] https://kafka.apache.org/documentation/#topicconfigs_segment.ms
[4] https://issues.apache.org/jira/browse/KAFKA-15086

On Thu, Aug 14, 2025 at 9:44 AM Imcom JIN <imcom....@nexusguard.com> wrote:

> Hi dear Kafka team,
>
> I see that no matter what properties I give to the connector, the offset
> reset config for internal topics, especially the offset storage topic
> (say, my-connect-offsets), always uses "earliest", which leads to very long
> bootstrap times during restart, or to stuck workers.
>
> Log sample and config sample printed in the log:
>
> 2025-08-12 10:10:45,531 INFO [Consumer
> clientId=cbdhk04-data-cluster-offsets, groupId=cbdhk04-data-cluster]
> Seeking to earliest offset of partition
>
> root@cbd:/usr/local/nxg/docker/kafka-connect# docker logs
> connect-replication-8085 | grep "auto.offset.reset = earliest" -C2
> auto.commit.interval.ms = 5000
> auto.include.jmx.reporter = true
> auto.offset.reset = earliest
>
> My connect-distributed.properties contains the following config:
>
> producer.override.auto.offset.reset=latest
> consumer.override.auto.offset.reset=latest
> producer.auto.offset.reset=latest
> consumer.auto.offset.reset=latest
> auto.offset.reset=latest
> connector.client.config.override.policy=All
>
> None of the above can change the behaviour of the consumer initialized by
> connect to consume internal topics.
>
> What's the expected behaviour? How can I improve the bootstrap time for a
> heavy Connect cluster?
> What properties should I use to change the consumer config, if that is
> possible at all?
>
> Thanks in advance
>
> --
> *Imcom Jin*
> Software Engineer Manager, SEG
> T :  +8613552756336
>
> *NEXUSGUARD*
> www.nexusguard.com
> LinkedIn <https://www.linkedin.com/company/nexusguard> • Twitter
> <https://www.twitter.com/nexusguard> • Facebook
> <https://www.facebook.com/nxg.pr>
