Hi,

I am having issues with the global store taking a very long time to restore during startup of a Kafka Streams 2.0.1 application. The global store is backed by a RocksDB persistent store and is added to the Streams topology in the following manner: https://pastebin.com/raw/VJutDyYe
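Roughly, the registration follows the standard addGlobalStore pattern. The sketch below uses placeholder store/topic names, serdes, and a trivial update processor, so please treat it only as an outline; the actual code is in the pastebin above.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.processor.AbstractProcessor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;

    public class GlobalStoreTopology {

        public static StreamsBuilder build() {
            // RocksDB-backed (persistent) key-value store; logging is disabled because
            // the source topic itself serves as the changelog for a global store.
            final StoreBuilder<KeyValueStore<String, String>> storeBuilder =
                Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore("global-store"),
                        Serdes.String(),
                        Serdes.String())
                    .withLoggingDisabled();

            final StreamsBuilder builder = new StreamsBuilder();
            builder.addGlobalStore(
                storeBuilder,
                "global-store-topic",                       // 18 partitions, ~15M records each
                Consumed.with(Serdes.String(), Serdes.String()),
                () -> new AbstractProcessor<String, String>() {
                    private KeyValueStore<String, String> store;

                    @SuppressWarnings("unchecked")
                    @Override
                    public void init(final ProcessorContext context) {
                        super.init(context);
                        store = (KeyValueStore<String, String>) context.getStateStore("global-store");
                    }

                    @Override
                    public void process(final String key, final String value) {
                        store.put(key, value);              // keep the local copy in sync with the topic
                    }
                });

            return builder;
        }
    }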
The global store topic has approximately 15 million records per partition and 18 partitions. The following global consumer settings are specified (see the P.S. at the end for roughly how they are applied):

poll.timeout.ms = 10
max.poll.records = 2000
max.partition.fetch.bytes = 1048576
fetch.max.bytes = 52428800
receive.buffer.bytes = 65536

I have tried tweaking the settings above on the consumer side, such as increasing poll.timeout.ms to 2000, max.poll.records to 10000, and max.partition.fetch.bytes to 52428800, but I keep hitting a ceiling of approximately 100,000 restored records per second. At 15 million records per partition, that is roughly 150 seconds to restore a single partition, and with 18 partitions it takes roughly 45 minutes to fully restore the global store. Switching the brokers' log directories from HDDs to SSDs made restoration roughly 25% faster overall, but this still feels slow. It appears I am hitting the IOPS limits of the disks while remaining well below their throughput limits, on both the broker and the Streams application side.

How can I minimize the restoration time of the global store? Are there settings that would increase throughput for the same number of IOPS? Ideally, restoration of each partition could be done in parallel, but I recognize there is only a single global store thread. We bring up a new instance of the Kafka Streams application on a potentially daily basis, so the restoration time is becoming more and more of a hassle.

Thanks,
Taylor
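P.S. In case it helps, here is roughly how the overrides above are applied. The application id and bootstrap servers are placeholders, and I am paraphrasing from memory (I believe the global.consumer. prefix from KIP-276 targets only the global consumer in 2.0), so treat this as a sketch rather than the exact code:

    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class GlobalConsumerOverrides {

        public static Properties props() {
            final Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");    // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder

            // Overrides aimed at the global consumer only, via the "global.consumer." prefix.
            props.put(StreamsConfig.globalConsumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 2000);
            props.put(StreamsConfig.globalConsumerPrefix(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG), 1048576);
            props.put(StreamsConfig.globalConsumerPrefix(ConsumerConfig.FETCH_MAX_BYTES_CONFIG), 52428800);
            props.put(StreamsConfig.globalConsumerPrefix(ConsumerConfig.RECEIVE_BUFFER_CONFIG), 65536);

            return props;
        }
    }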