Hey All,

We are using the low-level Processor API to build Kafka Streams
applications. Each app has one or more in-memory state stores with caching
disabled and the changelog enabled. Some of the apps also have global stores.
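
For context, the stores are wired up roughly like the sketch below (a
minimal sketch against the newer `processor.api` interfaces; the store name,
topic, serdes, and `ExampleProcessor` are placeholders, not our actual code):

```java
import java.util.Collections;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class ExampleTopology {

    // Placeholder processor; our real processors read and write the state store.
    static class ExampleProcessor implements Processor<String, Long, Void, Void> {
        @Override
        public void process(Record<String, Long> record) {
            // no-op for the sketch
        }
    }

    public static Topology build() {
        // In-memory key-value store, caching disabled, changelog (logging) enabled
        StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
            Stores.keyValueStoreBuilder(
                    Stores.inMemoryKeyValueStore("example-store"),
                    Serdes.String(),
                    Serdes.Long())
                .withCachingDisabled()
                .withLoggingEnabled(Collections.emptyMap());

        Topology topology = new Topology();
        topology.addSource("source", "input-topic");
        topology.addProcessor("processor", ExampleProcessor::new, "source");
        topology.addStateStore(storeBuilder, "processor");
        return topology;
    }
}
```
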
We noticed from the Kubernetes node metrics that the stream applications
are consuming a surprising amount of disk I/O. Digging deeper, I found the
following:

1. Running locally with Docker, I saw pretty high disk reads: `docker stats`
reported `BLOCK I/O` of `438MB / 0B` (reads / writes). For comparison, we
did only a few gigabytes of net I/O.
2. In Kubernetes, `container_fs_reads_bytes_total` gives us pretty large
numbers, whereas `container_fs_writes_bytes_total` is almost negligible.

Note that we are *not* using RocksDB, and the pattern is not correlated with
having a global store. I have read various documents but still can't figure
out why a streams application would perform so many disk reads. It is not
even writing, which rules out swap space or any write buffering.

I also noticed that the amount of disk reads is directly proportional to
the amount of data consumed. Is it possible that the data is zero-copied
from the network interface to disk and the Kafka app then reads it back
from there? I am not aware of any mechanism that does that.

I would really appreciate any help in debugging this issue.

Thanks,
Mangat
