Hey All,

We are using the low-level Processor API to build Kafka Streams applications. Each app has one or more in-memory state stores with caching disabled and the changelog enabled (see the sketch at the end of this mail for roughly how they are wired up). Some of the apps also have global stores. We noticed from the node metrics (Kubernetes) that the stream applications are doing a surprisingly large amount of disk I/O. Digging deeper, I found the following:
1. Running locally with Docker, I could see some pretty high disk reads. `docker stats` reported `BLOCK I/O` as `438MB / 0B`. For comparison, we did only a few gigabytes of Net I/O.
2. In Kubernetes, `container_fs_reads_bytes_total` gives us pretty big numbers, whereas `container_fs_writes_bytes_total` is almost negligible.

Note that we are *not* using RocksDB, and the pattern is not correlated with having a global store. I have read various documents but still can't figure out why a stream application would perform so many disk reads. It is barely writing, which rules out swap space, write buffering, etc. I also noticed that the volume of disk reads grows in direct proportion to the amount of data consumed.

Is it possible that the data is zero-copied from the network interface to the disk and the Kafka app then reads it from there? I am not aware of any mechanism that would do that.

I would really appreciate any help debugging this issue.

Thanks,
Mangat
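
P.S. For context, here is a minimal sketch of roughly how our stores are configured, assuming the standard `Stores` builder API. The store name, serdes, and class name are placeholders, not our actual code:

```java
import java.util.Collections;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class StoreConfigSketch {
    public static void main(String[] args) {
        // In-memory key-value store, caching disabled, changelog (logging) enabled.
        // "my-store" and the String serdes are placeholders.
        StoreBuilder<KeyValueStore<String, String>> storeBuilder =
                Stores.keyValueStoreBuilder(
                                Stores.inMemoryKeyValueStore("my-store"),
                                Serdes.String(),
                                Serdes.String())
                        .withCachingDisabled()
                        .withLoggingEnabled(Collections.emptyMap());

        // In the real apps this builder is added to the Topology via addStateStore(...)
        // and connected to the processors that read/write it.
        System.out.println("configured store: " + storeBuilder.name());
    }
}
```

Nothing in this setup touches the local disk explicitly, which is why the read volume surprised us.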