Hi,

We are seeing latency spikes on the root disks of our Kafka instances. For example, at one point the average I/O latency was about 190 ms, with 34 I/Os in the disk queue, for a few seconds. All of the I/Os were writes. The Kafka process appeared to hang: its CPU utilization dropped by a factor of 20. After a few seconds it returned to normal, and we then got a connection timeout from this Kafka instance to the ZooKeeper node (the timeout value in the logs was different each time). log.dirs is on a separate disk volume. Can anyone explain why this happened? Any ideas would be appreciated. Could log writing be the cause?
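In case it helps anyone reproduce the figures, here is a minimal sketch of how an average write latency and queue depth like the ones above can be derived from two samples of /proc/diskstats. It assumes a Linux host and the standard diskstats field layout; the function names parse_diskstats and write_stats are made up for illustration, and this is not necessarily how our monitoring computes the numbers.

```python
def parse_diskstats(text):
    """Parse the contents of /proc/diskstats into {device: counters}.

    Assumes the standard Linux layout: major, minor, name, then the
    cumulative I/O counters. Lines with fewer than 14 fields are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue
        stats[fields[2]] = {
            "writes": int(fields[7]),        # writes completed
            "write_ms": int(fields[10]),     # total ms spent on writes
            "in_flight": int(fields[11]),    # I/Os currently in progress
            "weighted_ms": int(fields[13]),  # weighted ms doing I/O (reads and writes)
        }
    return stats


def write_stats(before, after, interval_s):
    """Return (avg write latency in ms, avg queue depth) between two samples."""
    d_writes = after["writes"] - before["writes"]
    d_write_ms = after["write_ms"] - before["write_ms"]
    d_weighted = after["weighted_ms"] - before["weighted_ms"]
    avg_latency_ms = d_write_ms / d_writes if d_writes else 0.0
    avg_queue = d_weighted / (interval_s * 1000.0)
    return avg_latency_ms, avg_queue
```

To use it, read /proc/diskstats twice, interval_s seconds apart, pass both texts through parse_diskstats, and call write_stats with the entries for the root disk device.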
Regards