Hi everyone,

I am facing a challenging networking and client management issue with our on-prem Kafka cluster and would appreciate your insights.

Currently, we have various clients (producers and consumers) connecting to our on-prem Kafka cluster from external OpenShift (OCP) and Kubernetes (K8s) environments. The core problem is that our Kafka cluster only sees the K8s worker node's IP address rather than the originating pod's IP.

Occasionally, application teams deploy misconfigured pods and abandon them. These rogue pods continuously trigger SASL_SSL handshake errors and authentication failures for hours at a time. This constant spamming heavily strains the Kafka cluster's network.

Because we only see the worker node IP, which simultaneously routes traffic for multiple other legitimate clients, I cannot simply block the IP or apply network-level rate limits. Doing so would effectively block the entire worker node and cause outages for other healthy applications.

What is the recommended approach or best practice to resolve this issue? Is there a standard way to expose the actual pod IP to the on-prem Kafka cluster, or are there alternative architectural/configuration patterns to effectively isolate and throttle these misconfigured clients without impacting others?

Thanks in advance for your time and suggestions.

Best regards,
M. Cagri Aktas
