Hi Michael,

Thank you for reporting this. It is possible that we missed some
client-side update in KAFKA-8336. Would it be possible for you to provide
logs from multiple brokers around the time of a broker restart when
handshake failures started occurring so that we can try to work out which
client connection is failing? Corresponding to the handshake failure entry
from SocketServer in the restarted broker, there should be a failed client
connection on another broker. If that has a stack trace, it would be useful.

Also just to confirm, are these correct:

1) All brokers are dynamically updated with new certs and this works fine
without any errors (this just indicates that config update succeeded)
2) If one broker is restarted, handshake failures appear in the logs and
these are logged from SocketServer in the restarted broker. (this could
indicate that some client-side cert was not updated dynamically)
3) Once all brokers are restarted, there are no more handshake failures
(this is the actual confirmation that certs from 1) work)
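
In case it helps with checking 1), here is a minimal sketch (not part of
Kafka) that connects to a listener and prints the SHA-256 fingerprint of
the certificate it serves, so it can be compared with the keystore that was
just configured. The function names, host, port and PEM paths are all
placeholders of mine. Because the listener uses two-way authentication, I am
assuming a client certificate and key in PEM form are needed for the
handshake to complete.

```python
import hashlib
import socket
import ssl
from typing import Optional


def fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint as colon-separated upper-case hex."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def served_cert_fingerprint(host: str, port: int,
                            client_cert: Optional[str] = None,
                            client_key: Optional[str] = None,
                            timeout: float = 5.0) -> str:
    """Connect to host:port and fingerprint the certificate presented.

    Verification is deliberately disabled: the goal is only to see which
    certificate is being served, not to decide whether it is trusted.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False  # must be cleared before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    if client_cert:
        # Hypothetical PEM files; only needed when client auth is required.
        context.load_cert_chain(client_cert, client_key)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return fingerprint(tls.getpeercert(binary_form=True))


# Example call (placeholder address and paths):
# print(served_cert_fingerprint("10.224.70.3", 9093,
#                               "client-cert.pem", "client-key.pem"))
```

The output uses the same colon-separated format as the SHA256 fingerprint
printed by "keytool -list -v", so the two can be compared directly. Note
that this only shows the server side of the listener; it does not tell us
which certificate a broker presents when it acts as a client.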

Thank you,

Rajini

On Thu, Aug 1, 2019 at 12:40 AM Michael Carter <
michael.car...@instaclustr.com> wrote:

> Hi all,
>
> I'm having an issue with dynamic configuration of interbroker SSL
> certificates (in Kafka 2.3.0) that I'm hoping someone can give me insight
> on. I've previously posted something similar on the Users email list, but I
> think it will need some help from developers experienced with how the
> networking code in Kafka works.
>
> I'm trying to use SSL two-way authentication for inter broker
> communication, with short lived SSL certificates, rotating them
> frequently without needing to do a broker restart. So, on each broker in my
> cluster, I periodically generate a new certificate keystore file, and set
> the "listener.name.interbroker.ssl.keystore.location" broker config
> property. (I'm using inter.broker.listener.name=INTERBROKER)
>
> Setting this property works fine, and everything appears ok. And manually
> connecting to the inter broker listener shows it's correctly serving the
> new certificate. But if I ever restart a broker after the original
> certificate has expired (The one the broker started up with, which is no
> longer configured anywhere), then communication failures between brokers
> start to occur. My logs fill up with messages like this:
>
> [2019-07-22 03:57:43,605] INFO [SocketServer brokerId=1] Failed
> authentication with 10.224.70.3 (SSL handshake failed)
> (org.apache.kafka.common.network.Selector)
>
> A little bit of extra logging injected into the code tells me that the
> failures are caused by the out of date SSL certificates being used. So it
> seems there are some network components inside Kafka still stuck on the old
> settings.
> This sounds similar to the behaviour described in KAFKA-8336 (
> https://issues.apache.org/jira/browse/KAFKA-8336), but this is marked as
> fixed in 2.3.0.
>
> I've confirmed that all the SslChannelBuilders and SslFactories appear to
> be being reconfigured correctly when the dynamic setting is set. I've tried
> closing all existing KafkaChannels on a reconfiguration event, in order to
> force them to re-open with the new certificates, but the problem persists.
>
> Does anyone have any idea what other components may be hanging around,
> using the old certificates?
>
> Thanks,
> Michael
>
>
