Hi,

I created an improvement Jira (SOLR-16551) requesting a way to disable the PKIAuthenticationPlugin. It was closed because it had not been discussed on the mailing list first. Apologies for skipping that step; I am posting the same request here to open it for feedback from the community.
The PKIAuthenticationPlugin [0] secures inter-node communication by injecting a custom header into each request. The destination node uses this header to detect tampering by verifying it against the source node's public key. The header also carries a timestamp that is checked against a TTL (default is 5 seconds) to prevent replay attacks.

Under very high load for extended periods of time, messages can start to expire before they are processed, causing a spike in authorization errors. By trial and error, increasing the TTL far enough seems to help the cluster get over the hump, but setting it too high raises security concerns.

This raises the question: is there any circumstance under which it is safe to disable the "header sign and check with TTL" mechanism? It seems that enabling inter-node encryption [1] can provide sufficient protection in transit, so that the header approach would no longer be required.

To clarify, I am not saying that disabling the PKIAuthenticationPlugin will give better security. I am saying that enabling it over an encrypted channel will not _add more security_, and it will still enforce the 5-second TTL for a purpose (preventing replay attacks) that is no longer needed.

I would like to know what others think:

1. Is this something that others have seen (heavy load leading to 401s on inter-node requests)?
2. Is disabling the PKI plugin a sensible approach, or would it cause more confusion and/or security troubles?

I am not a security expert, so I'm happy to be shown where my reasoning is not correct!

Thanks,
Alex

[0] https://solr.apache.org/guide/solr/latest/deployment-guide/authentication-and-authorization-plugins.html#pkiauthenticationplugin
[1] https://solr.apache.org/guide/solr/latest/deployment-guide/enabling-ssl.html
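
P.S. To make the failure mode concrete, here is a minimal Python sketch of a generic "sign and check with TTL" scheme. It is not Solr's implementation, only my rough model of the idea: the header format, the function names, and the use of an HMAC with a shared secret (standing in for each node's key pair) are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical shared secret; a stand-in for the per-node key pair.
SECRET = b"demo-secret"
DEFAULT_TTL_MS = 5_000  # the 5-second default mentioned above


def make_header(user: str, sent_at_ms: int) -> str:
    """Build 'user timestamp signature' (an illustrative format only)."""
    payload = f"{user} {sent_at_ms}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload} {sig}"


def check_header(header: str, now_ms: int, ttl_ms: int = DEFAULT_TTL_MS) -> str:
    """Return 'ok', 'tampered', or 'expired' for a received header."""
    payload, _, sig = header.rpartition(" ")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return "tampered"
    # The timestamp is the last field of the signed payload.
    sent_at_ms = int(payload.rsplit(" ", 1)[1])
    if now_ms - sent_at_ms > ttl_ms:
        return "expired"
    return "ok"


header = make_header("node1", sent_at_ms=0)
print(check_header(header, now_ms=2_000))                 # ok
print(check_header(header, now_ms=7_000))                 # expired
print(check_header(header, now_ms=7_000, ttl_ms=10_000))  # ok
```

The second call models a request that waited 7 seconds in a queue on a busy node: the signature is still valid, but the request is rejected purely because of its age. That is the situation I believe we are hitting under load.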