Colin, I can see that it makes sense to replace the StandardAuthorizer so that there are not 2 implementations. However, I think the testing framework should remain so that in future new improved implementations of StandardAuthorizerData can be easily implemented and tested.
I will put forward a pull request to satisfy KAFKA-17423 that contains only the new implementation. Claude On Mon, Sep 2, 2024 at 9:09 AM Claude Warren, Jr <claude.war...@aiven.io> wrote: > I have been working on implementing a Trie structure to store ACLs and > improve the performance in the metadata/authorization code. The upshot of > this was that I found it very difficult to determine if the implementation > was correctly reimplementing the current implementation. > > My goal was to simply change the StandardAuthorizerData implementation and > leave the rest of the code as is. However, there were no abstracted tests > that would allow me to test this. > > KAFKA-17316 addresses this issue by creating some internal interfaces for > the "metadata/authorizer" package. The one change to the > StandardAuthorizer was to implement the"authorizeByResourceType" defined in > the "org.apache.kafka.server.authorizer.Authorizer" interface by passing > the request down to the AuthorizerData implementation. > > This change allowed me to create three test implementations. One that > implemented "authorizeByResourceType" as it is in the released code base, > one that verified that the StandardAuthorizerData implementation did not > change the expected results, and one that showed the Trie implementation in > KAFKA-17423 was also correct. > > I think that retaining the work in KAFKA-17316 makes sense as when the > next faster implementation comes along we can drop in the replacement and > verify that it works correctly. > > KAFKA-17423 builds on KAFKA-17316 by implementing a Trie based > AuthorizerData implementation. By splitting the data into a Trie format > the search for matching ACLs is improved by an order of magnitude. The > trie implementation allows us to quickly locate the candidate ACLs by > splitting them into groups based upon the similarity of the resource name. > In addition since we are moving through the trie based on resource name we > have several advantages: > > 1. If we encounter a matching DENY while descending the Trie we can > stop as it overrides anything that may be found at lower levels. > 2. We only look for LITERAL matches on the descent. If we reach a > matching resource name or a leaf node we know there are no LITERAL matches. > 3. If we don't have a DENY or a LITERAL match we walk back up the path > checking the nodes from the descent looking for a PREFIX match. The > advantage here is that we don't have to search again but can simply retrace > the path using the Trie structure. > > I believe that #1 and #2 above account for a significant portion of the > speed increase as we do not have to reposition within a sorted list of all > ACLs using a binary search. > > Finally, I think that we should prohibit the use of the java.util.stream > classes within the authorizer due to hot path speed considerations. The > only existing code that uses streams within that package were test cases. > We can prohibit the use by a simple checkstyle prohibition. Doing so will > short circuit any misguided potential changes. > > Thoughs? > Claude > >