On Tue, Sep 3, 2024, at 01:46, Claude Warren, Jr wrote:
> Colin,
>
> I can see that it makes sense to replace the StandardAuthorizer so that
> there are not two implementations. However, I think the testing framework
> should remain so that, in the future, new and improved implementations of
> StandardAuthorizerData can be easily implemented and tested.
>
> I will put forward a pull request to satisfy KAFKA-17423 that contains
> only the new implementation.
>
> Claude
Hi Claude,

Seems like we're hopping around between email threads a bit. I think your
proposed changes are good, and it sounds like we agree that they should
become part of StandardAuthorizer rather than a separate authorizer. That
was my main feedback... the rest is probably stuff we can discuss in a PR.
(Or maybe whoever is reviewing can discuss...)

I am curious why StandardAuthorizerData would be easier to unit-test than
StandardAuthorizer, but... again, probably better to discuss in a PR. (You
might be right about this; I just can't visualize it right now without the
code in front of me.)

[More comments below]

On Mon, Sep 2, 2024 at 9:09 AM Claude Warren, Jr <claude.war...@aiven.io>
wrote:
>> I have been working on implementing a Trie structure to store ACLs and
>> improve the performance of the metadata/authorization code. The upshot
>> of this was that I found it very difficult to determine whether the new
>> implementation correctly reimplemented the current one.
>>
>> My goal was to simply change the StandardAuthorizerData implementation
>> and leave the rest of the code as is. However, there were no abstracted
>> tests that would allow me to test this.
>>
>> KAFKA-17316 addresses this issue by creating some internal interfaces
>> for the "metadata/authorizer" package. The one change to the
>> StandardAuthorizer was to implement the "authorizeByResourceType" method
>> defined in the "org.apache.kafka.server.authorizer.Authorizer" interface
>> by passing the request down to the AuthorizerData implementation.
>>
>> This change allowed me to create three test implementations: one that
>> implemented "authorizeByResourceType" as it is in the released code
>> base, one that verified that the StandardAuthorizerData implementation
>> did not change the expected results, and one that showed that the Trie
>> implementation in KAFKA-17423 was also correct.
>>
>> I think that retaining the work in KAFKA-17316 makes sense: when the
>> next faster implementation comes along, we can drop in the replacement
>> and verify that it works correctly.
>>
>> KAFKA-17423 builds on KAFKA-17316 by implementing a Trie-based
>> AuthorizerData implementation. By splitting the data into a Trie, the
>> search for matching ACLs is improved by an order of magnitude. The Trie
>> implementation allows us to quickly locate the candidate ACLs by
>> splitting them into groups based upon the similarity of the resource
>> name. In addition, since we are moving through the Trie based on the
>> resource name, we have several advantages:
>> 1. If we encounter a matching DENY while descending the Trie, we can
>> stop, as it overrides anything that may be found at lower levels.
>> 2. We only look for LITERAL matches on the descent. If we reach a leaf
>> node before matching the full resource name, we know there are no
>> LITERAL matches.
>> 3. If we don't have a DENY or a LITERAL match, we walk back up the
>> path, checking the nodes from the descent for a PREFIX match. The
>> advantage here is that we don't have to search again but can simply
>> retrace the path using the Trie structure.
>>
>> I believe that #1 and #2 above account for a significant portion of the
>> speed increase, as we do not have to reposition within a sorted list of
>> all ACLs using a binary search.
>>
>> Finally, I think that we should prohibit the use of the
>> java.util.stream classes within the authorizer due to hot-path speed
>> considerations. The only existing code that uses streams within that
>> package was in test cases. We can prohibit the use with a simple
>> checkstyle prohibition. Doing so will short-circuit any misguided
>> potential changes.
>>
>> Thoughts?
>> Claude

You know, I've been burned by the java.util.stream classes a few times,
and I am inclined to agree with this. It just makes it too easy for people
to introduce O(N^2) or worse behavior.
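For readers following along, Claude's three-step descent/ascent might look
roughly like the sketch below. All names here (TrieNode, authorize, the
per-node Result fields) are illustrative, not from the actual KAFKA-17423
code, which groups nodes by shared name fragments and carries full ACL
entries rather than a single result per node:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch of trie-based ACL lookup as described
// in the thread; not the real StandardAuthorizerData/Trie implementation.
class AclTrieSketch {
    enum Result { ALLOW, DENY, NONE }

    static class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<>();
        Result literal = Result.NONE; // ACL bound to this exact resource name
        Result prefix = Result.NONE;  // ACL bound to this name as a prefix
    }

    static void addLiteralAcl(TrieNode root, String name, Result r) {
        node(root, name).literal = r;
    }

    static void addPrefixAcl(TrieNode root, String name, Result r) {
        node(root, name).prefix = r;
    }

    private static TrieNode node(TrieNode root, String name) {
        TrieNode n = root;
        for (char c : name.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new TrieNode());
        }
        return n;
    }

    static Result authorize(TrieNode root, String resource) {
        TrieNode[] path = new TrieNode[resource.length() + 1];
        TrieNode node = root;
        int depth = 0;
        path[0] = node;
        while (depth < resource.length()) {
            // (1) A matching DENY on the descent overrides anything deeper.
            if (node.prefix == Result.DENY) {
                return Result.DENY;
            }
            TrieNode next = node.children.get(resource.charAt(depth));
            if (next == null) {
                break; // leaf reached before the name ran out
            }
            node = next;
            depth++;
            path[depth] = node;
        }
        if (node.prefix == Result.DENY) {
            return Result.DENY;
        }
        // (2) LITERAL rules can only match if the whole name was consumed.
        if (depth == resource.length() && node.literal != Result.NONE) {
            return node.literal;
        }
        // (3) No DENY or LITERAL match: retrace the recorded descent path
        // for the deepest PREFIX match; no second search is needed.
        for (int i = depth; i >= 0; i--) {
            if (path[i].prefix != Result.NONE) {
                return path[i].prefix;
            }
        }
        return Result.NONE;
    }
}
```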
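One way to enforce the prohibition Claude proposes would be Checkstyle's
stock IllegalImport check; the module and property names below are
standard Checkstyle, but where the snippet lives and how test sources are
excluded (e.g. via a suppressions file) would depend on the project's
Checkstyle setup:

```xml
<!-- Inside the TreeWalker module of checkstyle.xml: reject any import of
     the java.util.stream package in the authorizer sources. -->
<module name="IllegalImport">
    <property name="illegalPkgs" value="java.util.stream"/>
</module>
```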
If you have to write the nested "for" loop yourself, the O(N^2)-ness just
smacks you in the face. Streams let you sweep it under the rug with
filter(), map(), collect(), etc. Dangerous.

It's fine for tests, though, of course.

best,
Colin
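To make Colin's point concrete, here is a hypothetical pair of methods
(names invented for illustration) that do identical O(N*M) work; the
nested loop wears its cost openly, while the stream version reads like a
single pass even though anyMatch() re-scans the ACL list for every name:

```java
import java.util.List;

// Illustration only: both methods scan all aclPrefixes for each name.
class HiddenQuadratic {

    static long countMatchesLoop(List<String> names, List<String> aclPrefixes) {
        long count = 0;
        for (String name : names) {              // outer loop: N iterations
            for (String prefix : aclPrefixes) {  // inner loop: M iterations, in plain sight
                if (name.startsWith(prefix)) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    static long countMatchesStream(List<String> names, List<String> aclPrefixes) {
        // Looks like one linear pipeline, but anyMatch() hides the same
        // inner scan of aclPrefixes for every element of names.
        return names.stream()
                .filter(name -> aclPrefixes.stream().anyMatch(name::startsWith))
                .count();
    }
}
```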