On Tue, Sep 3, 2024, at 01:46, Claude Warren, Jr wrote:
> Colin,
> 
> I can see that it makes sense to replace the StandardAuthorizer so that there 
> are not two implementations.  However, I think the testing framework should 
> remain so that, in the future, new and improved implementations of 
> StandardAuthorizerData can be easily implemented and tested.
> 
> I will put forward a pull request to satisfy KAFKA-17423 that contains only 
> the new implementation.
> 
> Claude

Hi Claude,

Seems like we're hopping around between email threads a bit. I think your 
proposed changes are good and it sounds like we agree that they should become 
part of StandardAuthorizer rather than a separate authorizer. That was my main 
feedback... the rest is probably stuff we can discuss in a PR. (Or maybe 
whoever is reviewing can discuss...)

I am curious why StandardAuthorizerData would be easier to unit-test than 
StandardAuthorizer, but... again, probably better to discuss in a PR. (You 
might be right about this, I just can't visualize it right now without the code 
in front of me)

[More comments below]

On Mon, Sep 2, 2024 at 9:09 AM Claude Warren, Jr <claude.war...@aiven.io> wrote:
>> I have been working on implementing a Trie structure to store ACLs and 
>> improve the performance of the metadata/authorization code.  The upshot 
>> was that I found it very difficult to determine whether a new 
>> implementation correctly reproduced the behavior of the current one.
>> 
>> My goal was to simply change the StandardAuthorizerData implementation and 
>> leave the rest of the code as is.  However, there were no abstracted tests 
>> that would allow me to test this.
>> 
>> KAFKA-17316 addresses this issue by creating some internal interfaces for 
>> the "metadata/authorizer" package.  The one change to the StandardAuthorizer 
>> was to implement the "authorizeByResourceType" method defined in the 
>> "org.apache.kafka.server.authorizer.Authorizer" interface by passing the 
>> request down to the AuthorizerData implementation.
>> 
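>> Roughly, the shape of that change is the following sketch (illustrative 
>> names and deliberately simplified stand-ins for the real Kafka types, not 
>> the actual patch):
>> 
>>   // Simplified stand-ins for the real Kafka types, for illustration only.
>>   enum AuthorizationResult { ALLOWED, DENIED }
>>   enum AclOperation { READ, WRITE }
>>   enum ResourceType { TOPIC, GROUP }
>>   final class RequestContext { }
>> 
>>   // Hypothetical internal interface (the KAFKA-17316 idea), not part of
>>   // the public Authorizer API.
>>   interface AuthorizerData {
>>       AuthorizationResult authorizeByResourceType(
>>           RequestContext ctx, AclOperation op, ResourceType type);
>>   }
>> 
>>   final class StandardAuthorizerSketch {
>>       private volatile AuthorizerData data;  // swappable implementation
>> 
>>       AuthorizationResult authorizeByResourceType(
>>               RequestContext ctx, AclOperation op, ResourceType type) {
>>           // The authorizer no longer does the lookup itself; it simply
>>           // forwards the request to whichever AuthorizerData
>>           // implementation is currently plugged in.
>>           return data.authorizeByResourceType(ctx, op, type);
>>       }
>>   }
>> 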
>> This change allowed me to create three test implementations.  One that 
>> implemented "authorizeByResourceType" as it is in the released code base, 
>> one that verified that the StandardAuthorizerData implementation did not 
>> change the expected results, and one that showed the Trie implementation in 
>> KAFKA-17423 was also correct.
>> 
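>> The abstracted tests end up looking roughly like this (hypothetical class 
>> and method names, JUnit 5 style, reusing the illustrative AuthorizerData 
>> interface from the sketch above):
>> 
>>   // Hypothetical abstract test base: every AuthorizerData implementation
>>   // gets a small concrete subclass that only supplies the factory method,
>>   // and all of them run exactly the same assertions.
>>   abstract class AbstractAuthorizerDataTest {
>> 
>>       // Subclasses return the implementation under test.
>>       protected abstract AuthorizerData createData();
>> 
>>       @org.junit.jupiter.api.Test
>>       void denyOverridesAllow() {
>>           AuthorizerData data = createData();
>>           // ...populate with an ALLOW and a more specific DENY ACL, then
>>           // assert that the DENY wins, identically for every
>>           // implementation...
>>       }
>>   }
>> 
>>   // One thin subclass per implementation, for example:
>>   final class TrieAuthorizerDataTest extends AbstractAuthorizerDataTest {
>>       @Override
>>       protected AuthorizerData createData() {
>>           // e.g. return new TrieAuthorizerData(...) in the real code
>>           throw new UnsupportedOperationException("illustrative only");
>>       }
>>   }
>> 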
>> I think that retaining the work in KAFKA-17316 makes sense: when the next, 
>> faster implementation comes along, we can drop in the replacement and 
>> verify that it works correctly.
>> 
>> KAFKA-17423 builds on KAFKA-17316 by implementing a Trie-based 
>> AuthorizerData implementation.  Splitting the data into a Trie improves the 
>> search for matching ACLs by an order of magnitude.  The trie lets us 
>> quickly locate the candidate ACLs by grouping them according to the shared 
>> prefixes of their resource names.  In addition, since we are moving through 
>> the trie based on the resource name, we have several advantages:
>>  1. If we encounter a matching DENY while descending the Trie we can stop as 
>> it overrides anything that may be found at lower levels.  
>>  2. We only look for LITERAL matches on the descent.  If we reach a matching 
>> resource name or a leaf node we know there are no LITERAL matches.
>>  3. If we don't have a DENY or a LITERAL match we walk back up the path 
>> checking the nodes from the descent looking for a PREFIX match.  The 
>> advantage here is that we don't have to search again but can simply retrace 
>> the path using the Trie structure.
>> I believe that #1 and #2 above account for a significant portion of the 
>> speed increase, as we do not have to reposition within a sorted list of all 
>> ACLs using a binary search.  A rough sketch of the descent and walk-back 
>> follows below.
>> 
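>> Here is that sketch (illustrative only; the real KAFKA-17423 code also 
>> handles wildcards, principals, hosts, and operation matching):
>> 
>>   import java.util.ArrayDeque;
>>   import java.util.ArrayList;
>>   import java.util.Deque;
>>   import java.util.HashMap;
>>   import java.util.List;
>>   import java.util.Map;
>> 
>>   // Illustrative trie node: children keyed by the next character of the
>>   // resource name, plus the ACLs anchored at this node.
>>   final class TrieNode {
>>       final Map<Character, TrieNode> children = new HashMap<>();
>>       final List<Acl> literalAcls = new ArrayList<>();
>>       final List<Acl> prefixAcls = new ArrayList<>();
>>   }
>> 
>>   final class Acl {
>>       final boolean deny;
>>       Acl(boolean deny) { this.deny = deny; }
>>   }
>> 
>>   final class TrieAuthorizeSketch {
>>       // Returns true if access is allowed, false if denied or unmatched.
>>       static boolean authorize(TrieNode root, String resourceName) {
>>           Deque<TrieNode> path = new ArrayDeque<>();
>>           TrieNode node = root;
>>           path.push(node);
>> 
>>           // #1 and #2: descend by resource name, checking only DENY
>>           // rules and LITERAL matches on the way down.
>>           for (int i = 0; i < resourceName.length(); i++) {
>>               node = node.children.get(resourceName.charAt(i));
>>               if (node == null) break;          // leaf: no LITERAL match
>>               if (containsDeny(node.prefixAcls)) {
>>                   return false;                 // DENY overrides all below
>>               }
>>               if (i == resourceName.length() - 1
>>                       && !node.literalAcls.isEmpty()) {
>>                   return !containsDeny(node.literalAcls);  // exact match
>>               }
>>               path.push(node);
>>           }
>> 
>>           // #3: no DENY and no LITERAL match, so retrace the recorded
>>           // path upwards looking for the closest PREFIX rule; the Trie
>>           // structure means no second search is needed.
>>           while (!path.isEmpty()) {
>>               TrieNode n = path.pop();
>>               if (!n.prefixAcls.isEmpty()) {
>>                   return !containsDeny(n.prefixAcls);
>>               }
>>           }
>>           return false;                         // nothing matched: deny
>>       }
>> 
>>       private static boolean containsDeny(List<Acl> acls) {
>>           for (Acl acl : acls) {
>>               if (acl.deny) return true;
>>           }
>>           return false;
>>       }
>>   }
>> 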
>> Finally, I think that we should prohibit the use of the java.util.stream 
>> classes within the authorizer due to hot-path speed considerations.  The 
>> only existing code in that package that used streams was in test cases.  
>> We can enforce the prohibition with a simple checkstyle rule.  Doing so 
>> will short-circuit any misguided changes before they happen.
>> 
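>> For the checkstyle side, something along these lines should be enough (the 
>> IllegalImport module is stock checkstyle; limiting it to just the 
>> metadata/authorizer package would go through the usual suppressions file):
>> 
>>   <!-- Illustrative: reject any import from java.util.stream. -->
>>   <module name="IllegalImport">
>>       <property name="illegalPkgs" value="java.util.stream"/>
>>   </module>
>> 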
>> Thoughts?
>> Claude
>> 

You know, I've been burned by the java.util.stream classes a few times and I am 
inclined to agree with this. It just makes it too easy for people to introduce 
O(N^2) or worse behavior. If you have to write the nested "for" loop yourself, 
the O(N^2)-ness just smacks you in the face. Streams let you sweep it under 
the rug with filter(), map(), collect(), and so on. Dangerous.
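
To make that concrete (a toy example, nothing to do with the authorizer code 
itself): the stream version reads like a single tidy pass, but contains() 
hides the inner loop, so both of these are O(N*M):

  import java.util.List;

  class HiddenQuadratic {
      // Looks like one pipeline, but right::contains is a linear scan per
      // element of left, so the cost is O(N*M) with no visible nesting.
      static long countShared(List<String> left, List<String> right) {
          return left.stream().filter(right::contains).count();
      }

      // The hand-written version makes the nested loop, and the cost, obvious.
      static long countSharedExplicit(List<String> left, List<String> right) {
          long count = 0;
          for (String l : left) {
              for (String r : right) {
                  if (l.equals(r)) {
                      count++;
                      break;
                  }
              }
          }
          return count;
      }
  }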

It's fine for tests, though, of course.

best,
Colin
