Re: [DISCUSS] KAFKA-17316 and KAFKA-17423

Claude Warren, Jr Tue, 03 Sep 2024 01:46:26 -0700

Colin,

I can see that it makes sense to replace the StandardAuthorizer so that
there are not 2 implementations.  However, I think the testing framework
should remain so that in future new improved implementations of
StandardAuthorizerData can be easily implemented and tested.


I will put forward a pull request to satisfy KAFKA-17423 that contains only
the new implementation.

Claude

On Mon, Sep 2, 2024 at 9:09 AM Claude Warren, Jr <claude.war...@aiven.io>
wrote:

> I have been working on implementing a Trie structure to store ACLs and
> improve the performance in the metadata/authorization code.  The upshot of
> this was that I found it very difficult to determine if the implementation
> was correctly reimplementing the current implementation.
>
> My goal was to simply change the StandardAuthorizerData implementation and
> leave the rest of the code as is.  However, there were no abstracted tests
> that would allow me to test this.
>
> KAFKA-17316 addresses this issue by creating some internal interfaces for
> the "metadata/authorizer" package.  The one change to the
> StandardAuthorizer was to implement the"authorizeByResourceType" defined in
> the "org.apache.kafka.server.authorizer.Authorizer" interface by passing
> the request down to the AuthorizerData implementation.
>
> This change allowed me to create three test implementations.  One that
> implemented "authorizeByResourceType" as it is in the released code base,
> one that verified that the StandardAuthorizerData implementation did not
> change the expected results, and one that showed the Trie implementation in
> KAFKA-17423 was also correct.
>
> I think that retaining the work in KAFKA-17316 makes sense as when the
> next faster implementation comes along we can drop in the replacement and
> verify that it works correctly.
>
> KAFKA-17423 builds on KAFKA-17316 by implementing a Trie based
> AuthorizerData implementation.  By splitting the data into a Trie format
> the search for matching ACLs is improved by an order of magnitude.  The
> trie implementation allows us to quickly locate the candidate ACLs by
> splitting them into groups based upon the similarity of the resource name.
> In addition since we are moving through the trie based on resource name we
> have several advantages:
>
>    1. If we encounter a matching DENY while descending the Trie we can
>    stop as it overrides anything that may be found at lower levels.
>    2. We only look for LITERAL matches on the descent.  If we reach a
>    matching resource name or a leaf node we know there are no LITERAL matches.
>    3. If we don't have a DENY or a LITERAL match we walk back up the path
>    checking the nodes from the descent looking for a PREFIX match.  The
>    advantage here is that we don't have to search again but can simply retrace
>    the path using the Trie structure.
>
> I believe that #1 and #2 above account for a significant portion of the
> speed increase as we do not have to reposition within a sorted list of all
> ACLs using a binary search.
>
> Finally, I think that we should prohibit the use of the java.util.stream
> classes within the authorizer due to hot path speed considerations.  The
> only existing code that uses streams within that package were test cases.
> We can prohibit the use by a simple checkstyle prohibition.  Doing so will
> short circuit any misguided potential changes.
>
> Thoughs?
> Claude
>
>

Re: [DISCUSS] KAFKA-17316 and KAFKA-17423

Reply via email to