Colin,

I would like to leave the framework that is in KAFKA-17316 as it makes
testing the new implementations easier.  But let's discuss that elsewhere
(I will start another discussion for 17316 and 17423 together).

PatternType.GLOB makes sense, I will adjust the KIP to use that
terminology.  I understand from some of my compatriots that there is a
desire to allow GLOBs in resource names, Kafka Principals and Host names.
(Basically all the non-enum based ACL data).

With the Trie implementation for the resource names we can force the trie
to put the '*' and '?' on individual nodes when it encounters them.  This
makes them a natural pivot point when traversing the trie, It is how I
implemented it initially.  The idea is that the more "specific" an ACL is
the better a match it is assumed to be.  So the searching algorithm is
start descending the trie until a matching DENY ACL, a matching LITERAL
ACL, or leaf node is found.  A DENY ACL yields a DENY, a matching LITERAL
ACL yields its specified result.  A non matching leaf node requires further
searching.  We start back up the path starting with the leaf node, looking
for matching PREFIXED nodes.  If a GLOB symbol ('*' or '?') is located as
the child of the current node check for a GLOB match.  Continue until a
PREFIXED ACL match, GLOB ACL Match or the root node is encountered.
PEFIXED and GLOB return their specified value, root return a NOT FOUND
state.

With respect to the Principal and Host names,  I have been recently working
on Apache Rat moving maven based file exclusion to the core and have found
that the Plexus pattern match and selector code is very fast and would
probably not be a significant change in performance from straight check to
GLOB checking.  In these cases we would create what the plexus code calls a
"MatchPattern" and for collections of patterns "MatchPatterns".  We can
then simply check if the requested ACL matches the pattern.  The
MatchPattern will return the original text so we can perform the standard
string match we do today if desired.

I think that the KIP will need updating and that this will require changes
in the Client, Server and Metadata components.  I also believe that we
should prohibit the use of "java.util.stream" classes in the
metadata/authorizer package.  Currently it is only used in a couple of test
cases, but the overhead of streams would kill the authorizer performance so
best to restrict it before someone brings code.

Thoughts?

Claude



On Fri, Aug 30, 2024 at 4:38 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Claude,
>
> I think this is great work. Speeding up the Authorizer will be a big win
> for us.
>
> I don't think we need to add additional interfaces for this, though. Just
> get rid of the old slower implementation that I wrote, and replace it with
> your newer, faster one.
>
> Also, I think we should continue the discussion about globs... after all,
> it is the topic of this KIP! (You are calling it "wildcards" but I don't
> think that is good terminiology since it will cause confusion with the
> existing "*" wildcard). Anyway, earlier I requested that you add
> PatternType.GLOB and use it for these things. Does that make sense? I don't
> see another path to doing it compatibly. I certainly wouldn't want to
> create a "Python 2 vs. Python 3" type situation where people get stuck on
> an older authorizer fork because the new one requires globs and they can't
> handle that. No authorizer forks (or at least, not for things like this.)
>
> best,
> Colin
>
>
> On Fri, Aug 30, 2024, at 01:32, Claude Warren, Jr wrote:
> > I made it easier to replace the existing StandardAuthorizerData with a
> > different implementation in order show the Trie implementation met all
> the
> > requirements of the StandardAuthorizerData and could be replaced without
> > changing the StandardAuthorizer implementation.
> >
> > Replacing the current StandardAuthorizerData with a Trie
> > implementation makes sense because it is an order of magnitude faster.
> > This means that when we go to implement the Wildcard types we can perform
> > the search in times that are equivalent to the literal search times in
> the
> > current StandardAuthorizerData implementation.
> >
> > The changes for the first pull request is to create an interface for
> > AuthorizerData and create an "authorizeByResourceType" method within that
> > interface.  This adds an initial implementation in StandardAuthorizerData
> > that mirrors the default implementation found in the Authorizer
> interface.
> >
> > The Trie implementation is not dependent upon other classes local to
> > StandardAuthorizerData other than the MatchingRule and some of its
> > implementations which are now found in the AuthorizerData interface
> instead
> > of the StandardAuthorizerData implementation.  Obviously it is dependent
> > upon AuthorizationResult, AclBinding, KafkaPrincipal and other classes
> that
> > are defined elsewhere.
> >
> > In addition, abstract test cases are implemented to validate that any
> > replacement implementations yield the same results as the original
> > StandardAuthorizerData.
> >
> > I hope that this aswages any concerns that you may have had,
> > Claude
> >
> > On Thu, Aug 29, 2024 at 6:51 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> >> On Thu, Aug 29, 2024, at 01:34, Claude Warren, Jr wrote:
> >>
> >> Colin,
> >>
> >> Thanks for your insightful comments.  I came to the same conclusion.
> >>
> >> I do have 2 Jira tickets to simplify some of this.
> >>
> >> 1)  KAFKA-17316 <https://issues.apache.org/jira/browse/KAFKA-17316> -
> >> Makes developing a new Authorizer by creating a new implementation of
> the
> >> StandardAuthorizerData easier.  Basically adds interfaces, abstract
> classes
> >> and lots of tests.
> >>
> >> 2) KAFKA-17423 <https://issues.apache.org/jira/browse/KAFKA-17423> -
> >> Builds on KAFKA-17316 by creating a Trie implementation (without the new
> >> wildcards from KIP-1042).  This provides an order of magnitude faster
> >> processing of ACL requests.
> >>
> >> My plan is to get these through the process, so I have a better
> >> understanding of the Kafka development process, and then rework
> KIP-1042 to
> >> create a new PatternType with wildcard processing.
> >>
> >> I would appreciate your feedback on the above tickets if you have the
> time.
> >>
> >>
> >> Hi Claude,
> >>
> >> Thanks for the response. However, I don't want to fork the
> >> StandardAuthorizer. I would be -1 on that since it would greatly enlarge
> >> our maintenance burden, for no good reason. And if we did fork it, I
> >> wouldn't want the forked version to depend on the internal classes of
> the
> >> original one, since that would make evolving either one much more
> difficult.
> >>
> >> We should be able to add wildcards to the existing authorizer without
> too
> >> much trouble, just by adding a new type alongside LITERAL, PREFIXED.
> >> Perhaps GLOB ? We always planned on adding more types, to cover exactly
> >> this scenario.
> >>
> >> best,
> >> Colin
> >>
> >> Claude
> >>
> >> On Fri, Aug 23, 2024 at 9:43 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> >>
> >> On Sat, Jul 27, 2024, at 04:20, Claude Warren, Jr wrote:
> >> > I have updated the KIP with results from the Trie implementation and
> they
> >> > are dramatic to say the least.  For most searches they are at least an
> >> > order of magnitude faster and use less memory.  The wildcard search is
> >> not
> >> > a regular expression but rather a file type wild card (*=1 or more
> >> > character, ?=1 character).
> >> >
> >>
> >> Hi Claude,
> >>
> >> One issue here is that your change is incompatible. For example, if I
> have
> >> a consumer group named '?', and an ALLOW ACL  for '?', this change
> alters
> >> the meaning of that ACL. Previously it just ALLOWed '?' Now it allows
> any
> >> single-character group. This obviously could be a major security issue
> for
> >> people doing upgrades.
> >>
> >> (This wouldn't be an issue if we restricted consumer group names to a
> >> sensible set of characters, like we did with topics. But that ship has
> >> sailed...)
> >>
> >> If you want to add new behavior in a backwards-compatible fashion, your
> >> best bet probably is to create a new PatternType. Unfortunately this
> makes
> >> the complexity issue worse, but I don't see a way around it. Perhaps we
> can
> >> deprecate LITERAL and PREFIXED at some future date, if this wildcard
> thing
> >> makes them unecessary.
> >>
> >> Another issue I see, somewhat related, is that Authorizers are
> pluggable,
> >> and probably won't be updated all at once to support this new type. That
> >> should be fine, but we should document the error that is returned when
> >> someone tries to create your new-style ACLs with an authorizer that does
> >> not support them.
> >>
> >> > Code is available on my personal branch [1].  It is not fully
> documented
> >> > and does not meet the checkstyle requirements yet.  I also modified
> the
> >> jmh
> >> > tests to run the Trie tests and limit the test to the single case
> >> mentioned
> >> > in the KIP documentation.
> >> >
> >> > If there are no issues with this code, I will complete the
> documentation
> >> > and fix the checkstyle and then open a pull request.
> >>
> >> It's fine to open a pull request. You can link it on the KIP page. As
> >> always, we don't want to commit it until the KIP is accepted.
> >>
> >> best,
> >> Colin
> >>
> >> >
> >> > Claude
> >> >
> >> > [1]
> >> https://github.com/Claudenw/kafka/pull/new/KIP-1042_Trie_Implementation
> >> >
> >> >
> >> > On Wed, Jul 3, 2024 at 2:21 PM Claude Warren, Jr <
> claude.war...@aiven.io
> >> >
> >> > wrote:
> >> >
> >> >> I think that if we put in a trie based system we should be able to
> halve
> >> >> the normal searhc times and still be able to locate wild card matches
> >> very
> >> >> quickly.  Users should be warned that "head wildcard" matches are
> slow
> >> and
> >> >> to use them sparingly.  I am going to see if I can work out how to do
> >> >> wildcard matches within the trie.
> >> >>
> >> >> But in all cases can show that the trie is faster than the current
> >> >> implementation.
> >> >>
> >> >> Claude
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Jun 19, 2024 at 7:53 PM Muralidhar Basani
> >> >> <muralidhar.bas...@aiven.io.invalid> wrote:
> >> >>
> >> >>> There are some test results mentioned in the Test Plan section of
> the
> >> Kip,
> >> >>> but we need to do more testing with various patterns and permission
> >> types.
> >> >>> As mentioned in the discuss thread, the trie implementation could
> >> >>> potentially surpass the current speed of ACL match.
> >> >>>
> >> >>> However, we can only accurately assess the results after updating
> the
> >> >>> actual classes and analysing them with AuthorizerBenchmark.
> >> >>>
> >> >>> Thanks,
> >> >>>
> >> >>> Murali
> >> >>>
> >> >>> On Mon, 17 Jun 2024 at 20:39, Colin McCabe <cmcc...@apache.org>
> wrote:
> >> >>>
> >> >>> > My concern is that the extra complexity may actually slow us
> down. In
> >> >>> > general people already complain about the speed of ACL matches,
> and
> >> >>> adding
> >> >>> > another "degree of freedom" seems likely to make things worse.
> >> >>> >
> >> >>> > It would be useful to understand how much faster or slower the
> code
> >> is
> >> >>> > with the propsed changes, versus without them.
> >> >>> >
> >> >>> > best,
> >> >>> > Colin
> >> >>> >
> >> >>> >
> >> >>> > On Mon, Jun 17, 2024, at 01:26, Muralidhar Basani wrote:
> >> >>> > > Hi all,
> >> >>> > >
> >> >>> > > I would like to call a vote on KIP-1042 which extends creation
> of
> >> acls
> >> >>> > with
> >> >>> > > MATCH pattern type.
> >> >>> > >
> >> >>> > > KIP -
> >> >>> > >
> >> >>> >
> >> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> >> >>> > >
> >> >>> > > Discussion thread -
> >> >>> > >
> https://lists.apache.org/thread/xx3lcg60kp4v34x0j9p6xobby8l4cfq2
> >> >>> > >
> >> >>> > > Thanks,
> >> >>> > > Murali
> >> >>> >
> >> >>>
> >> >>
> >>
> >>
> >>
>

Reply via email to