Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

Claude Warren, Jr Fri, 02 Aug 2024 02:13:49 -0700

I have created 2 branches in my repository to implement the Trie changes.


   -
   https://github.com/Claudenw/kafka/compare/trunk...StandardAuthorizer_refactor
    refactors the authorizer code to make it easy to implement a different
   Authorizor data block that plugs into the StandardAuthorizor.
   -
   
https://github.com/Claudenw/kafka/compare/StandardAuthorizer_refactor...KIP-1042_trie_simplification
applies
   the Trie changes to the authorizer refactor.





On Fri, Aug 2, 2024 at 10:07 AM Claude Warren, Jr <claude.war...@aiven.io>
wrote:

> Proposed Changes
>> This KIP suggests to support *MATCH* resource pattern type when creating
>> a new kafka acl.
>>
>
> I do not think we need the MATCH support within the Authorizer as noted in
> my earlier message.
>
> *Main changes include :*
>>
>>    - Adding support for MATCH when creating new acl in above and below
>>    classes
>>
>>
> Strike this ^.
>
>>
>>    - Updating Authorizer
>>
>>
>>    - AdminClient changes
>>
>>
>>    - Updating cli
>>
>> *Detailed changes also include *
>>
>>    - Modification of the org.apache.kafka.server.authorizer.Authorizer
>>    class to update authorizeByResourceType method
>>
>>
> Authorizer is an interface, while we can update the default
> implementation the Trie changes have moved the StandardAuthorizer
> implementation down into the StandardAuthorizerData class.  I do not think
> that there are any changes to this functionality required.  What do you
> think needs to be done?
>
>
>>    - Modification of the kafka.security.authorizer.AclAuthorizer class to
>>
>>
>>    - update authorizeByResourceType method and other methods
>>
>>
>>    - update matchingAcls (this is performance sensitive, as it impacts
>>    latency of every producer and consumer client to get authorization. Verify
>>    AuthorizerBenchmark)
>>
>>
>>    - Modification of the kafka.admin.AclCommand class to update multiple
>>    methods like getResourceFilter and objects for parsing arguments
>>    AclCommandOptions
>>
>>
>>    - Modification of the kafka.zk.ZkData class to update multiple
>>    objects like ZkAclStore
>>
>>
> Let's not support ZK with this change.
>
>>
>>    - Modification of kafka.server.AuthHelper class to update authorize
>>    method
>>
>>
>>    - Modification of the org.apache.kafka.jmh.acl.AuthorizerBenchmark
>>    class to update multiple methods like setup and prepareAclCache
>>
>>
>>    - Modification of
>>    org.apache.kafka.jmh.acl.StandardAuthorizerUpdateBenchmark class to update
>>    prepareAcls method
>>
>>
>>    - Modification of
>>    org.apache.kafka.metadata.authorizer.StandardAuthorizerData class to 
>> update
>>    authorize method
>>
>>
>>    - Modification of org.apache.kafka.controller.AclControlManager class
>>    to update validateNewAcl method
>>
>>
>>    - Updating tests
>>
>>
>>    - Similar to prefixed, match ACLs will be stored under the ZK path:
>>    '/kafka-acl-extended/<pattern-type>' and change events to such ACLs will 
>> be
>>    stored under: '/kafka-acl-extended-changes'. where pattern type will be
>>    prefixed or match
>>
>>
>> *Optimization* for existing LITERAL/PREFIX
>>
>>    - Currently LITERAL lookups still go through the tree lookup, while
>>    they can be moved to *hash-lookup*
>>
>>
>>    - For PREFIX : with a Trie-based solution. Refactor matchingAcls
>>    method in AclAuthorizer.scala, to match LITERAL and PREFIXED acls.
>>
>>
>>    - Define the Trie structure
>>
>>
>>    - Populate the Trie with ACLs
>>
>>
>>    - Retrieve ACLs using the Trie
>>
>> With this optimization, we hope to have a drastic reduced latency in the
>> matchingAcls method, and it's much more efficient.
>
> The entire Optimization section can be removed as the Trie based solution
> solves the LITERLAL, PREFIX and wildcard lookup problems.
>
>
> On Fri, Aug 2, 2024 at 9:55 AM Claude Warren, Jr <claude.war...@aiven.io>
> wrote:
>
>> I do not think we need to add the additional MATCH and ALL into the
>> Authorizer.  Currently we have
>>
>>    - Literal - matches exactly
>>    - Prefix - Will match patterns that are longer as long as the part
>>    specified matches exactly.
>>
>> The Trie proposal works well with just these two.  The wildcard is
>> introduced as a pivot point in the trie, so any wildcard is on a node in
>> the trie by itself.  Ignoring the DENY and wildcard processing for a
>> moment, when searching we descend the trie looking for the best matching
>> node (LITERAL or PREFIX) and then check if there are ACLs on that node that
>> match the request.  If not we move to the parent node and check there.  We
>> continue up the tree until we find a match or reach the root (NO MATCH).
>>
>> For wildcard processing we insert an extra step before each move up to
>> the parent we check for a wildcard match.  For the wildcard to match at
>> this point it must be a PREFIX type.  By requiring a PREFIX type we can
>> allow "*" and "?" as valid characters in a LITERAL match.
>>
>> DENY processing is performed during descent of the and matches PREFIX
>> when or LITERAL depending on whether or not there is an exact match.
>>
>> Claude
>>
>> On Mon, Jul 29, 2024 at 8:36 AM Claude Warren, Jr <claude.war...@aiven.io>
>> wrote:
>>
>>> I have updated the KIP with results from the Trie implementation and
>>> they are dramatic to say the least.  For most searches they are at least an
>>> order of magnitude faster and use less memory.  The wildcard search is not
>>> a regular expression but rather a file type wild card (*=1 or more
>>> character, ?=1 character).
>>>
>>> Code is available on my personal branch [1].  It is not fully documented
>>> and does not meet the checkstyle requirements yet.  I also modified the jmh
>>> tests to run the Trie tests and limit the test to the single case mentioned
>>> in the KIP documentation.
>>>
>>> If there are no issues with this code, I will complete the documentation
>>> and fix the checkstyle and then open a pull request.
>>>
>>> Claude
>>>
>>> [1] https://github.com/Claudenw/kafka/pull/new/KIP
>>> -1042_Trie_Implementation
>>>
>>> On Mon, Jun 3, 2024 at 9:31 PM Muralidhar Basani
>>> <muralidhar.bas...@aiven.io.invalid> wrote:
>>>
>>>> Hi,
>>>>
>>>> Just bumping this thread again. It seems no concerns have been raised so
>>>> far.
>>>>
>>>> I'll request votes in 2 weeks.
>>>>
>>>> Thanks,
>>>> Murali
>>>>
>>>> On Tue, May 28, 2024 at 1:24 PM Muralidhar Basani <
>>>> muralidhar.bas...@aiven.io> wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > Any other suggestions or objections to the proposal?
>>>> >
>>>> > Thanks,
>>>> > Murali
>>>> >
>>>> > On Thu, May 23, 2024 at 10:18 AM Muralidhar Basani <
>>>> > muralidhar.bas...@aiven.io> wrote:
>>>> >
>>>> >> Thanks Greg.
>>>> >> I have updated KIP with details on optimisation of LITERAL too.
>>>> >>
>>>> >> Regarding limitations in performance by introducing MATCH is
>>>> definitely a
>>>> >> question.
>>>> >> - By optimizing LITERAL and PREFIX we are gaining a few nano secs I
>>>> >> think. (described in kip)
>>>> >> - MATCH can be introduced only with a configuration. So by default,
>>>> we
>>>> >> can disable the MATCH check (ex : acls.pattern.match.enable=false),
>>>> and if
>>>> >> customer enables the config, only then add the lookup in the
>>>> matchingAcls()
>>>> >> - With the proposal already described in KIP for MATCH, we may not
>>>> have
>>>> >> any limitations., rather will be efficient.
>>>> >>
>>>> >> Maybe we are in good shape with these propositions
>>>> >>
>>>> >> Thanks,
>>>> >> Murali
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Tue, May 21, 2024 at 6:15 PM Greg Harris
>>>> <greg.har...@aiven.io.invalid>
>>>> >> wrote:
>>>> >>
>>>> >>> Hi Murali,
>>>> >>>
>>>> >>> I don't have a trie library in mind. I looked at the current
>>>> >>> implementation of the StandardAuthorizer and found that we are
>>>> already
>>>> >>> benefiting from the prefix structure in the implementation [1]. The
>>>> >>> current implementation appears to be a TreePSet [2].
>>>> >>>
>>>> >>> Now, we've already made this tradeoff once with PREFIX: Prefixes are
>>>> >>> less structured than literals, because with literals you can use a
>>>> >>> hashing algorithm to jump directly to your relevant ACLs in O(1),
>>>> but
>>>> >>> with a prefix you either need to do multiple lookups, or some sort
>>>> of
>>>> >>> O(log(n)) lookup. And we determined that the ultimate limitation in
>>>> >>> performance was worth it for the expressiveness.
>>>> >>> We're making this tradeoff again with MATCH acls: wildcards are less
>>>> >>> structured than prefixes or literals, because of the reasons I
>>>> >>> mentioned earlier. We need to judge now if the ultimate limitation
>>>> in
>>>> >>> performance is worth it.
>>>> >>>
>>>> >>> I think your strategy for using the optimized layout for prefix and
>>>> >>> literal matches is smart, because there does seem to be a gap in
>>>> >>> performance possible. It makes me wonder why the optimized layout
>>>> for
>>>> >>> literals was not used when prefixes were added. Literal lookups
>>>> still
>>>> >>> go through the tree lookup, when they could be moved to a
>>>> hash-lookup
>>>> >>> instead.
>>>> >>> That would allow users to "choose" for themselves on a convenience
>>>> vs
>>>> >>> performance scale: Smaller use-cases can add a single convenient
>>>> >>> MATCH, and larger use-cases can add the multiple optimized PREFIXes.
>>>> >>>
>>>> >>> [1]
>>>> >>>
>>>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339
>>>> >>> [2]
>>>> >>>
>>>> https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Greg
>>>> >>>
>>>> >>> On Tue, May 21, 2024 at 8:23 AM Muralidhar Basani
>>>> >>> <muralidhar.bas...@aiven.io.invalid> wrote:
>>>> >>> >
>>>> >>> > @greg which library of trie you were thinking of ? There is one
>>>> in the
>>>> >>> > commons-collection I see.
>>>> >>> >
>>>> >>> > On Fri, May 17, 2024 at 3:55 PM Claude Warren <cla...@xenei.com>
>>>> >>> wrote:
>>>> >>> >
>>>> >>> > > >
>>>> >>> > > > This has implications for execution complexity: If we can't
>>>> compute
>>>> >>> > > > whether two patterns overlap, then we need to run both of
>>>> them on
>>>> >>> each
>>>> >>> > > > piece of input to test if they both match. Under the current
>>>> >>> > > > LITERAL/PREFIX system, we can optimize execution with a trie,
>>>> but
>>>> >>> that
>>>> >>> > > > option wouldn't be available to us with MATCH.
>>>> >>> > > >
>>>> >>> > >
>>>> >>> > > If we consider the case of an asterisk representing 1 or more
>>>> >>> characters
>>>> >>> > > then determining if 2 patterns overlap can be fairly simple and
>>>> very
>>>> >>> fast.
>>>> >>> > > Let's call the text from the ACL the target, and the text from
>>>> the
>>>> >>> wildcard
>>>> >>> > > matcher the candidate.
>>>> >>> > >
>>>> >>> > > When a wildcard pattern, excluding '*',  is created:
>>>> >>> > >
>>>> >>> > >    - the candidate text is broken into fragments separated by
>>>> >>> characters
>>>> >>> > >    that are not Character.isLetterOrDigit() (See
>>>> >>> > >
>>>> >>> https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html
>>>> ).
>>>> >>> > >    - fragments that contain 1 character are ignored.
>>>> >>> > >    - fragments that contains 2 or more characters are scanned
>>>> and
>>>> >>> every
>>>> >>> > >    every pair of characters used to create a Hasher (See
>>>> >>> > >
>>>> >>> > >
>>>> >>>
>>>> https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html
>>>> >>> > > )
>>>> >>> > >    that hasher is added to a Bloom filter associated with the
>>>> >>> wildcard
>>>> >>> > > pattern.
>>>> >>> > >
>>>> >>> > > When a target is being evaluated and matching wildcard entries
>>>> are
>>>> >>> to be
>>>> >>> > > located. Split and create a bloom filter using entry and the
>>>> same
>>>> >>> strategy
>>>> >>> > > as for the wildcard patterns above.  These bloom filters will
>>>> have
>>>> >>> had more
>>>> >>> > > pairs of characters added than for a matching wildcard pattern.
>>>> >>> > >
>>>> >>> > > Each filter contains the pattern for the unique pairs in the
>>>> >>> fragments of
>>>> >>> > > the original text:
>>>> >>> > >
>>>> >>> > > Now we can check the Bloom filters.
>>>> >>> > >
>>>> >>> > > To find potential matching patterns we can check to see if the
>>>> Bloom
>>>> >>> filter
>>>> >>> > > for the pattern is contained within the bloom filter for the
>>>> entry
>>>> >>> text.
>>>> >>> > > If so then we know that it is likely that the pairs of
>>>> characters
>>>> >>> specified
>>>> >>> > > in the wild card (non-wildcard text) appear in the entry text.
>>>> >>> > >
>>>> >>> > > At this point we can evaluate the reduced number of patterns to
>>>> see
>>>> >>> which
>>>> >>> > > ones actually match.
>>>> >>> > >
>>>> >>> > > Having reduced the candidates to the matching patterns we can
>>>> then
>>>> >>> use a
>>>> >>> > > standard measure of similarity like the Levenshtein distance to
>>>> >>> determine
>>>> >>> > > which candidates require the fewest edits to evolve into the
>>>> >>> target.  The
>>>> >>> > > one with the fewest edits is the most specific and should be
>>>> applied.
>>>> >>> > >
>>>> >>> > > Now if you want to know which patterns overlap you have a
>>>> similar
>>>> >>> process.
>>>> >>> > >
>>>> >>> > > Each Bloom filter can calculate the estimated number of unique
>>>> >>> fragments
>>>> >>> > > that were added to it.  If the filters are sorted by this
>>>> >>> estimation, the
>>>> >>> > > ones with the highest counts may contain the ones with the
>>>> lowest
>>>> >>> counts,
>>>> >>> > > but a lower count can never contain a higher count.  The
>>>> calculation
>>>> >>> to
>>>> >>> > > perform the estimation can be skipped and the cardinality value
>>>> of
>>>> >>> each
>>>> >>> > > Bloom filter can be used instead.  You can then check the
>>>> smaller
>>>> >>> filters
>>>> >>> > > against the larger ones and find where one candidate is
>>>> contained
>>>> >>> within
>>>> >>> > > another.
>>>> >>> > >
>>>> >>> > > If you want to know if they intersect the BloomFilter from
>>>> >>> > > commons-collections has an estimateIntersection as well as an
>>>> >>> estimateUnion
>>>> >>> > > method.  Quick tests can be made between filters to see if
>>>> there is
>>>> >>> any
>>>> >>> > > overlap before more complex analysis is performed.
>>>> >>> > >
>>>> >>> > > The solution here does not make the final comparisons easier, it
>>>> >>> simply
>>>> >>> > > reduces the search space to find the items that need to be
>>>> compared.
>>>> >>> > >
>>>> >>> > >
>>>> >>> > > On Mon, May 6, 2024 at 9:03 PM Greg Harris
>>>> >>> <greg.har...@aiven.io.invalid>
>>>> >>> > > wrote:
>>>> >>> > >
>>>> >>> > > > Hi Murali,
>>>> >>> > > >
>>>> >>> > > > Thanks for the KIP!
>>>> >>> > > >
>>>> >>> > > > I think I understand the motivation for this KIP in situations
>>>> >>> where
>>>> >>> > > > there are a "cross product" of topics for two or more
>>>> variables X
>>>> >>> and
>>>> >>> > > > Y, and want to write ACLs for each of the variable axes.
>>>> >>> > > > If you format your topics "X-Y-suffix", it's not easy to write
>>>> >>> rules
>>>> >>> > > > that apply to all "Y" topics, because you need to enumerate
>>>> all of
>>>> >>> the
>>>> >>> > > > "X" values, and the problem persists even if you reorder the
>>>> topic
>>>> >>> > > > name.
>>>> >>> > > >
>>>> >>> > > > In my recent work on KIP-986 I found it necessary to introduce
>>>> >>> > > > "namespaces" to group topics together, and I was going to
>>>> replicate
>>>> >>> > > > the ACL system to specify those namespaces. This change to
>>>> the ACL
>>>> >>> > > > system could increase the expressiveness and complexity of
>>>> that
>>>> >>> > > > feature, if it is ever implemented.
>>>> >>> > > > One of the primitives I needed when specifying namespaces was
>>>> the
>>>> >>> > > > ability to tell when two namespaces overlapped (i.e. does
>>>> there
>>>> >>> exist
>>>> >>> > > > any topic which is present in both namespaces). This is
>>>> trivial to
>>>> >>> do
>>>> >>> > > > with the current PREFIX and LITERAL system, as we can find the
>>>> >>> > > > maximum-length common prefix with just some length
>>>> comparisons and
>>>> >>> > > > equality checks.
>>>> >>> > > > I considered specifying namespaces via regular expressions,
>>>> and
>>>> >>> found
>>>> >>> > > > that it was computationally much more difficult. Computing the
>>>> >>> > > > intersection of two regexes appears to be exponential in the
>>>> >>> length of
>>>> >>> > > > the regexes, leading me to avoid adding it.
>>>> >>> > > >
>>>> >>> > > > I understand that you're not suggesting full REGEX support,
>>>> and
>>>> >>> that
>>>> >>> > > > "namespaces" don't need to support MATCH, but I think MATCH
>>>> may run
>>>> >>> > > > into related difficulties. Any MATCH can overlap with any
>>>> other
>>>> >>> MATCH
>>>> >>> > > > or PREFIX if it includes a sufficient number of wildcards. For
>>>> >>> > > > example:
>>>> >>> > > > MATCH *-accounts-* has overlap with PREFIX nl as they can both
>>>> >>> match
>>>> >>> > > > "nl-accounts-localtopic", but that isn't sensitive to the
>>>> contents
>>>> >>> > > > "nl", it is true for any PREFIX.
>>>> >>> > > > MATCH *-accounts-* has overlap with MATCH *localtopic, as
>>>> they can
>>>> >>> > > > both match "nl-accounts-localtopic", but that isn't actually
>>>> >>> sensitive
>>>> >>> > > > to the contents "localtopic", it's true for any MATCH which
>>>> >>> includes a
>>>> >>> > > > wildcard at the beginning.
>>>> >>> > > >
>>>> >>> > > > This has implications for execution complexity: If we can't
>>>> compute
>>>> >>> > > > whether two patterns overlap, then we need to run both of
>>>> them on
>>>> >>> each
>>>> >>> > > > piece of input to test if they both match. Under the current
>>>> >>> > > > LITERAL/PREFIX system, we can optimize execution with a trie,
>>>> but
>>>> >>> that
>>>> >>> > > > option wouldn't be available to us with MATCH.
>>>> >>> > > >
>>>> >>> > > > The current system makes users evaluate a trade-off:
>>>> >>> > > > 1. Optimize the number of ACLs by organizing topics according
>>>> to
>>>> >>> > > > prefixes (for example, "accounts-localtopic-nl" and PREFIX
>>>> >>> "accounts",
>>>> >>> > > > PREFIX "accounts-localtopic")
>>>> >>> > > > 2. Use less-structured topic names, with a corresponding ACL
>>>> scheme
>>>> >>> > > > that has more individual rules.
>>>> >>> > > > The system currently informs users of this tradeoff by making
>>>> them
>>>> >>> > > > write multiple ACLs, and making them think "there has got to
>>>> be a
>>>> >>> > > > better way!". Perhaps we can find a better way to surface
>>>> this best
>>>> >>> > > > practice, or better inform users about it.
>>>> >>> > > >
>>>> >>> > > > I understand that there are going to be situations more
>>>> complex
>>>> >>> than
>>>> >>> > > > your example, where multiple individual rules will always be
>>>> >>> necessary
>>>> >>> > > > with only PREFIX evaluation. I think even in those
>>>> situations, a
>>>> >>> > > > number of efficient-to-evaluate rules is preferable to just
>>>> one
>>>> >>> > > > expensive-to-evaluate rule.
>>>> >>> > > >
>>>> >>> > > > One alternative that I thought of could be "PARAMETERIZED"
>>>> ACLs
>>>> >>> which
>>>> >>> > > > are like PREFIXED, but allow some parameter substitution. For
>>>> >>> example
>>>> >>> > > > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax
>>>> >>> here,
>>>> >>> > > > but this isn't actually a regex, and wouldn't allow arbitrary
>>>> >>> numbers
>>>> >>> > > > of characters, or the * or + operators.
>>>> >>> > > > In the background it could evaluate exactly like the 3
>>>> individual
>>>> >>> > > > PREFIX rules, but be easier to evaluate on the backend, and
>>>> support
>>>> >>> > > > the intersection query I mentioned earlier. It could also
>>>> support
>>>> >>> > > > [a-zA-Z] notation in case the parameter values aren't known
>>>> ahead
>>>> >>> of
>>>> >>> > > > time, but have a fixed length.
>>>> >>> > > >
>>>> >>> > > > Thanks,
>>>> >>> > > > Greg
>>>> >>> > > >
>>>> >>> > > > On Mon, May 6, 2024 at 11:17 AM Claude Warren <
>>>> cla...@xenei.com>
>>>> >>> wrote:
>>>> >>> > > > >
>>>> >>> > > > > I have an idea for how to reduce the time for ACL lookups in
>>>> >>> general
>>>> >>> > > and
>>>> >>> > > > > particularly where wildcards are involved using sequence
>>>> >>> > > > > characterization techniques from bioinformatics.  But I
>>>> need a
>>>> >>> set of
>>>> >>> > > ACL
>>>> >>> > > > > patterns and associated topics to test with.
>>>> >>> > > > >
>>>> >>> > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <
>>>> ocadar...@gmail.com
>>>> >>> >
>>>> >>> > > wrote:
>>>> >>> > > > >
>>>> >>> > > > > > Hi, Murali.
>>>> >>> > > > > >
>>>> >>> > > > > > First, could you add the KIP-1042 to the index (
>>>> >>> > > > > >
>>>> >>> > > > > >
>>>> >>> > > >
>>>> >>> > >
>>>> >>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>>> >>> > > > > > )
>>>> >>> > > > > > as well so that everyone can find it easily?
>>>> >>> > > > > >
>>>> >>> > > > > > I took a look at the KIP, then I have 2 questions:
>>>> >>> > > > > >
>>>> >>> > > > > > 1. Though the new MATCH resource pattern type may reduce
>>>> the
>>>> >>> effort
>>>> >>> > > of
>>>> >>> > > > > > adding ACLs in some cases, do you have any concrete use
>>>> case
>>>> >>> you are
>>>> >>> > > in
>>>> >>> > > > > > mind? (When prefixed ACL was introduced in KIP-290, there
>>>> was a
>>>> >>> > > > use-case
>>>> >>> > > > > > that using it for implementing multi-tenancy)
>>>> >>> > > > > >
>>>> >>> > > > > > 2. As you may know, ACL lookup is in the hot-path which
>>>> the
>>>> >>> > > > performance is
>>>> >>> > > > > > very important. (
>>>> >>> > > > > >
>>>> >>> > > > > >
>>>> >>> > > >
>>>> >>> > >
>>>> >>>
>>>> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
>>>> >>> > > > > > ).
>>>> >>> > > > > > Do you have ideas how do we update `matchingAcls` to
>>>> support
>>>> >>> > > > MATCH-type ACL
>>>> >>> > > > > > without introducing performance issue?
>>>> >>> > > > > >
>>>> >>> > > > > >
>>>> >>> > > > > > Thanks,
>>>> >>> > > > > >
>>>> >>> > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <
>>>> claude.war...@aiven.io
>>>> >>> > > .invalid>:
>>>> >>> > > > > >
>>>> >>> > > > > > > As I wrote in [1], the ACL evaluation algorithm needs
>>>> to be
>>>> >>> > > specified
>>>> >>> > > > > > with
>>>> >>> > > > > > > respect to the specificity of the pattern so that we
>>>> know
>>>> >>> exactly
>>>> >>> > > > which
>>>> >>> > > > > > if
>>>> >>> > > > > > > *-accounts-* takes precedence over nl-accounts-* or visa
>>>> >>> versa.
>>>> >>> > > > > > >
>>>> >>> > > > > > > I think that we should spell out that precedence is
>>>> >>> evaluated as
>>>> >>> > > > follows:
>>>> >>> > > > > > >
>>>> >>> > > > > > > 1. Remove patterns that do not match
>>>> >>> > > > > > > 2. More specific patterns take precedence over less
>>>> specific
>>>> >>> > > patterns
>>>> >>> > > > > > > 3. for patterns of the same precedence DENY overrides
>>>> ALLOW
>>>> >>> > > > > > >
>>>> >>> > > > > > > Determining specificity:
>>>> >>> > > > > > >
>>>> >>> > > > > > > Specificity is based on the Levenshtein distance
>>>> between the
>>>> >>> > > pattern
>>>> >>> > > > and
>>>> >>> > > > > > > the text being evaluated. The lower the distance the
>>>> more
>>>> >>> specific
>>>> >>> > > > the
>>>> >>> > > > > > > rule.
>>>> >>> > > > > > > Using the topic name: nl-accounts-localtopic we can
>>>> evaluate
>>>> >>> the
>>>> >>> > > > > > > Levenshtein distance for various patterns.
>>>> >>> > > > > > > nl-accounts-localtopic = 0
>>>> >>> > > > > > > *-accounts-localtopic = 2
>>>> >>> > > > > > > nl-accounts-local* = 5
>>>> >>> > > > > > > *-accounts-local* = 7
>>>> >>> > > > > > > nl-accounts-* = 10
>>>> >>> > > > > > > *-accounts-* = 12
>>>> >>> > > > > > >
>>>> >>> > > > > > > In the special case of matching principles User matches
>>>> are
>>>> >>> more
>>>> >>> > > > specific
>>>> >>> > > > > > > than Group matches.
>>>> >>> > > > > > >
>>>> >>> > > > > > > I don't know if this should be added to KIP-1042 or
>>>> >>> presented as a
>>>> >>> > > > new
>>>> >>> > > > > > KIP.
>>>> >>> > > > > > >
>>>> >>> > > > > > > Claude
>>>> >>> > > > > > >
>>>> >>> > > > > > > [1]
>>>> >>> > >
>>>> https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
>>>> >>> > > > > > > <
>>>> >>> > > > > > >
>>>> >>> > > > > >
>>>> >>> > > >
>>>> >>> > >
>>>> >>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
>>>> >>> > > > > > > >
>>>> >>> > > > > > >
>>>> >>> > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <
>>>> >>> cla...@xenei.com>
>>>> >>> > > > wrote:
>>>> >>> > > > > > >
>>>> >>> > > > > > > > Took me awhile to find it but the link to the KIP is
>>>> >>> > > > > > > >
>>>> >>> > > > > > > >
>>>> >>> > > > > > >
>>>> >>> > > > > >
>>>> >>> > > >
>>>> >>> > >
>>>> >>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
>>>> >>> > > > > > > >
>>>> >>> > > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani <
>>>> >>> > > > murali.bas...@gmail.com
>>>> >>> > > > > > >
>>>> >>> > > > > > > > wrote:
>>>> >>> > > > > > > >
>>>> >>> > > > > > > > > Hello,
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > I'd like to propose a suggestion to our resource
>>>> >>> patterns in
>>>> >>> > > > Kafka
>>>> >>> > > > > > > ACLs.
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > Currently, when adding new ACLs in Kafka, we have
>>>> two
>>>> >>> types of
>>>> >>> > > > > > resource
>>>> >>> > > > > > > > > patterns for topics:
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > >    - LITERAL
>>>> >>> > > > > > > > >    - PREFIXED
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > However, when it comes to listing or removing ACLs,
>>>> we
>>>> >>> have a
>>>> >>> > > > couple
>>>> >>> > > > > > > more
>>>> >>> > > > > > > > > options:
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > >    - MATCH
>>>> >>> > > > > > > > >    - ANY (will match any pattern type)
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > If we can extend creating acls as well with 'MATCH'
>>>> >>> pattern
>>>> >>> > > > type, it
>>>> >>> > > > > > > > would
>>>> >>> > > > > > > > > be very beneficial. Even though this kind of acl
>>>> should
>>>> >>> be
>>>> >>> > > > created
>>>> >>> > > > > > with
>>>> >>> > > > > > > > > utmost care, it will help organizations streamline
>>>> their
>>>> >>> ACL
>>>> >>> > > > > > management
>>>> >>> > > > > > > > > processes.
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > Example scenarios :
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > Let's say we need to create ACLs for the following
>>>> six
>>>> >>> topics:
>>>> >>> > > > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic,
>>>> >>> > > > > > > de-accounts-localtopic,
>>>> >>> > > > > > > > > de-accounts-remotetopic, cz-accounts-localtopic,
>>>> >>> > > > > > > cz-accounts-remotetopic
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > Currently, we achieve this using existing
>>>> functionality
>>>> >>> by
>>>> >>> > > > creating
>>>> >>> > > > > > > three
>>>> >>> > > > > > > > > prefixed ACLs as shown below:
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>>>> >>> > > > > > > > > > --add \
>>>> >>> > > > > > > > > > --allow-principal
>>>> >>> > > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > >
>>>> >>> > > > > > >
>>>> >>> > > > > >
>>>> >>> > > >
>>>> >>> > >
>>>> >>>
>>>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>>>> >>> > > > > > > > > > \
>>>> >>> > > > > > > > > > --producer \
>>>> >>> > > > > > > > > > --topic nl-accounts- \
>>>> >>> > > > > > > > > > --resource-pattern-type prefixed
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>>>> >>> > > > > > > > > > --add \
>>>> >>> > > > > > > > > > --allow-principal
>>>> >>> > > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > >
>>>> >>> > > > > > >
>>>> >>> > > > > >
>>>> >>> > > >
>>>> >>> > >
>>>> >>>
>>>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>>>> >>> > > > > > > > > > \
>>>> >>> > > > > > > > > > --producer \
>>>> >>> > > > > > > > > > --topic de-accounts- \
>>>> >>> > > > > > > > > > --resource-pattern-type prefixed
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>>>> >>> > > > > > > > > > --add \
>>>> >>> > > > > > > > > > --allow-principal
>>>> >>> > > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > >
>>>> >>> > > > > > >
>>>> >>> > > > > >
>>>> >>> > > >
>>>> >>> > >
>>>> >>>
>>>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>>>> >>> > > > > > > > > > \
>>>> >>> > > > > > > > > > --producer \
>>>> >>> > > > > > > > > > --topic cz-accounts- \
>>>> >>> > > > > > > > > > --resource-pattern-type prefixed
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > However, if we had the 'MATCH' pattern type
>>>> available,
>>>> >>> we could
>>>> >>> > > > > > > > accomplish
>>>> >>> > > > > > > > > this with a single ACL, as illustrated here:
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \
>>>> >>> > > > > > > > > > --add \
>>>> >>> > > > > > > > > > --allow-principal
>>>> >>> > > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > >
>>>> >>> > > > > > >
>>>> >>> > > > > >
>>>> >>> > > >
>>>> >>> > >
>>>> >>>
>>>> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
>>>> >>> > > > > > > > > > \
>>>> >>> > > > > > > > > > --producer \
>>>> >>> > > > > > > > > > --topic *-accounts-* \
>>>> >>> > > > > > > > > > --resource-pattern-type match
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > This pattern closely resembles PREFIXED but offers
>>>> >>> broader
>>>> >>> > > > allow/deny
>>>> >>> > > > > > > > > rules.
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > Implementing this change could significantly reduce
>>>> the
>>>> >>> effort
>>>> >>> > > in
>>>> >>> > > > > > > several
>>>> >>> > > > > > > > > acl management processes.
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > I welcome your thoughts and any concerns you may
>>>> have
>>>> >>> regarding
>>>> >>> > > > this
>>>> >>> > > > > > > > > proposal.
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > > > Thanks,
>>>> >>> > > > > > > > > Murali
>>>> >>> > > > > > > > >
>>>> >>> > > > > > > >
>>>> >>> > > > > > > >
>>>> >>> > > > > > > > --
>>>> >>> > > > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
>>>> >>> > > > > > > >
>>>> >>> > > > > > >
>>>> >>> > > > > >
>>>> >>> > > > > >
>>>> >>> > > > > > --
>>>> >>> > > > > > ========================
>>>> >>> > > > > > Okada Haruki
>>>> >>> > > > > > ocadar...@gmail.com
>>>> >>> > > > > > ========================
>>>> >>> > > > > >
>>>> >>> > > > >
>>>> >>> > > > >
>>>> >>> > > > > --
>>>> >>> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
>>>> >>> > > >
>>>> >>> > >
>>>> >>> > >
>>>> >>> > > --
>>>> >>> > > LinkedIn: http://www.linkedin.com/in/claudewarren
>>>> >>> > >
>>>> >>>
>>>> >>
>>>>
>>>

Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

Reply via email to