I have updated KIP-1042 with a proposal for how to reduce the time spent looking for matching wildcard patterns. Experimentally I see a reduction of 66-75% execution time.
On Mon, May 6, 2024 at 9:03 PM Greg Harris <greg.har...@aiven.io.invalid> wrote: > Hi Murali, > > Thanks for the KIP! > > I think I understand the motivation for this KIP in situations where > there are a "cross product" of topics for two or more variables X and > Y, and want to write ACLs for each of the variable axes. > If you format your topics "X-Y-suffix", it's not easy to write rules > that apply to all "Y" topics, because you need to enumerate all of the > "X" values, and the problem persists even if you reorder the topic > name. > > In my recent work on KIP-986 I found it necessary to introduce > "namespaces" to group topics together, and I was going to replicate > the ACL system to specify those namespaces. This change to the ACL > system could increase the expressiveness and complexity of that > feature, if it is ever implemented. > One of the primitives I needed when specifying namespaces was the > ability to tell when two namespaces overlapped (i.e. does there exist > any topic which is present in both namespaces). This is trivial to do > with the current PREFIX and LITERAL system, as we can find the > maximum-length common prefix with just some length comparisons and > equality checks. > I considered specifying namespaces via regular expressions, and found > that it was computationally much more difficult. Computing the > intersection of two regexes appears to be exponential in the length of > the regexes, leading me to avoid adding it. > > I understand that you're not suggesting full REGEX support, and that > "namespaces" don't need to support MATCH, but I think MATCH may run > into related difficulties. Any MATCH can overlap with any other MATCH > or PREFIX if it includes a sufficient number of wildcards. For > example: > MATCH *-accounts-* has overlap with PREFIX nl as they can both match > "nl-accounts-localtopic", but that isn't sensitive to the contents > "nl", it is true for any PREFIX. > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can > both match "nl-accounts-localtopic", but that isn't actually sensitive > to the contents "localtopic", it's true for any MATCH which includes a > wildcard at the beginning. > > This has implications for execution complexity: If we can't compute > whether two patterns overlap, then we need to run both of them on each > piece of input to test if they both match. Under the current > LITERAL/PREFIX system, we can optimize execution with a trie, but that > option wouldn't be available to us with MATCH. > > The current system makes users evaluate a trade-off: > 1. Optimize the number of ACLs by organizing topics according to > prefixes (for example, "accounts-localtopic-nl" and PREFIX "accounts", > PREFIX "accounts-localtopic") > 2. Use less-structured topic names, with a corresponding ACL scheme > that has more individual rules. > The system currently informs users of this tradeoff by making them > write multiple ACLs, and making them think "there has got to be a > better way!". Perhaps we can find a better way to surface this best > practice, or better inform users about it. > > I understand that there are going to be situations more complex than > your example, where multiple individual rules will always be necessary > with only PREFIX evaluation. I think even in those situations, a > number of efficient-to-evaluate rules is preferable to just one > expensive-to-evaluate rule. > > One alternative that I thought of could be "PARAMETERIZED" ACLs which > are like PREFIXED, but allow some parameter substitution. For example > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here, > but this isn't actually a regex, and wouldn't allow arbitrary numbers > of characters, or the * or + operators. > In the background it could evaluate exactly like the 3 individual > PREFIX rules, but be easier to evaluate on the backend, and support > the intersection query I mentioned earlier. It could also support > [a-zA-Z] notation in case the parameter values aren't known ahead of > time, but have a fixed length. > > Thanks, > Greg > > On Mon, May 6, 2024 at 11:17 AM Claude Warren <cla...@xenei.com> wrote: > > > > I have an idea for how to reduce the time for ACL lookups in general and > > particularly where wildcards are involved using sequence > > characterization techniques from bioinformatics. But I need a set of ACL > > patterns and associated topics to test with. > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <ocadar...@gmail.com> wrote: > > > > > Hi, Murali. > > > > > > First, could you add the KIP-1042 to the index ( > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals > > > ) > > > as well so that everyone can find it easily? > > > > > > I took a look at the KIP, then I have 2 questions: > > > > > > 1. Though the new MATCH resource pattern type may reduce the effort of > > > adding ACLs in some cases, do you have any concrete use case you are in > > > mind? (When prefixed ACL was introduced in KIP-290, there was a > use-case > > > that using it for implementing multi-tenancy) > > > > > > 2. As you may know, ACL lookup is in the hot-path which the > performance is > > > very important. ( > > > > > > > https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539 > > > ). > > > Do you have ideas how do we update `matchingAcls` to support > MATCH-type ACL > > > without introducing performance issue? > > > > > > > > > Thanks, > > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <claude.war...@aiven.io.invalid>: > > > > > > > As I wrote in [1], the ACL evaluation algorithm needs to be specified > > > with > > > > respect to the specificity of the pattern so that we know exactly > which > > > if > > > > *-accounts-* takes precedence over nl-accounts-* or visa versa. > > > > > > > > I think that we should spell out that precedence is evaluated as > follows: > > > > > > > > 1. Remove patterns that do not match > > > > 2. More specific patterns take precedence over less specific patterns > > > > 3. for patterns of the same precedence DENY overrides ALLOW > > > > > > > > Determining specificity: > > > > > > > > Specificity is based on the Levenshtein distance between the pattern > and > > > > the text being evaluated. The lower the distance the more specific > the > > > > rule. > > > > Using the topic name: nl-accounts-localtopic we can evaluate the > > > > Levenshtein distance for various patterns. > > > > nl-accounts-localtopic = 0 > > > > *-accounts-localtopic = 2 > > > > nl-accounts-local* = 5 > > > > *-accounts-local* = 7 > > > > nl-accounts-* = 10 > > > > *-accounts-* = 12 > > > > > > > > In the special case of matching principles User matches are more > specific > > > > than Group matches. > > > > > > > > I don't know if this should be added to KIP-1042 or presented as a > new > > > KIP. > > > > > > > > Claude > > > > > > > > [1] https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f > > > > < > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls > > > > > > > > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <cla...@xenei.com> > wrote: > > > > > > > > > Took me awhile to find it but the link to the KIP is > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls > > > > > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani < > murali.bas...@gmail.com > > > > > > > > > wrote: > > > > > > > > > > > Hello, > > > > > > > > > > > > I'd like to propose a suggestion to our resource patterns in > Kafka > > > > ACLs. > > > > > > > > > > > > Currently, when adding new ACLs in Kafka, we have two types of > > > resource > > > > > > patterns for topics: > > > > > > > > > > > > - LITERAL > > > > > > - PREFIXED > > > > > > > > > > > > However, when it comes to listing or removing ACLs, we have a > couple > > > > more > > > > > > options: > > > > > > > > > > > > - MATCH > > > > > > - ANY (will match any pattern type) > > > > > > > > > > > > > > > > > > If we can extend creating acls as well with 'MATCH' pattern > type, it > > > > > would > > > > > > be very beneficial. Even though this kind of acl should be > created > > > with > > > > > > utmost care, it will help organizations streamline their ACL > > > management > > > > > > processes. > > > > > > > > > > > > Example scenarios : > > > > > > > > > > > > Let's say we need to create ACLs for the following six topics: > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic, > > > > de-accounts-localtopic, > > > > > > de-accounts-remotetopic, cz-accounts-localtopic, > > > > cz-accounts-remotetopic > > > > > > > > > > > > Currently, we achieve this using existing functionality by > creating > > > > three > > > > > > prefixed ACLs as shown below: > > > > > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ > > > > > > > --add \ > > > > > > > --allow-principal > > > > > > > > > > > > > > > > > > > > > > > > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > > > > > > \ > > > > > > > --producer \ > > > > > > > --topic nl-accounts- \ > > > > > > > --resource-pattern-type prefixed > > > > > > > > > > > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ > > > > > > > --add \ > > > > > > > --allow-principal > > > > > > > > > > > > > > > > > > > > > > > > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > > > > > > \ > > > > > > > --producer \ > > > > > > > --topic de-accounts- \ > > > > > > > --resource-pattern-type prefixed > > > > > > > > > > > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ > > > > > > > --add \ > > > > > > > --allow-principal > > > > > > > > > > > > > > > > > > > > > > > > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > > > > > > \ > > > > > > > --producer \ > > > > > > > --topic cz-accounts- \ > > > > > > > --resource-pattern-type prefixed > > > > > > > > > > > > > > > > > > However, if we had the 'MATCH' pattern type available, we could > > > > > accomplish > > > > > > this with a single ACL, as illustrated here: > > > > > > > > > > > > kafka-acls --bootstrap-server localhost:9092 \ > > > > > > > --add \ > > > > > > > --allow-principal > > > > > > > > > > > > > > > > > > > > > > > > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > > > > > > \ > > > > > > > --producer \ > > > > > > > --topic *-accounts-* \ > > > > > > > --resource-pattern-type match > > > > > > > > > > > > > > > > > > > > > > > > This pattern closely resembles PREFIXED but offers broader > allow/deny > > > > > > rules. > > > > > > > > > > > > Implementing this change could significantly reduce the effort in > > > > several > > > > > > acl management processes. > > > > > > > > > > > > I welcome your thoughts and any concerns you may have regarding > this > > > > > > proposal. > > > > > > > > > > > > Thanks, > > > > > > Murali > > > > > > > > > > > > > > > > > > > > > -- > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren > > > > > > > > > > > > > > > > > > -- > > > ======================== > > > Okada Haruki > > > ocadar...@gmail.com > > > ======================== > > > > > > > > > -- > > LinkedIn: http://www.linkedin.com/in/claudewarren >