@greg which library of trie you were thinking of ? There is one in the
commons-collection I see.

On Fri, May 17, 2024 at 3:55 PM Claude Warren <cla...@xenei.com> wrote:

> >
> > This has implications for execution complexity: If we can't compute
> > whether two patterns overlap, then we need to run both of them on each
> > piece of input to test if they both match. Under the current
> > LITERAL/PREFIX system, we can optimize execution with a trie, but that
> > option wouldn't be available to us with MATCH.
> >
>
> If we consider the case of an asterisk representing 1 or more characters
> then determining if 2 patterns overlap can be fairly simple and very fast.
> Let's call the text from the ACL the target, and the text from the wildcard
> matcher the candidate.
>
> When a wildcard pattern, excluding '*',  is created:
>
>    - the candidate text is broken into fragments separated by characters
>    that are not Character.isLetterOrDigit() (See
>    https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html).
>    - fragments that contain 1 character are ignored.
>    - fragments that contains 2 or more characters are scanned and every
>    every pair of characters used to create a Hasher (See
>
> https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bloomfilter/Hasher.html
> )
>    that hasher is added to a Bloom filter associated with the wildcard
> pattern.
>
> When a target is being evaluated and matching wildcard entries are to be
> located. Split and create a bloom filter using entry and the same strategy
> as for the wildcard patterns above.  These bloom filters will have had more
> pairs of characters added than for a matching wildcard pattern.
>
> Each filter contains the pattern for the unique pairs in the fragments of
> the original text:
>
> Now we can check the Bloom filters.
>
> To find potential matching patterns we can check to see if the Bloom filter
> for the pattern is contained within the bloom filter for the entry text.
> If so then we know that it is likely that the pairs of characters specified
> in the wild card (non-wildcard text) appear in the entry text.
>
> At this point we can evaluate the reduced number of patterns to see which
> ones actually match.
>
> Having reduced the candidates to the matching patterns we can then use a
> standard measure of similarity like the Levenshtein distance to determine
> which candidates require the fewest edits to evolve into the target.  The
> one with the fewest edits is the most specific and should be applied.
>
> Now if you want to know which patterns overlap you have a similar process.
>
> Each Bloom filter can calculate the estimated number of unique fragments
> that were added to it.  If the filters are sorted by this estimation, the
> ones with the highest counts may contain the ones with the lowest counts,
> but a lower count can never contain a higher count.  The calculation to
> perform the estimation can be skipped and the cardinality value of each
> Bloom filter can be used instead.  You can then check the smaller filters
> against the larger ones and find where one candidate is contained within
> another.
>
> If you want to know if they intersect the BloomFilter from
> commons-collections has an estimateIntersection as well as an estimateUnion
> method.  Quick tests can be made between filters to see if there is any
> overlap before more complex analysis is performed.
>
> The solution here does not make the final comparisons easier, it simply
> reduces the search space to find the items that need to be compared.
>
>
> On Mon, May 6, 2024 at 9:03 PM Greg Harris <greg.har...@aiven.io.invalid>
> wrote:
>
> > Hi Murali,
> >
> > Thanks for the KIP!
> >
> > I think I understand the motivation for this KIP in situations where
> > there are a "cross product" of topics for two or more variables X and
> > Y, and want to write ACLs for each of the variable axes.
> > If you format your topics "X-Y-suffix", it's not easy to write rules
> > that apply to all "Y" topics, because you need to enumerate all of the
> > "X" values, and the problem persists even if you reorder the topic
> > name.
> >
> > In my recent work on KIP-986 I found it necessary to introduce
> > "namespaces" to group topics together, and I was going to replicate
> > the ACL system to specify those namespaces. This change to the ACL
> > system could increase the expressiveness and complexity of that
> > feature, if it is ever implemented.
> > One of the primitives I needed when specifying namespaces was the
> > ability to tell when two namespaces overlapped (i.e. does there exist
> > any topic which is present in both namespaces). This is trivial to do
> > with the current PREFIX and LITERAL system, as we can find the
> > maximum-length common prefix with just some length comparisons and
> > equality checks.
> > I considered specifying namespaces via regular expressions, and found
> > that it was computationally much more difficult. Computing the
> > intersection of two regexes appears to be exponential in the length of
> > the regexes, leading me to avoid adding it.
> >
> > I understand that you're not suggesting full REGEX support, and that
> > "namespaces" don't need to support MATCH, but I think MATCH may run
> > into related difficulties. Any MATCH can overlap with any other MATCH
> > or PREFIX if it includes a sufficient number of wildcards. For
> > example:
> > MATCH *-accounts-* has overlap with PREFIX nl as they can both match
> > "nl-accounts-localtopic", but that isn't sensitive to the contents
> > "nl", it is true for any PREFIX.
> > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can
> > both match "nl-accounts-localtopic", but that isn't actually sensitive
> > to the contents "localtopic", it's true for any MATCH which includes a
> > wildcard at the beginning.
> >
> > This has implications for execution complexity: If we can't compute
> > whether two patterns overlap, then we need to run both of them on each
> > piece of input to test if they both match. Under the current
> > LITERAL/PREFIX system, we can optimize execution with a trie, but that
> > option wouldn't be available to us with MATCH.
> >
> > The current system makes users evaluate a trade-off:
> > 1. Optimize the number of ACLs by organizing topics according to
> > prefixes (for example, "accounts-localtopic-nl" and PREFIX "accounts",
> > PREFIX "accounts-localtopic")
> > 2. Use less-structured topic names, with a corresponding ACL scheme
> > that has more individual rules.
> > The system currently informs users of this tradeoff by making them
> > write multiple ACLs, and making them think "there has got to be a
> > better way!". Perhaps we can find a better way to surface this best
> > practice, or better inform users about it.
> >
> > I understand that there are going to be situations more complex than
> > your example, where multiple individual rules will always be necessary
> > with only PREFIX evaluation. I think even in those situations, a
> > number of efficient-to-evaluate rules is preferable to just one
> > expensive-to-evaluate rule.
> >
> > One alternative that I thought of could be "PARAMETERIZED" ACLs which
> > are like PREFIXED, but allow some parameter substitution. For example
> > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here,
> > but this isn't actually a regex, and wouldn't allow arbitrary numbers
> > of characters, or the * or + operators.
> > In the background it could evaluate exactly like the 3 individual
> > PREFIX rules, but be easier to evaluate on the backend, and support
> > the intersection query I mentioned earlier. It could also support
> > [a-zA-Z] notation in case the parameter values aren't known ahead of
> > time, but have a fixed length.
> >
> > Thanks,
> > Greg
> >
> > On Mon, May 6, 2024 at 11:17 AM Claude Warren <cla...@xenei.com> wrote:
> > >
> > > I have an idea for how to reduce the time for ACL lookups in general
> and
> > > particularly where wildcards are involved using sequence
> > > characterization techniques from bioinformatics.  But I need a set of
> ACL
> > > patterns and associated topics to test with.
> > >
> > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada <ocadar...@gmail.com>
> wrote:
> > >
> > > > Hi, Murali.
> > > >
> > > > First, could you add the KIP-1042 to the index (
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > > )
> > > > as well so that everyone can find it easily?
> > > >
> > > > I took a look at the KIP, then I have 2 questions:
> > > >
> > > > 1. Though the new MATCH resource pattern type may reduce the effort
> of
> > > > adding ACLs in some cases, do you have any concrete use case you are
> in
> > > > mind? (When prefixed ACL was introduced in KIP-290, there was a
> > use-case
> > > > that using it for implementing multi-tenancy)
> > > >
> > > > 2. As you may know, ACL lookup is in the hot-path which the
> > performance is
> > > > very important. (
> > > >
> > > >
> >
> https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539
> > > > ).
> > > > Do you have ideas how do we update `matchingAcls` to support
> > MATCH-type ACL
> > > > without introducing performance issue?
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > 2024年5月3日(金) 19:51 Claude Warren, Jr <claude.war...@aiven.io
> .invalid>:
> > > >
> > > > > As I wrote in [1], the ACL evaluation algorithm needs to be
> specified
> > > > with
> > > > > respect to the specificity of the pattern so that we know exactly
> > which
> > > > if
> > > > > *-accounts-* takes precedence over nl-accounts-* or visa versa.
> > > > >
> > > > > I think that we should spell out that precedence is evaluated as
> > follows:
> > > > >
> > > > > 1. Remove patterns that do not match
> > > > > 2. More specific patterns take precedence over less specific
> patterns
> > > > > 3. for patterns of the same precedence DENY overrides ALLOW
> > > > >
> > > > > Determining specificity:
> > > > >
> > > > > Specificity is based on the Levenshtein distance between the
> pattern
> > and
> > > > > the text being evaluated. The lower the distance the more specific
> > the
> > > > > rule.
> > > > > Using the topic name: nl-accounts-localtopic we can evaluate the
> > > > > Levenshtein distance for various patterns.
> > > > > nl-accounts-localtopic = 0
> > > > > *-accounts-localtopic = 2
> > > > > nl-accounts-local* = 5
> > > > > *-accounts-local* = 7
> > > > > nl-accounts-* = 10
> > > > > *-accounts-* = 12
> > > > >
> > > > > In the special case of matching principles User matches are more
> > specific
> > > > > than Group matches.
> > > > >
> > > > > I don't know if this should be added to KIP-1042 or presented as a
> > new
> > > > KIP.
> > > > >
> > > > > Claude
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
> > > > > <
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > > >
> > > > >
> > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren <cla...@xenei.com>
> > wrote:
> > > > >
> > > > > > Took me awhile to find it but the link to the KIP is
> > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> > > > > >
> > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani <
> > murali.bas...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'd like to propose a suggestion to our resource patterns in
> > Kafka
> > > > > ACLs.
> > > > > > >
> > > > > > > Currently, when adding new ACLs in Kafka, we have two types of
> > > > resource
> > > > > > > patterns for topics:
> > > > > > >
> > > > > > >    - LITERAL
> > > > > > >    - PREFIXED
> > > > > > >
> > > > > > > However, when it comes to listing or removing ACLs, we have a
> > couple
> > > > > more
> > > > > > > options:
> > > > > > >
> > > > > > >    - MATCH
> > > > > > >    - ANY (will match any pattern type)
> > > > > > >
> > > > > > >
> > > > > > > If we can extend creating acls as well with 'MATCH' pattern
> > type, it
> > > > > > would
> > > > > > > be very beneficial. Even though this kind of acl should be
> > created
> > > > with
> > > > > > > utmost care, it will help organizations streamline their ACL
> > > > management
> > > > > > > processes.
> > > > > > >
> > > > > > > Example scenarios :
> > > > > > >
> > > > > > > Let's say we need to create ACLs for the following six topics:
> > > > > > > nl-accounts-localtopic, nl-accounts-remotetopic,
> > > > > de-accounts-localtopic,
> > > > > > > de-accounts-remotetopic, cz-accounts-localtopic,
> > > > > cz-accounts-remotetopic
> > > > > > >
> > > > > > > Currently, we achieve this using existing functionality by
> > creating
> > > > > three
> > > > > > > prefixed ACLs as shown below:
> > > > > > >
> > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > --add \
> > > > > > > > --allow-principal
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > \
> > > > > > > > --producer \
> > > > > > > > --topic nl-accounts- \
> > > > > > > > --resource-pattern-type prefixed
> > > > > > >
> > > > > > >
> > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > --add \
> > > > > > > > --allow-principal
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > \
> > > > > > > > --producer \
> > > > > > > > --topic de-accounts- \
> > > > > > > > --resource-pattern-type prefixed
> > > > > > >
> > > > > > >
> > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > --add \
> > > > > > > > --allow-principal
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > \
> > > > > > > > --producer \
> > > > > > > > --topic cz-accounts- \
> > > > > > > > --resource-pattern-type prefixed
> > > > > > >
> > > > > > >
> > > > > > > However, if we had the 'MATCH' pattern type available, we could
> > > > > > accomplish
> > > > > > > this with a single ACL, as illustrated here:
> > > > > > >
> > > > > > > kafka-acls --bootstrap-server localhost:9092 \
> > > > > > > > --add \
> > > > > > > > --allow-principal
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > > > > > \
> > > > > > > > --producer \
> > > > > > > > --topic *-accounts-* \
> > > > > > > > --resource-pattern-type match
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > This pattern closely resembles PREFIXED but offers broader
> > allow/deny
> > > > > > > rules.
> > > > > > >
> > > > > > > Implementing this change could significantly reduce the effort
> in
> > > > > several
> > > > > > > acl management processes.
> > > > > > >
> > > > > > > I welcome your thoughts and any concerns you may have regarding
> > this
> > > > > > > proposal.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Murali
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > ========================
> > > > Okada Haruki
> > > > ocadar...@gmail.com
> > > > ========================
> > > >
> > >
> > >
> > > --
> > > LinkedIn: http://www.linkedin.com/in/claudewarren
> >
>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>

Reply via email to