Re: [DISCUSS] KIP-936: Throttle number of active PIDs
> The only thing that comes to my mind to maintain this desired behavior in > the KIP is to NOT hash PID with KafkaPrincipal and keep a > Map > then each one of these bloom filters is controlled with > `Shape(, 0.1)`. > > Does that make sense? WDYT? > > Also regarding the eviction function > > (evict function) > > The evict function will determine if it has been a minute since the last > > time we created a new layer, if so create a new layer and evict all > layers > > older than 1 hour. Since the layers are a time ordered list this is > simply > > removing the elderly layers from the front of the list. > > Maybe I am missing something here but I can't find anything in the > `LayerManager` code that points to how often the eviction function > runs. Do you mean that the eviction function runs every minute? If so, can > we control this? > > Cheers, > Omnia > > On Wed, Jun 21, 2023 at 11:43 AM Claude Warren wrote: > > > I think that either a Stable bloom filter or a Layered bloom > > filter constructed as follows would work: > > > > > >- Each layer is configured for the maximum number of principal-PID > pairs > >expected in a single minute. > >- Expect 60 layers (one for each minute) > >- If the layer becomes fully populated add another layer. > >- When the time to insert into the current layer expires, remove the > >layers that are older than an hour. > > > > This will provide a sliding window of one hour with the ability to handle > > bursts above the expected rate of inserts without additional false > > positives. > > > > By hashing the principal and PID into the filter as a single hash only > one > > Bloom filter is required. > > > > The code I pointed to earlier uses the commons-collections4 Bloom filter > > implementation. So rough pseudocode for the implementation is: > > > > Shape shape = Shape.fromNP( 1000, 0.1 ); // 1000 entries, 0.1 false > positive > > rate > > LayeredBloomFilter filter = LayeredBloomFilter( shape, 60, evictFunc ); > // > > 60 initial layers, eviction function. 
> > > > (on PID) > > > > long[] buff = MurmurHash3.hash128x64( String.format("%s%s", principal, > PID > > ).getBytes(java.nio.charset.StandardCharsets.UTF_8)); > > Hasher hasher = new EnhancedDoubleHasher( buff[0], buff[1] ); > > if (filter.contains(hasher)) { > > // handle case where principal-pid was already seen > > } > > filter.merge( hasher ); // ensures that principal-pid remains in seen for > > the next hour. > > > > (evict function) > > The evict function will determine if it has been a minute since the last > > time we created a new layer, if so create a new layer and evict all > layers > > older than 1 hour. Since the layers are a time ordered list this is > simply > > removing the elderly layers from the front of the list. > > > > if it has not been an hour and the current filter is full (e.g. has 1000 > > entries) create a new layer > > > > This should be very fast and space efficient. > > > > > > On Wed, Jun 21, 2023 at 11:13 AM Claude Warren wrote: > > > > > I have an implementation of a layered Bloom filter in [1] (note the > > > layered branch). This should handle the layered Bloom filter and > allow > > > for layers that > > > > > >1. Do not become over populated and thus yield too many false > > >positives. > > >2. Expire and are removed automatically. > > > > > > The layered Bloom filter also does not need another thread to remove > the > > > old filters as this is amortized across all the inserts. > > > > > > In addition, the number of Layered Bloom filter instances can be > reduced > > > by hashing the Kafka Principal and the ID together into the Bloom > filter > > to > > > look for. > > > > > > [1] https://github.com/Claudenw/BloomFilters/tree/layered > > > > > > On Sun, Jun 18, 2023 at 10:18 PM Omnia Ibrahim < > o.g.h.ibra...@gmail.com> > > > wrote: > > > > > >> Hi Haruki, Thanks for having a look at the KIP. > > >> > 1. Do you have any memory-footprint estimation for > > >> TimeControlledBloomFilter? 
> > >> I don't at the moment have any estimate as I don't have a full > > >> implementation of this one at the moment. I can work on one if it's > > >> required. > > >> > > >> > * If I read the KIP correctly, TimeControlledBloomFilter will be > > >> > allocated per KafkaPrincipa
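The layering scheme described in this thread (one layer per minute, an extra layer when the current layer fills, eviction of layers older than an hour amortized across inserts, no background thread) can be sketched in plain Java. This is a hand-rolled illustration, not the commons-collections4 `LayeredBloomFilter` API: a `HashSet` stands in for each per-layer Bloom filter, and the class and method names are invented for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch of the sliding-window layering described above. Each layer covers
// one minute; layers older than one hour are evicted on insert, so no
// background thread is needed. A HashSet stands in for a per-layer Bloom
// filter (all names here are illustrative).
public class LayeredWindow {
    private static final long LAYER_MS = 60_000L;      // one minute per layer
    private static final long WINDOW_MS = 3_600_000L;  // keep one hour of layers
    private final int maxPerLayer;                     // "N" from Shape.fromNP(N, p)

    private static final class Layer {
        final long createdMs;
        final Set<String> entries = new HashSet<>();
        Layer(long createdMs) { this.createdMs = createdMs; }
    }

    private final Deque<Layer> layers = new ArrayDeque<>();

    public LayeredWindow(int maxPerLayer) { this.maxPerLayer = maxPerLayer; }

    public boolean contains(String principalPid) {
        for (Layer l : layers) {
            if (l.entries.contains(principalPid)) return true;
        }
        return false;
    }

    public void merge(String principalPid, long nowMs) {
        // Layers are time-ordered, so eviction is popping from the front.
        while (!layers.isEmpty() && nowMs - layers.peekFirst().createdMs >= WINDOW_MS) {
            layers.pollFirst();
        }
        // Start a new layer when a minute has elapsed or the current layer is full.
        Layer current = layers.peekLast();
        if (current == null
                || nowMs - current.createdMs >= LAYER_MS
                || current.entries.size() >= maxPerLayer) {
            current = new Layer(nowMs);
            layers.addLast(current);
        }
        current.entries.add(principalPid);
    }

    public int depth() { return layers.size(); }
}
```

An entry merged at t=0 remains visible until the first insert after one hour evicts its layer, giving the sliding one-hour window; a burst beyond `maxPerLayer` in one minute simply adds a layer, matching the behavior described above.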
Re: [DISCUSS] KIP-936: Throttle number of active PIDs
I misspoke before: the LayeredBloomFilterTest.testExpiration() uses milliseconds to expire the data, but it lays out an example of how to expire filters in time intervals. On Fri, Aug 18, 2023 at 4:01 PM Claude Warren wrote: > Sorry for taking so long to get back to you, somehow I missed your message. > > I am not sure how this will work when we have different producer-id-rate >> for different KafkaPrincipal as proposed in the KIP. >> For example `userA` had producer-id-rate of 1000 per hour while `user2` >> has >> a quota of 100 producer ids per hour. How will we configure the max >> entries >> for the Shape? >> > > I am not certain I have a full understanding of your network. However, I > am assuming that you want: > >- To ensure that all produced ids are tracked for 1 hour regardless of >whether they were produced by userA or userB. >- To use a sliding window with 1 minute resolution. > > > There is a tradeoff in the Layered Bloom filter -- larger max entries (N) > or greater depth. > > So the simplest calculation would be 1100 messages per hour / 60 minutes > per hour = 18.3, let's round that to 20. > With an N=20 if more than 20 ids are produced in a minute a second filter > will be created to accept all those over 20. > Let's assume that the first filter was created at time 0:00:00 and the > 21st id comes in at 0:00:45. When the first insert after 1:00:59 occurs > (one hour after start + window time) the first filter will be removed. > When the first insert after 1:01:44 occurs the filter created at 0:00:45 > will be removed. > > So if you have a period of high usage the number of filters (depth of the > layers) increases, as the usage decreases, the numbers go back to expected > depths. You could set the N to a much larger number and each filter would > handle more ids before an extra layer was added. However, if they are > vastly too big then there will be significant wasted space. 
> > The only thing that comes to my mind to maintain this desired behavior in >> the KIP is to NOT hash PID with KafkaPrincipal and keep a >> Map >> then each one of these bloom filters is controlled with >> `Shape(, 0.1)`. >> > > Do you need to keep a list for each principal? Are the PIDs supposed to > be globally unique? If the question you are asking is has principal_1 seen > pid_2 then hashing principal_1 and pid_2 together and creating a bloom > filter will tell you using one LayeredBloomFilter. If you also need to > ask: "has anybody seen pid_2?", then there are some other solutions. Your > solution will work and may be appropriate in some cases where there is a > wide range of principal message rates. But in that case I would probably > still use the principal+pid solution and just split the filters by > estimated size so that all the ones that need a large filter go into one > system, and the smaller ones go into another. I do note that the hur.st > calculator [1] shows that for (1000, 0.1) you need 599 bytes and 3 hash > functions. For (100, 0.1) you need 60 bytes and 3 hash functions, and for (1100, > 0.1) you need 659 bytes and 3 hash functions. I would probably pick 704 > bytes and 3 hash functions which gives you (1176, 0.1). I would pick this > because 704 divides evenly into the 64-bit long blocks that are used internally > for the SimpleBloomFilter so there is no wasted space. > > Maybe I am missing something here but I can't find anything in the >> `LayerManager` code that points to how often the eviction function >> runs. Do you mean that the eviction function runs every minute? If so, can >> we control this? >> > > The LayerManager.Builder has a setCleanup() method. That method is run > whenever a new layer is added to the filter. This means that you can use > whatever process you want to delete old filters (including none: > LayerManager.Cleanup.noCleanup()). 
The LayeredBloomFilterTest is an > example of advancing by time (the one minute intervals) and cleaning by > time (1 hr). It also creates a TimestampedBloomFilter to track the time. > If we need an explicit mechanism to remove filters from the LayerManager we > can probably add one. > > I hope this answers your questions. > > I am currently working on getting layered Bloom filters added to commons. > A recent change set the stage for this change so it should be in soon. > > I look forward to hearing from you, > Claude > > [1] https://hur.st/bloomfilter/?n=1000&p=0.1&m=&k= > > On Sun, Jul 16, 2023 at 1:00 PM Omnia Ibrahim > wrote: > >> Thanks Claude for the feedback and the raising this implementation to >> Apache commons-collections. >> I had a look into yo
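The sizes quoted from the hur.st calculator in the message above follow from the standard Bloom filter formulas m = ⌈-n·ln(p)/(ln 2)²⌉ bits and k = round((m/n)·ln 2) hash functions. A quick check in plain Java, with no Kafka or commons-collections dependency:

```java
// Standard Bloom filter sizing. For the figures quoted above:
// (1000, 0.1) -> 4793 bits (599 whole bytes) and 3 hash functions,
// (100, 0.1)  -> 60 bytes, (1100, 0.1) -> 659 bytes.
public final class BloomSizing {
    // Optimal number of bits for n items at false-positive rate p.
    static long bits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    static long bytes(long n, double p) {
        return bits(n, p) / 8; // whole bytes, as displayed by the calculator
    }

    // Optimal number of hash functions.
    static int hashes(long n, double p) {
        return (int) Math.round((double) bits(n, p) / n * Math.log(2));
    }

    // Inverse direction: roughly how many items fit in m bits at rate p.
    // 704 bytes = 5632 bits is the block-aligned size picked above.
    static long capacity(long mBits, double p) {
        return (long) (mBits * Math.log(2) * Math.log(2) / -Math.log(p));
    }
}
```

704 bytes (5632 bits) divides evenly into 88 64-bit longs, which is why it is preferred over the raw 659-byte figure.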
Re: [DISCUSS] KIP-936: Throttle number of active PIDs
I don't know why I missed this message. You don't have to update the max entries for the shape. Set the max entries to be the highest quota. Then you can use the BloomFilter.estimateN() method to determine how many PIDs have been inserted into the filter. On Wed, Aug 30, 2023 at 1:19 PM Omnia Ibrahim wrote: > Hi Claude, sorry for the late reply, was out for some time. Thanks for your > response. > > > - To ensure that all produced ids are tracked for 1 hour regardless of > > whether they were produced by userA or userB. > Not really, we need to track producer ids created by userA separately from > producer ids created by userB. > The primary API purpose is to limit the number of producer ids per user as > the quota is set per user; for example, userA might have 100 producerIDs > quota while userB might have only a quota of 50. > And these quotas are dynamic configs so they can change at any time. This > is why I am asking how we will update the max entries for the Shape. > > > Do you need to keep a list for each principal? Are the PIDs supposed to > be globally unique? If the question you are asking is has principal_1 seen > pid_2 then hashing principal_1 and pid_2 together and creating a bloom > filter will tell you using one LayeredBloomFilter. > > The PID should be unique, however the problem we are trying to solve here is > to throttle the owner of the vast majority of them, which in this case is > the principal. The main concern I have is > how to update the LayeredBloomFilter's max, which in theory should be the > value of the dynamic config `quota` which is set per principal. > If we used one LayeredBloomFilter then every time I need to update the max > entries to match the quota I'll need to replace the bloom for all > principals; however, if they are separated like I suggested then replacing > the LayeredBloomFilter of max entries X with another one with max entries Y > will only impact one user and not everyone. Does this make sense? 
> > > > On Fri, Aug 18, 2023 at 3:03 PM Claude Warren wrote: > > > Sorry for taking so long to get back to you, somehow I missed your > message. > > > > I am not sure how this will work when we have different producer-id-rate > > > for different KafkaPrincipal as proposed in the KIP. > > > For example `userA` had producer-id-rate of 1000 per hour while `user2` > > has > > > a quota of 100 producer ids per hour. How will we configure the max > > entries > > > for the Shape? > > > > > > > I am not certain I have a full understanding of your network. However, I > > am assuming that you want: > > > >- To ensure that all produced ids are tracked for 1 hour regardless of > >whether they were produced by userA or userB. > >- To use a sliding window with 1 minute resolution. > > > > > > There is a tradeoff in the Layered Bloom filter -- larger max entries (N) > > or greater depth. > > > > So the simplest calculation would be 1100 messages per hour / 60 minutes > > per hour = 18.3, let's round that to 20. > > With an N=20 if more than 20 ids are produced in a minute a second filter > > will be created to accept all those over 20. > > Let's assume that the first filter was created at time 0:00:00 and the > > 21st id comes in at 0:00:45. When the first insert after 1:00:59 occurs > > (one hour after start + window time) the first filter will be removed. > > When the first insert after 1:01:44 occurs the filter created at 0:00:45 > > will be removed. > > > > So if you have a period of high usage the number of filters (depth of the > > layers) increases, as the usage decreases, the numbers go back to > expected > > depths. You could set the N to a much larger number and each filter > would > > handle more ids before an extra layer was added. However, if they are > > vastly too big then there will be significant wasted space. 
> > > > The only thing that comes to my mind to maintain this desired behavior in > > > the KIP is to NOT hash PID with KafkaPrincipal and keep a > > > Map > > > then each one of these bloom filters is controlled with > > > `Shape(, 0.1)`. > > > > > > > Do you need to keep a list for each principal? are the PID's supposed to > > be globally unique? If the question you are asking is has principal_1 > seen > > pid_2 then hashing principal_1 and pid_2 together and creating a bloom > > filter will tell you using one LayeredBloomFilter. If you also need to > > ask: "has anybody seen pid_2?", then there are some other solutions. > You > > solution wi
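The `BloomFilter.estimateN()` approach suggested above relies on the standard Bloom filter cardinality estimate n ≈ -(m/k)·ln(1 - x/m), where x is the number of bits currently set. A per-principal quota check only needs this estimate, not an exact PID count. The sketch below shows the formula; the class and method names are illustrative, not the commons-collections4 signatures.

```java
// Cardinality estimate for a Bloom filter with m bits, k hash functions,
// and x bits currently set: n ≈ -(m/k) * ln(1 - x/m).
public final class PidQuotaCheck {
    static double estimateN(int m, int k, int bitsSet) {
        return -((double) m / k) * Math.log(1.0 - (double) bitsSet / m);
    }

    // True if the estimated PID count for this principal exceeds its
    // (dynamically configurable) quota.
    static boolean overQuota(int m, int k, int bitsSet, long quota) {
        return estimateN(m, k, bitsSet) > quota;
    }
}
```

Because the check compares an estimate against the quota, the filter's Shape can stay sized for the highest quota while per-principal quota values change freely, which is the point Claude makes above.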
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
Took me a while to find it but the link to the KIP is https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls On Fri, May 3, 2024 at 10:13 AM Murali Basani wrote: > Hello, > > I'd like to propose a suggestion to our resource patterns in Kafka ACLs. > > Currently, when adding new ACLs in Kafka, we have two types of resource > patterns for topics: > >- LITERAL >- PREFIXED > > However, when it comes to listing or removing ACLs, we have a couple more > options: > >- MATCH >- ANY (will match any pattern type) > > > If we can extend creating acls as well with the 'MATCH' pattern type, it would > be very beneficial. Even though this kind of acl should be created with > utmost care, it will help organizations streamline their ACL management > processes. > > Example scenarios : > > Let's say we need to create ACLs for the following six topics: > nl-accounts-localtopic, nl-accounts-remotetopic, de-accounts-localtopic, > de-accounts-remotetopic, cz-accounts-localtopic, cz-accounts-remotetopic > > Currently, we achieve this using existing functionality by creating three > prefixed ACLs as shown below: > > kafka-acls --bootstrap-server localhost:9092 \ > > --add \ > > --allow-principal > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > \ > > --producer \ > > --topic nl-accounts- \ > > --resource-pattern-type prefixed > > > kafka-acls --bootstrap-server localhost:9092 \ > > --add \ > > --allow-principal > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > \ > > --producer \ > > --topic de-accounts- \ > > --resource-pattern-type prefixed > > > kafka-acls --bootstrap-server localhost:9092 \ > > --add \ > > --allow-principal > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > \ > > --producer \ > > --topic cz-accounts- \ > > --resource-pattern-type prefixed > > > However, if we had the 'MATCH' pattern type available, we could 
accomplish > this with a single ACL, as illustrated here: > > kafka-acls --bootstrap-server localhost:9092 \ > > --add \ > > --allow-principal > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > \ > > --producer \ > > --topic *-accounts-* \ > > --resource-pattern-type match > > > > This pattern closely resembles PREFIXED but offers broader allow/deny > rules. > > Implementing this change could significantly reduce the effort in several > acl management processes. > > I welcome your thoughts and any concerns you may have regarding this > proposal. > > Thanks, > Murali > -- LinkedIn: http://www.linkedin.com/in/claudewarren
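A MATCH pattern such as `*-accounts-*` is effectively a glob. One way its semantics could be evaluated is to translate each `*` into a regex wildcard; this is purely an illustration of the proposed matching behavior, not the KIP's implementation:

```java
import java.util.regex.Pattern;

// Sketch: evaluate a MATCH-style resource pattern by translating the '*'
// glob into a regex. Illustrative only; not Kafka's authorizer code.
public final class MatchPattern {
    static boolean matches(String pattern, String topic) {
        StringBuilder regex = new StringBuilder();
        // split(..., -1) keeps trailing empty parts so "x*" keeps its wildcard
        for (String part : pattern.split("\\*", -1)) {
            if (regex.length() > 0) regex.append(".*");
            regex.append(Pattern.quote(part));
        }
        return Pattern.matches(regex.toString(), topic);
    }
}
```

With these semantics, a single `--resource-pattern-type match` ACL on `*-accounts-*` covers all six example topics that previously needed three PREFIXED ACLs.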
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
Justine, I am new here so please excuse the ignorance. When you talk about "seen" producers I assume you mean the PIDs that the Bloom filter has seen. When you say "producer produces every 2 hours" do you mean the producer writes to a topic every 2 hours and uses the same PID? When you say "hitting the limit" what limit is reached? Given the default setup, a producer that produces a PID every 2 hours, regardless of whether or not it is a new PID, will be reported as a new PID being seen. But I would expect the throttling system to accept that as a new PID for the producer and look at the frequency of PIDs and accept without throttling. If the actual question is "how many PIDs did this Principal produce in the last hour" or "Has this Principal produced more than X PIDs in the last hour", there are probably cleaner ways to do this. If this is the question, I would use CPC from Apache DataSketches [1] and keep multiple CPCs (say every 15 minutes -- to match the KIP-936 proposal) for each Principal. You could then do a quick check on the current CPC to see if it exceeds hour-limit / 4 and if so check the hour rate (by summing the 4 15-minute CPCs). Then the code could simply notify when to throttle and when to stop throttling. Claude [1] https://datasketches.apache.org/docs/CPC/CpcPerformance.html On Fri, May 3, 2024 at 4:21 PM Justine Olshan wrote: > Hey folks, > > I shared this with Omnia offline: > One concern I have is with the length of time we keep "seen" producer IDs. > It seems like the default is 1 hour. If a producer produces every 2 hours > or so, and we are hitting the limit, it seems like we will throttle it even > though we've seen it before and have state for it on the server. Then, it > seems like we will have to wait for the natural expiration of producer ids > (via producer.id.expiration.ms) before we allow new or idle producers to > join again without throttling. 
I think this proposal is a step in the right > direction when it comes to throttling the "right" clients, but I want to > make sure we have reasonable defaults. Keep in mind that idempotent > producers are the default, so most folks won't be tuning these values out > of the box. > > As for Igor's questions about InitProducerId -- I think the main reason we > have avoided that solution is that there is no state stored for idempotent > producers when grabbing an ID. My concern there is either storing too much > state to track this or throttling before we need to. > > Justine > > On Thu, May 2, 2024 at 2:36 PM Claude Warren, Jr > wrote: > > > There is some question about whether or not we need the configuration > > options. My take on them is as follows: > > > > producer.id.quota.window.num No opinion. I don't know what this is used > > for, but I suspect that there is a good reason to have it. It is not > used > > within the Bloom filter caching mechanism > > producer.id.quota.window.size.seconds Leave it as it is one of the most > > effective ways to tune the filter and determines how long a PID is > > recognized. > > producer.id.quota.cache.cleanup.scheduler.interval.ms Remove it unless > > there is another use for it. We can get a better calculation for > > internals. > > producer.id.quota.cache.layer.count Leave it as it is one of the most > > effective ways to tune the filter. > > producer.id.quota.cache.false.positive.rate Replace it with a constant, > I > > don't think any other Bloom filter solution provides access to this knob > > for end users. > > producer_ids_rate Leave this one, it is critical for reasonable > operation. > > > -- LinkedIn: http://www.linkedin.com/in/claudewarren
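Claude's CPC suggestion above comes down to keeping four quarter-hour distinct-count sketches per principal and summing them for the hourly rate, with a cheap pre-check on the current quarter. The sketch below uses exact `HashSet`s as a stand-in for the approximate CPC sketches from Apache DataSketches (which would be far cheaper in memory); all names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Four 15-minute buckets of PIDs for one principal; the hourly count is
// the sum of the buckets. A HashSet stands in for an approximate CPC sketch.
public final class PidRateWindow {
    private static final long QUARTER_MS = 900_000L; // 15 minutes
    private final List<Set<Long>> buckets = new ArrayList<>();
    private final long[] epochs = new long[4];

    public PidRateWindow() {
        for (int i = 0; i < 4; i++) buckets.add(new HashSet<>());
    }

    private int bucketFor(long nowMs) {
        long quarter = nowMs / QUARTER_MS;
        int i = (int) (quarter % 4);
        if (epochs[i] != quarter) { // the slot rolled over: start it fresh
            buckets.get(i).clear();
            epochs[i] = quarter;
        }
        return i;
    }

    public void record(long pid, long nowMs) {
        buckets.get(bucketFor(nowMs)).add(pid);
    }

    // Quick check on the current quarter before summing the whole hour,
    // mirroring the "exceeds hour-limit / 4" pre-check described above.
    public boolean shouldThrottle(long nowMs, int hourLimit) {
        if (buckets.get(bucketFor(nowMs)).size() <= hourLimit / 4) return false;
        int hourCount = 0;
        for (Set<Long> b : buckets) hourCount += b.size();
        return hourCount > hourLimit;
    }
}
```

The appeal of this shape is that throttling decisions become a pure function of the per-principal counts, with no Bloom filter tuning knobs exposed to operators.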
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
I have an idea for how to reduce the time for ACL lookups in general and particularly where wildcards are involved using sequence characterization techniques from bioinformatics. But I need a set of ACL patterns and associated topics to test with. On Fri, May 3, 2024 at 2:45 PM Haruki Okada wrote: > Hi, Murali. > > First, could you add the KIP-1042 to the index ( > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals > ) > as well so that everyone can find it easily? > > I took a look at the KIP, then I have 2 questions: > > 1. Though the new MATCH resource pattern type may reduce the effort of > adding ACLs in some cases, do you have any concrete use case you are in > mind? (When prefixed ACL was introduced in KIP-290, there was a use-case > that using it for implementing multi-tenancy) > > 2. As you may know, ACL lookup is in the hot-path which the performance is > very important. ( > > https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539 > ). > Do you have ideas how do we update `matchingAcls` to support MATCH-type ACL > without introducing performance issue? > > > Thanks, > > On Fri, May 3, 2024 at 19:51 Claude Warren, Jr wrote: > > > As I wrote in [1], the ACL evaluation algorithm needs to be specified > with > > respect to the specificity of the pattern so that we know exactly whether > > *-accounts-* takes precedence over nl-accounts-* or vice versa. > > > > I think that we should spell out that precedence is evaluated as follows: > > > > 1. Remove patterns that do not match > > 2. More specific patterns take precedence over less specific patterns > > 3. For patterns of the same precedence DENY overrides ALLOW > > > > Determining specificity: > > > > Specificity is based on the Levenshtein distance between the pattern and > > the text being evaluated. The lower the distance the more specific the > > rule. 
> > Using the topic name: nl-accounts-localtopic we can evaluate the > > Levenshtein distance for various patterns. > > nl-accounts-localtopic = 0 > > *-accounts-localtopic = 2 > > nl-accounts-local* = 5 > > *-accounts-local* = 7 > > nl-accounts-* = 10 > > *-accounts-* = 12 > > > > In the special case of matching principals, User matches are more specific > > than Group matches. > > > > I don't know if this should be added to KIP-1042 or presented as a new > KIP. > > > > Claude > > > > [1] https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f > > < > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls > > > > > > > On Fri, May 3, 2024 at 12:18 PM Claude Warren wrote: > > > > > Took me a while to find it but the link to the KIP is > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls > > > > > > On Fri, May 3, 2024 at 10:13 AM Murali Basani > > > > wrote: > > > > > > > Hello, > > > > > > > > I'd like to propose a suggestion to our resource patterns in Kafka > > ACLs. > > > > > > > > Currently, when adding new ACLs in Kafka, we have two types of > resource > > > > patterns for topics: > > > > > > > >- LITERAL > > > >- PREFIXED > > > > > > > > However, when it comes to listing or removing ACLs, we have a couple > > more > > > > options: > > > > > > > >- MATCH > > > >- ANY (will match any pattern type) > > > > > > > > > > > > If we can extend creating acls as well with 'MATCH' pattern type, it > > > would > > > > be very beneficial. Even though this kind of acl should be created > with > > > > utmost care, it will help organizations streamline their ACL > management > > > > processes. 
> > > > > > > > Example scenarios : > > > > > > > > Let's say we need to create ACLs for the following six topics: > > > > nl-accounts-localtopic, nl-accounts-remotetopic, > > de-accounts-localtopic, > > > > de-accounts-remotetopic, cz-accounts-localtopic, > > cz-accounts-remotetopic > > > > > > > > Currently, we achieve this using existing functionality by creating > > three > > > > prefixed ACLs as shown below: > > > >
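The specificity ordering proposed above can be reproduced with a textbook Levenshtein distance, treating each `*` as an ordinary character exactly as in the worked distances (this is an illustration of the proposal, not existing Kafka code):

```java
// Textbook two-row dynamic-programming Levenshtein distance; '*' is
// treated as a literal character, matching the worked distances above.
public final class Specificity {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Lower distance means more specific, so for `nl-accounts-localtopic` the pattern `nl-accounts-*` (distance 10) would take precedence over `*-accounts-*` (distance 12).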
[DISCUSS] KIP-1044: A proposal to change idempotent producer -- server implementation
This is a proposal that should solve the OOM problem on the servers without some of the other proposed KIPs being active. Full details in https://cwiki.apache.org/confluence/display/KAFKA/KIP-1044%3A+A+proposal+to+change+idempotent+producer+--+server+implementation
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
I think that a Trie solution for LITERAL and PREFIX would be faster than attempting to use the wildcard search strategies on them. The reasons being - that the wildcard search strategy may return some non-matching (false positive) patterns that have to be checked and ignored. - the size of the Bloom filters used for searching may be artificially inflated because of the length of the LITERAL matches (PREFIX matches too but LITERAL will probably be longer anyway) - The WILDCARD type will require a Bloom filter and it should be created once and accept the 80-100 byte of overhead rather than the calculation time on every use. On Wed, May 15, 2024 at 5:00 PM Murali Basani wrote: > Thank you Haruki for the feedback. > There are certainly many such use cases where customers ended up creating > custom authorizers to handle these MATCH based patterns. > And regarding updating method matchingAcls, Claude updated the KIP > mentioning an approach, which definitely optimizes the flow, but can be > discussed further during the PR implementation I believe. > > Thank you Greg for the feedback. > I understand with trie, existing mechanisms for PREFIX/LITERAL can be > optimized. How about even trying this matchingAcls with a new > implementation which Claude proposed, may be we can improve the existing > flows. Probably we can see those details during PR development. > > If we don't achieve the expected results as per AuthorizerBenchmark, we can > drop this kip. > > And thank you Claude for the suggestion on the new implementation. > > On Tue, May 7, 2024 at 4:37 PM Claude Warren, Jr > wrote: > > > I have updated KIP-1042 with a proposal for how to reduce the time spent > > looking for matching wildcard patterns. Experimentally I see a reduction > > of 66-75% execution time. > > > > On Mon, May 6, 2024 at 9:03 PM Greg Harris > > > wrote: > > > > > Hi Murali, > > > > > > Thanks for the KIP! 
> > > > > > I think I understand the motivation for this KIP in situations where > > > there are a "cross product" of topics for two or more variables X and > > > Y, and want to write ACLs for each of the variable axes. > > > If you format your topics "X-Y-suffix", it's not easy to write rules > > > that apply to all "Y" topics, because you need to enumerate all of the > > > "X" values, and the problem persists even if you reorder the topic > > > name. > > > > > > In my recent work on KIP-986 I found it necessary to introduce > > > "namespaces" to group topics together, and I was going to replicate > > > the ACL system to specify those namespaces. This change to the ACL > > > system could increase the expressiveness and complexity of that > > > feature, if it is ever implemented. > > > One of the primitives I needed when specifying namespaces was the > > > ability to tell when two namespaces overlapped (i.e. does there exist > > > any topic which is present in both namespaces). This is trivial to do > > > with the current PREFIX and LITERAL system, as we can find the > > > maximum-length common prefix with just some length comparisons and > > > equality checks. > > > I considered specifying namespaces via regular expressions, and found > > > that it was computationally much more difficult. Computing the > > > intersection of two regexes appears to be exponential in the length of > > > the regexes, leading me to avoid adding it. > > > > > > I understand that you're not suggesting full REGEX support, and that > > > "namespaces" don't need to support MATCH, but I think MATCH may run > > > into related difficulties. Any MATCH can overlap with any other MATCH > > > or PREFIX if it includes a sufficient number of wildcards. For > > > example: > > > MATCH *-accounts-* has overlap with PREFIX nl as they can both match > > > "nl-accounts-localtopic", but that isn't sensitive to the contents > > > "nl", it is true for any PREFIX. 
> > > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can > > > both match "nl-accounts-localtopic", but that isn't actually sensitive > > > to the contents "localtopic", it's true for any MATCH which includes a > > > wildcard at the beginning. > > > > > > This has implications for execution complexity: If we can't compute > > > whether two patterns overlap, then we need to run both of them on each > > > piece of input to test if they both
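The trie optimization both messages refer to for LITERAL/PREFIX lookup can be sketched as follows: walk the topic name one character at a time, collecting PREFIX rules stored on every node passed, and LITERAL rules only when the name is fully consumed. This is an illustrative sketch, not `AclAuthorizer`'s actual data structure:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal trie over ACL resource names: PREFIX rules match at any node
// on the path, LITERAL rules only when the topic name is fully consumed.
public final class AclTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> prefixRules = new ArrayList<>();
        final List<String> literalRules = new ArrayList<>();
    }

    private final Node root = new Node();

    public void addPrefix(String prefix, String rule) {
        node(prefix).prefixRules.add(rule);
    }

    public void addLiteral(String name, String rule) {
        node(name).literalRules.add(rule);
    }

    private Node node(String key) {
        Node n = root;
        for (char c : key.toCharArray()) {
            n = n.children.computeIfAbsent(c, ignored -> new Node());
        }
        return n;
    }

    // One pass over the topic name collects every matching rule.
    public List<String> matching(String topic) {
        List<String> out = new ArrayList<>();
        Node n = root;
        for (int i = 0; i < topic.length(); i++) {
            out.addAll(n.prefixRules);
            n = n.children.get(topic.charAt(i));
            if (n == null) return out;
        }
        out.addAll(n.prefixRules);
        out.addAll(n.literalRules);
        return out;
    }
}
```

A lookup is O(length of the topic name) regardless of how many PREFIX/LITERAL ACLs are stored, which is what makes the trie attractive for the hot path; MATCH patterns cannot be indexed this way, which is exactly the concern raised above.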
Re: [DISCUSS] KIP-1044: A proposal to change idempotent producer -- server implementation
I think that the point here is that the design that assumes that you can keep all the PIDs in memory for all server configurations and all usages and all client implementations is fraught with danger. Yes, there are solutions already in place (KIP-854) that attempt to address this problem, and other proposed solutions to remove that have undesirable side effects (e.g. Heartbeat interrupted by IP failure for a slow producer with a long delay between posts). KAFKA-16229 (Slow expiration of Producer IDs leading to high CPU usage) dealt with how to expire data from the cache so that there was minimal lag time. But the net issue is still the underlying design/architecture. There are a couple of salient points here: - The state of a state machine is only a view on its transactions. This is the classic stream / table dichotomy. - What the "cache" is trying to do is create that view. - In some cases the size of the state exceeds the storage of the cache and the systems fail. - The current solutions have attempted to place limits on the size of the state. - Errors in implementation and or configuration will eventually lead to "problem producers" - Under the adopted fixes and current slate of proposals, the "problem producers" solutions have cascading side effects on properly behaved producers. (e.g. dropping long running, slow producing producers) For decades (at least since the 1980's and anecdotally since the 1960's) there has been a solution to processing state where the size of the state exceeded the memory available. It is the solution that drove the idea that you could have tables in Kafka. The idea that we can store the hot PIDs in memory using an LRU and write data to storage so that we can quickly find things not in the cache is not new. It has been proven. I am arguing that we should not throw away state data because we are running out of memory. We should persist that data to disk and consider the disk as the source of truth for state. 
Claude On Wed, May 15, 2024 at 7:42 PM Justine Olshan wrote: > +1 to the comment. > > > I still feel we are doing all of this only because of a few anti-pattern > or misconfigured producers and not because we have “too many Producer”. I > believe that implementing Producer heartbeat and remove short-lived PIDs > from the cache if we didn’t receive heartbeat will be more simpler and step > on right direction to improve idempotent logic and maybe try to make PID > get reused between session which will implement a real idempotent producer > instead of idempotent session. I admit this wouldn’t help with old clients > but it will put us on the right path. > > This issue is very complicated and I appreciate the attention on it. > Hopefully we can find a good solution working together :) > > Justine > > On Wed, May 15, 2024 at 8:36 AM Omnia Ibrahim > wrote: > > > Also in the rejection alternatives you listed an approved KIP which is a > > bit confusing can you move this to motivations instead > > > > > On 15 May 2024, at 14:35, Claude Warren wrote: > > > > > > This is a proposal that should solve the OOM problem on the servers > > without > > > some of the other proposed KIPs being active. > > > > > > Full details in > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1044%3A+A+proposal+to+change+idempotent+producer+--+server+implementation > > > > > -- LinkedIn: http://www.linkedin.com/in/claudewarren
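The "hot PIDs in memory, disk as the source of truth" design argued for above maps naturally onto an access-ordered `LinkedHashMap` with an eviction hook. A minimal sketch; `persistToDisk` and the `String` state value are placeholders for whatever storage layer and state record the KIP would actually define:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache of hot producer state: evicted entries are persisted rather
// than dropped, so the disk remains the source of truth for PID state.
public class PidStateCache extends LinkedHashMap<Long, String> {
    private final int maxHotEntries;

    public PidStateCache(int maxHotEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU semantics
        this.maxHotEntries = maxHotEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
        if (size() > maxHotEntries) {
            persistToDisk(eldest.getKey(), eldest.getValue());
            return true; // drop from memory only after the write-through
        }
        return false;
    }

    // Hypothetical hook: the KIP would define the actual on-disk store.
    void persistToDisk(long pid, String state) {
        // write-through to the on-disk state store
    }
}
```

With `accessOrder = true`, `removeEldestEntry` always sees the least-recently-used PID, so a long-running, slow-producing producer falls out of memory but its state survives on disk instead of being expired.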
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
REGEX support, and that > "namespaces" don't need to support MATCH, but I think MATCH may run > into related difficulties. Any MATCH can overlap with any other MATCH > or PREFIX if it includes a sufficient number of wildcards. For > example: > MATCH *-accounts-* has overlap with PREFIX nl as they can both match > "nl-accounts-localtopic", but that isn't sensitive to the contents > "nl", it is true for any PREFIX. > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can > both match "nl-accounts-localtopic", but that isn't actually sensitive > to the contents "localtopic", it's true for any MATCH which includes a > wildcard at the beginning. > > This has implications for execution complexity: If we can't compute > whether two patterns overlap, then we need to run both of them on each > piece of input to test if they both match. Under the current > LITERAL/PREFIX system, we can optimize execution with a trie, but that > option wouldn't be available to us with MATCH. > > The current system makes users evaluate a trade-off: > 1. Optimize the number of ACLs by organizing topics according to > prefixes (for example, "accounts-localtopic-nl" and PREFIX "accounts", > PREFIX "accounts-localtopic") > 2. Use less-structured topic names, with a corresponding ACL scheme > that has more individual rules. > The system currently informs users of this tradeoff by making them > write multiple ACLs, and making them think "there has got to be a > better way!". Perhaps we can find a better way to surface this best > practice, or better inform users about it. > > I understand that there are going to be situations more complex than > your example, where multiple individual rules will always be necessary > with only PREFIX evaluation. I think even in those situations, a > number of efficient-to-evaluate rules is preferable to just one > expensive-to-evaluate rule. 
> > One alternative that I thought of could be "PARAMETERIZED" ACLs which > are like PREFIXED, but allow some parameter substitution. For example > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here, > but this isn't actually a regex, and wouldn't allow arbitrary numbers > of characters, or the * or + operators. > In the background it could evaluate exactly like the 3 individual > PREFIX rules, but be easier to evaluate on the backend, and support > the intersection query I mentioned earlier. It could also support > [a-zA-Z] notation in case the parameter values aren't known ahead of > time, but have a fixed length. > > Thanks, > Greg > > On Mon, May 6, 2024 at 11:17 AM Claude Warren wrote: > > > > I have an idea for how to reduce the time for ACL lookups in general and > > particularly where wildcards are involved using sequence > > characterization techniques from bioinformatics. But I need a set of ACL > > patterns and associated topics to test with. > > > > On Fri, May 3, 2024 at 2:45 PM Haruki Okada wrote: > > > > > Hi, Murali. > > > > > > First, could you add the KIP-1042 to the index ( > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals > > > ) > > > as well so that everyone can find it easily? > > > > > > I took a look at the KIP, then I have 2 questions: > > > > > > 1. Though the new MATCH resource pattern type may reduce the effort of > > > adding ACLs in some cases, do you have any concrete use case you are in > > > mind? (When prefixed ACL was introduced in KIP-290, there was a > use-case > > > that using it for implementing multi-tenancy) > > > > > > 2. As you may know, ACL lookup is in the hot-path which the > performance is > > > very important. ( > > > > > > > https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539 > > > ). 
> > > Do you have ideas how do we update `matchingAcls` to support > MATCH-type ACL > > > without introducing performance issue? > > > > > > > > > Thanks, > > > > > > 2024年5月3日(金) 19:51 Claude Warren, Jr : > > > > > > > As I wrote in [1], the ACL evaluation algorithm needs to be specified > > > with > > > > respect to the specificity of the pattern so that we know exactly > which > > > if > > > > *-accounts-* takes precedence over nl-accounts-* or visa versa. > > > > > > > > I
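The overlap examples earlier in the thread can be checked mechanically. The glob-to-regex translation below is only an illustration of MATCH-style evaluation (not the syntax proposed in the KIP); it shows that `*-accounts-*` and a `nl` PREFIX, and two wildcard MATCH patterns, can all accept the same topic, which is why static disjointness cannot be decided:

```java
import java.util.regex.Pattern;

/**
 * Illustrative MATCH-pattern evaluation: '*' stands for any run of
 * characters, every other character is matched literally.
 */
public class AclOverlap {
    /** Translate a MATCH-style pattern to an anchored regex. */
    static Pattern toRegex(String match) {
        StringBuilder sb = new StringBuilder();
        for (char c : match.toCharArray()) {
            sb.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    static boolean matches(String pattern, String topic) {
        return toRegex(pattern).matcher(topic).matches();
    }

    static boolean prefix(String p, String topic) {
        return topic.startsWith(p);
    }
}
```

Because every MATCH pattern must be evaluated like this against each input, the trie optimization available for LITERAL/PREFIX lookups does not apply.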
Re: [DISCUSS] KIP-1044: A proposal to change idempotent producer -- server implementation
> > Why should we persist useless information
> for clients that are long gone and will never use it?

We are not. We only persist the information for the length of time we retain snapshots. The change here is to make the snapshots work as longer-term storage for infrequent producers and others who would be negatively affected by some of the solutions proposed.

Your changes require changes in the clients. Older clients will not be able to participate. My change does not require client changes.

There are issues outside of the ones discussed. I was told of this late last week. I will endeavor to find someone with first-hand knowledge of the issue and have them report on this thread.

In addition, the use of an LRU amortizes the cache cleanup so we don't need a thread to expire things. You still have the cache; the point is that it really is a cache, and there is storage behind it. Let the cache be a cache, let the snapshots be the storage backing behind the cache.

On Fri, May 17, 2024 at 5:26 PM Justine Olshan wrote: > Respectfully, I don't agree. Why should we persist useless information > for clients that are long gone and will never use it? > This is why I'm suggesting we do something smarter when it comes to storing > data and only store data we actually need and have a use for. > > This is why I suggest the heartbeat. It gives us clear information (up to > the heartbeat interval) of which producers are worth keeping and which that > are not. > I'm not in favor of building a new and complicated system to try to guess > which information is needed. In my mind, if we have a ton of legitimately > active producers, we should scale up memory. If we don't there is no reason > to have high memory usage. > > Fixing the client also allows us to fix some of the other issues we have > with idempotent producers. 
> > Justine > > On Fri, May 17, 2024 at 12:46 AM Claude Warren wrote: > > > I think that the point here is that the design that assumes that you can > > keep all the PIDs in memory for all server configurations and all usages > > and all client implementations is fraught with danger. > > > > Yes, there are solutions already in place (KIP-854) that attempt to > address > > this problem, and other proposed solutions to remove that have > undesirable > > side effects (e.g. Heartbeat interrupted by IP failure for a slow > producer > > with a long delay between posts). KAFKA-16229 (Slow expiration of > Producer > > IDs leading to high CPU usage) dealt with how to expire data from the > cache > > so that there was minimal lag time. > > > > But the net issue is still the underlying design/architecture. > > > > There are a couple of salient points here: > > > >- The state of a state machine is only a view on its transactions. > This > >is the classic stream / table dichotomy. > >- What the "cache" is trying to do is create that view. > >- In some cases the size of the state exceeds the storage of the cache > >and the systems fail. > >- The current solutions have attempted to place limits on the size of > >the state. > >- Errors in implementation and or configuration will eventually lead > to > >"problem producers" > >- Under the adopted fixes and current slate of proposals, the "problem > >producers" solutions have cascading side effects on properly behaved > >producers. (e.g. dropping long running, slow producing producers) > > > > For decades (at least since the 1980's and anecdotally since the 1960's) > > there has been a solution to processing state where the size of the state > > exceeded the memory available. It is the solution that drove the idea > that > > you could have tables in Kafka. The idea that we can store the hot PIDs > in > > memory using an LRU and write data to storage so that we can quickly find > > things not in the cache is not new. 
It has been proven. > > > > I am arguing that we should not throw away state data because we are > > running out of memory. We should persist that data to disk and consider > > the disk as the source of truth for state. > > > > Claude > > > > > > On Wed, May 15, 2024 at 7:42 PM Justine Olshan > > > > wrote: > > > > > +1 to the comment. > > > > > > > I still feel we are doing all of this only because of a few > > anti-pattern > > > or misconfigured producers and not because we have “too many Producer”. > > I > > > believe that implementing Producer heartbeat and remove short-lived > PIDs > > >
Re: [DISCUSS] KIP-1044: A proposal to change idempotent producer -- server implementation
The LRU cache is just that: a cache, so yes, things expire from the cache but they are not gone. As long as a snapshot containing the PID is available, the PID can be found and reloaded into the cache (which is exactly what I would expect it to do). The question of how long a PID is resolvable then becomes a question of how long snapshots are retained.

There are, in my mind, several advantages:

1. The in-memory cache can be smaller, reducing the memory footprint. This is not required but is possible.
2. PIDs are never discarded because they are produced by slow producers. They are discarded when the snapshots containing them expire.
3. The length of time between when a PID is received by the server and when it is recorded to a snapshot is significantly reduced, shrinking the window where PIDs can be lost.
4. Throttling and other changes you wish to make to the cache are still possible.

On Mon, May 20, 2024 at 7:32 PM Justine Olshan wrote: > My team has looked at it from a high level, but we haven't had the time to > come up with a full proposal. > > I'm not aware if others have worked on it. > > Justine > > On Mon, May 20, 2024 at 10:21 AM Omnia Ibrahim > wrote: > > > Hi Justine are you aware of anyone looking into such new protocol at the > > moment? > > > > > On 20 May 2024, at 18:00, Justine Olshan > > > wrote: > > > > > > I would say I have first hand knowledge of this issue as someone who > > > responds to such incidents as part of my work at Confluent over the past > > > couple years. :) > > > > > >> We only persist the information for the length of time we retain > > > snapshots. > > > This seems a bit contradictory to me. We are going to persist > > (potentially) > > > useless information if we have no signal if the producer is still active. > > > This is the problem we have with old clients. 
We are always going to > have > > > to draw the line for how long we allow a producer to have a gap in > > > producing vs how long we allow filling up with short-lived producers > that > > > risk OOM. > > > > > > With an LRU cache, we run into the same problem, as we will expire all > > > "well-behaved" infrequent producers that last produced before the burst > > of > > > short-lived clients. The benefit is that we don't have a solid line in > > the > > > sand and we only expire when we need to, but we will still risk > expiring > > > active producers. > > > > > > I am willing to discuss some solutions that work with older clients, > but > > my > > > concern is spending too much time on a complicated solution and not > > > encouraging movement to newer and better clients. > > > > > > Justine > > > > > > On Mon, May 20, 2024 at 9:35 AM Claude Warren > wrote: > > > > > >>> > > >>> Why should we persist useless information > > >>> for clients that are long gone and will never use it? > > >> > > >> > > >> We are not. We only persist the information for the length of time we > > >> retain snapshots. The change here is to make the snapshots work as > > longer > > >> term storage for infrequent producers and others would would be > > negatively > > >> affected by some of the solutions proposed. > > >> > > >> Your changes require changes in the clients. Older clients will not > be > > >> able to participate. My change does not require client change. > > >> There are issues outside of the ones discussed. I was told of this > late > > >> last week. I will endeavor to find someone with first hand knowledge > of > > >> the issue and have them report on this thread. > > >> > > >> In addition, the use of an LRU amortizes the cache cleanup so we don't > > need > > >> a thread to expire things. You still have the cache, the point is > that > > it > > >> really is a cache, there is storage behind it. 
Let the cache be a > > cache, > > >> let the snapshots be the storage backing behind the cache. > > >> > > >> On Fri, May 17, 2024 at 5:26 PM Justine Olshan > > >> > > >> wrote: > > >> > > >>> Respectfully, I don't agree. Why should we persist useless > information > > >>> for clients that are long gone and will never use it? > > >>> This is why I'm suggesting we do somethi
Re: [DISCUSS] KIP-853: KRaft Controller Membership Changes
Is there test code, or initial POC code for this KIP somewhere? I would like to help move this forward but need a few pointers to associated resources. I have read KIP-853 and it is beginning to sink in, but code would be nice. Thanks, Claude On 2024/03/21 18:41:04 José Armando García Sancio wrote: > Hi Jun, > > On Thu, Mar 14, 2024 at 3:38 PM Jun Rao wrote: > > 52. Admin client: Could you provide a bit more details on the changes? > > I updated the KIP to include the API changes to the Admin client. > > Thanks, > -- > -José >
[DISCUSS] KIP-936 Throttle number of active PIDs
The overall design for KIP-936 seems sound to me. I would make the following changes:

Replace the "TimedBloomFilter" with a "LayeredBloomFilter" from commons-collections v4.5.

Define producer.id.quota.window.size.seconds to be the length of time that a Bloom filter of PIDs will exist.
Define a new configuration option "producer.id.quota.window.count" as the number of windows active in window.size.seconds.

Define the "Shape" (see commons-collections bloomfilters v4.5) of the Bloom filter from the average number of PIDs expected in window.size.seconds/window.count (call this N) and the probability of false positives (call this P). Due to the way the LayeredBloomFilter works the number of items can be lower than the max. I'll explain that in a minute.

The LayeredBloomFilter implements the standard BloomFilter interface but internally keeps an ordered list of filters (called layers) from oldest created to newest. It adds new layers when a specified Predicate (checkExtend) returns true. It removes filters as defined by a specified Consumer (filterCleanup).

Every time a BloomFilter is merged into the LayeredBloomFilter, the filter is checked against the "checkExtend" predicate. If it fires, "filterCleanup" is called to remove any layers that should be removed, and a new layer is added.

Define the layers of the LayeredBloomFilter to comprise a standard BloomFilter and an associated expiration timestamp.

We can thus define

- "checkExtend" to require a new layer every window.size.seconds / window.count seconds, or when the current layer contains shape.N items.
- "filterCleanup" to start at the head of the list of layers and remove any expired filters: usually 0, every window.size.seconds 1, and infrequently more than 1.

This system will correctly handle bursty loads. There are 3 cases to consider:

1. If the producer is producing fewer than shape.N PIDs, the layer will not fill up before the next layer is added.
2. If the producer is producing shape.N PIDs, the layer will be processed as either case 1 or case 3 depending on system timings.
3. If the producer is producing more than shape.N PIDs, the layer will fill up and a new layer will be created with an expiration timestamp window.size.seconds from when it was created. This is the case that leads to filterCleanup infrequently having more than one layer to remove.

The last case to consider is a producer that stops generating PIDs. In this case we should walk the map of producers, call "filterCleanup", and then check whether the LayeredBloomFilter is empty. If so, remove it from the map. It will be empty if the producer has not produced a PID for window.size.seconds.

I have this solution mostly coded, though I must admit I do not know where to plug in the ProducerIdQuotaManager defined in the KIP.

Claude
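A minimal sketch of the layering behavior described above, with plain HashSets standing in for the Bloom filter layers (the real implementation would use commons-collections' LayeredBloomFilter with a Shape); the class name, the inlined checkExtend/filterCleanup logic, and the injectable clock are all illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

/** Time-ordered layers, each with an expiration timestamp. */
public class LayeredPidWindow {
    private static class Layer {
        final long expiresAtMs;
        final Set<Long> pids = new HashSet<>(); // stands in for a BloomFilter
        Layer(long expiresAtMs) { this.expiresAtMs = expiresAtMs; }
    }

    private final long layerLifeMs;  // window.size.seconds, as millis
    private final long layerStepMs;  // window.size / window.count
    private final int maxPerLayer;   // shape.N
    private final LongSupplier clock;
    private final Deque<Layer> layers = new ArrayDeque<>();
    private long nextLayerDueMs;

    public LayeredPidWindow(long layerLifeMs, long layerStepMs, int maxPerLayer, LongSupplier clock) {
        this.layerLifeMs = layerLifeMs;
        this.layerStepMs = layerStepMs;
        this.maxPerLayer = maxPerLayer;
        this.clock = clock;
        this.nextLayerDueMs = clock.getAsLong() + layerStepMs;
        layers.addLast(new Layer(clock.getAsLong() + layerLifeMs));
    }

    /** checkExtend: new layer when the step elapses or the current layer is full. */
    private void maybeExtend() {
        long now = clock.getAsLong();
        if (now >= nextLayerDueMs || layers.getLast().pids.size() >= maxPerLayer) {
            // filterCleanup: drop expired layers from the head of the list.
            while (!layers.isEmpty() && layers.getFirst().expiresAtMs <= now) {
                layers.removeFirst();
            }
            layers.addLast(new Layer(now + layerLifeMs));
            nextLayerDueMs = now + layerStepMs;
        }
    }

    /** Returns true if the PID was not already present in any layer. */
    public boolean add(long pid) {
        maybeExtend();
        boolean seen = layers.stream().anyMatch(l -> l.pids.contains(pid));
        if (!seen) {
            layers.getLast().pids.add(pid);
        }
        return !seen;
    }

    public int layerCount() { return layers.size(); }
}
```

A burst past shape.N simply adds layers (with the full window.size.seconds lifetime), which is why bursts do not raise the false positive rate of any single layer.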
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
I should also note that the probability of false positives does not rise above shape.P because as it approaches shape.P a new layer is created and filters are added to that. So no layer in the LayeredBloomFilter exceeds shape.P thus the entire filter does not exceed shape.P. Claude On Tue, Apr 9, 2024 at 2:26 PM Claude Warren wrote: > The overall design for KIP-936 seems sound to me. I would make the > following changes: > > Replace the "TimedBloomFilter" with a "LayeredBloomFilter" from > commons-collections v4.5 > > Define the producer.id.quota.window.size.seconds to be the length of time > that a Bloom filter of PIDs will exist. > Define a new configuration option "producer.id.quota.window.count" as the > number of windows active in window.size.seconds. > > Define the "Shape" (See commons-collections bloomfilters v4.5) of the > bloom filter from the average number of PIDs expected in > window.size.seconds/window.count (call this N) and the probability of false > positives (call this P). Due to the way the LayeredBloomFilter works the > number of items can be a lower number than the max. I'll explain that in a > minute. > > The LayeredBloomFilter implements the standard BloomFilter interface but > internally keeps an ordered list of filters (called layers) from oldest > created to newest. It adds new layers when a specified Predicate > (checkExtend) returns true. It will remove filters as defined by a > specified Consumer (filterCleanup). > > Everytime a BloomFilter is merged into the LayeredBloomFilter the filter > checks to the "checkExtend" predicate. If it fires the "filterCleanup" is > called to remove any layers that should be removed and a new layer is added. > > Define the layers of the LayeredBloomFilter to comprise a standard > BloomFilter and an associated expiration timestamp. > > We can thus define > >- "checkExtend" to require a new layer window.size.seconds / >window.count seconds or when the current layer contains shape.N items. 
>- "filterCleanup" to start at the head of the list of layers and >remove any expired filters, usually 0, every window.size.seconds 1, >infrequently more than 1. > > This system will correctly handle bursty loads. There are 3 cases to > consider: > >1. If the producer is producing fewer than shape.N PIDs the layer will >not fill up before the next layer is added. >2. If the producer is producing shape.N PIDs the layer will be >processed as either a 1 or a 3 depending on system timings. >3. If the producer is producing more than shape.N PIDs the layer will >fill up and a new layer will be created with an expiration timestamp >window.size.seconds from when it was created. This is the case that leads >to the filterCleanup infrequently having more than 1 layer to remove. > > The last case to consider is if a producer stops generating PIDs, in this > case we should walk the map of producers, call "filterCleanup", and then > check to see if the LayeredBloomFilter is empty. If so, remove it from the > map. It will be empty if the producer has not produced a PID for > window.size.seconds. > > I have this solution mostly coded, though I must admit I do not know where > to plugin the ProducerIdQuotaManager defined in the KIP > > Claude >
Re: [DISCUSS] KIP-1034: Dead letter queue in Kafka Streams
My concern is that someone would create a dead letter queue on a sensitive topic and not get the ACL correct from the start, causing a potential leak of confidential data. Is there anything in the proposal that would prevent that from happening? If so, I did not recognize it as such. On Fri, Apr 12, 2024 at 9:45 AM Damien Gasparina wrote: > Hi Claude, > > In this KIP, the Dead Letter Queue is materialized by a standard and > independant topic, thus normal ACL applies to it like any other topic. > This should not introduce any security issues, obviously, the right > ACL would need to be provided to write to the DLQ if configured. > > Cheers, > Damien > > On Fri, 12 Apr 2024 at 08:59, Claude Warren, Jr > wrote: > > > > I am new to the Kafka codebase so please excuse any ignorance on my part. > > > > When a dead letter queue is established is there a process to ensure that > > it at least is defined with the same ACL as the original queue? Without > > such a guarantee at the start it seems that managing dead letter queues > > will be fraught with security issues. > > > > > > On Wed, Apr 10, 2024 at 10:34 AM Damien Gasparina > > > wrote: > > > > > Hi everyone, > > > > > > To continue on our effort to improve Kafka Streams error handling, we > > > propose a new KIP to add out of the box support for Dead Letter Queue. > > > The goal of this KIP is to provide a default implementation that > > > should be suitable for most applications and allow users to override > > > it if they have specific requirements. > > > > > > In order to build a suitable payload, some additional changes are > > > included in this KIP: > > > 1. extend the ProcessingContext to hold, when available, the source > > > node raw key/value byte[] > > > 2. expose the ProcessingContext to the ProductionExceptionHandler, > > > it is currently not available in the handle parameters. 
> > > > > > Regarding point 2., to expose the ProcessingContext to the > > > ProductionExceptionHandler, we considered two choices: > > > 1. exposing the ProcessingContext as a parameter in the handle() > > > method. That's the cleanest way IMHO, but we would need to deprecate > > > the old method. > > > 2. exposing the ProcessingContext as an attribute in the interface. > > > This way, no method is deprecated, but we would not be consistent with > > > the other ExceptionHandler. > > > > > > In the KIP, we chose the 1. solution (new handle signature with old > > > one deprecated), but we could use other opinions on this part. > > > More information is available directly on the KIP. > > > > > > KIP link: > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1034%3A+Dead+letter+queue+in+Kafka+Streams > > > > > > Feedbacks and suggestions are welcome, > > > > > > Cheers, > > > Damien, Sebastien and Loic > > > > -- LinkedIn: http://www.linkedin.com/in/claudewarren
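For illustration, option 1 discussed above (a new handle() signature carrying the ProcessingContext, with the old one deprecated) can be sketched with simplified stand-in types; these are not Kafka Streams' actual interfaces, just the usual default-method evolution pattern:

```java
/**
 * Sketch of evolving a handler interface: the new overload receives the
 * context, and its default implementation delegates to the deprecated
 * signature so existing implementations keep working unchanged.
 * All types here are simplified stand-ins.
 */
public class HandlerEvolution {
    interface ProcessingContext { String taskId(); }
    enum Response { CONTINUE, FAIL }

    interface ProductionExceptionHandler {
        @Deprecated
        default Response handle(Exception e) {
            return Response.FAIL;
        }

        // New signature; default delegates to the deprecated method.
        default Response handle(ProcessingContext context, Exception e) {
            return handle(e);
        }
    }
}
```

The cost noted in the thread is real: the old method must carry a deprecation cycle, but callers can switch to the context-aware overload immediately.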
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
Initial code is available at https://github.com/Claudenw/kafka/blob/KIP-936/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerIDQuotaManager.java On Tue, Apr 9, 2024 at 2:37 PM Claude Warren wrote: > I should also note that the probability of false positives does not fall > below shape.P because as it approaches shape.P a new layer is created and > filters are added to that. So no layer in the LayeredBloomFilter exceeds > shape.P thus the entire filter does not exceed shape.P. > > Claude > > On Tue, Apr 9, 2024 at 2:26 PM Claude Warren wrote: > >> The overall design for KIP-936 seems sound to me. I would make the >> following changes: >> >> Replace the "TimedBloomFilter" with a "LayeredBloomFilter" from >> commons-collections v4.5 >> >> Define the producer.id.quota.window.size.seconds to be the length of time >> that a Bloom filter of PIDs will exist. >> Define a new configuration option "producer.id.quota.window.count" as the >> number of windows active in window.size.seconds. >> >> Define the "Shape" (See commons-collections bloomfilters v4.5) of the >> bloom filter from the average number of PIDs expected in >> window.size.seconds/window.count (call this N) and the probability of false >> positives (call this P). Due to the way the LayeredBloomFilter works the >> number of items can be a lower number than the max. I'll explain that in a >> minute. >> >> The LayeredBloomFilter implements the standard BloomFilter interface but >> internally keeps an ordered list of filters (called layers) from oldest >> created to newest. It adds new layers when a specified Predicate >> (checkExtend) returns true. It will remove filters as defined by a >> specified Consumer (filterCleanup). >> >> Everytime a BloomFilter is merged into the LayeredBloomFilter the filter >> checks to the "checkExtend" predicate. If it fires the "filterCleanup" is >> called to remove any layers that should be removed and a new layer is added. 
>> >> Define the layers of the LayeredBloomFilter to comprise a standard >> BloomFilter and an associated expiration timestamp. >> >> We can thus define >> >>- "checkExtend" to require a new layer window.size.seconds / >>window.count seconds or when the current layer contains shape.N items. >>- "filterCleanup" to start at the head of the list of layers and >>remove any expired filters, usually 0, every window.size.seconds 1, >>infrequently more than 1. >> >> This system will correctly handle bursty loads. There are 3 cases to >> consider: >> >>1. If the producer is producing fewer than shape.N PIDs the layer >>will not fill up before the next layer is added. >>2. If the producer is producing shape.N PIDs the layer will be >>processed as either a 1 or a 3 depending on system timings. >>3. If the producer is producing more than shape.N PIDs the layer will >>fill up and a new layer will be created with an expiration timestamp >>window.size.seconds from when it was created. This is the case that leads >>to the filterCleanup infrequently having more than 1 layer to remove. >> >> The last case to consider is if a producer stops generating PIDs, in this >> case we should walk the map of producers, call "filterCleanup", and then >> check to see if the LayeredBloomFilter is empty. If so, remove it from the >> map. It will be empty if the producer has not produced a PID for >> window.size.seconds. >> >> I have this solution mostly coded, though I must admit I do not know >> where to plugin the ProducerIdQuotaManager defined in the KIP >> >> Claude >> >
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
I think there is an issue in the KIP.

Basically the KIP says: if the PID is found in either of the Bloom filters then no action is taken; if the PID is not found then it is added and the quota rating metrics are incremented. In this case long-running PIDs will be counted multiple times.

Let's assume a 30-minute window with 2 15-minute frames. So for the first 15 minutes all PIDs are placed in the first Bloom filter, and for the 2nd 15 minutes all new PIDs are placed in the second Bloom filter. At the 3rd 15 minutes the first filter is removed and a new empty one created. Let's denote Bloom filters as BFn{} and indicate the contained PIDs between the braces.

- At t0 let's insert PID0 and increment the rating metrics. Thus we have BF0{PID0}.
- At t0+5 let's insert PID1 and increment the rating metrics. Thus we have BF0{PID0, PID1}.
- At t0+10 we see PID0 again but no changes occur.
- At t0+15 we start t1 and we have BF0{PID0, PID1}, BF1{}.
- At t1+5 we see PID2, increment the rating metrics, and we have BF0{PID0, PID1}, BF1{PID2}.
- At t1+6 we see PID0 again and no changes occur.
- At t1+7 we see PID1 again and no changes occur.
- At t1+15 we start a new window and dispose of BF0. Thus we have BF1{PID2}, BF2{}.
- At t2+1 we see PID3, increment the rating metrics, and we have BF1{PID2}, BF2{PID3}.
- At t2+6 we see PID0 again, but now it is not in the list, so we increment the rating metrics and add it: BF1{PID2}, BF2{PID3, PID0}.

But we just saw PID0 15 minutes ago, well within the 30-minute window we are trying to track. Or am I missing something? It seems like we need to add each PID to the last Bloom filter.

On Fri, Apr 12, 2024 at 2:45 PM Claude Warren wrote: > Initial code is available at > https://github.com/Claudenw/kafka/blob/KIP-936/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerIDQuotaManager.java > > On Tue, Apr 9, 2024 at 2:37 PM Claude Warren wrote: > >> I should also note that the probability of false positives does not fall >> below shape.P because as it approaches shape.P a new layer is created and >> filters are added to that. So no layer in the LayeredBloomFilter exceeds >> shape.P thus the entire filter does not exceed shape.P. >> >> Claude >> >> On Tue, Apr 9, 2024 at 2:26 PM Claude Warren wrote: >> >>> The overall design for KIP-936 seems sound to me. I would make the >>> following changes: >>> >>> Replace the "TimedBloomFilter" with a "LayeredBloomFilter" from >>> commons-collections v4.5 >>> >>> Define the producer.id.quota.window.size.seconds to be the length of >>> time that a Bloom filter of PIDs will exist. >>> Define a new configuration option "producer.id.quota.window.count" as >>> the number of windows active in window.size.seconds. >>> >>> Define the "Shape" (See commons-collections bloomfilters v4.5) of the >>> bloom filter from the average number of PIDs expected in >>> window.size.seconds/window.count (call this N) and the probability of false >>> positives (call this P). Due to the way the LayeredBloomFilter works the >>> number of items can be a lower number than the max. I'll explain that in a >>> minute. >>> >>> The LayeredBloomFilter implements the standard BloomFilter interface but >>> internally keeps an ordered list of filters (called layers) from oldest >>> created to newest. It adds new layers when a specified Predicate >>> (checkExtend) returns true. It will remove filters as defined by a >>> specified Consumer (filterCleanup). 
>>> >>> Everytime a BloomFilter is merged into the LayeredBloomFilter the filter >>> checks to the "checkExtend" predicate. If it fires the "filterCleanup" is >>> called to remove any layers that should be removed and a new layer is added. >>> >>> Define the layers of the LayeredBloomFilter to comprise a standard >>> BloomFilter and an associated expiration timestamp. >>> >>> We can thus define >>> >>>- "checkExtend" to require a new layer window.size.seconds / >>>window.count seconds or when the current layer contains shape.N items. >>>- "filterCleanup" to start at the head of the list of layers and >>>remove any expired filters, usually 0, every window.size.seconds 1, >>>infrequently more than 1. >>> >>> This system will correctly handle bursty loads. There are 3 cases to >>> consider: >>> >>>1. If the producer is producing fewer than shape.N PIDs the layer >>>will not fill up before the next layer is added. >>>2. If
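The timeline above can be reproduced with plain sets standing in for the two Bloom filter frames; the `refreshOnSight` flag models the suggested fix of adding each sighted PID to the newest frame. All names here are illustrative, not the KIP's code:

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Simulation of the double-counting scenario: under "only insert when
 * unseen" semantics, a PID sitting in the older frame is re-counted once
 * that frame is dropped. Re-inserting every sighting into the newest
 * frame keeps active PIDs alive across frame rotation.
 */
public class FrameRotation {
    public Set<Long> older = new HashSet<>();
    public Set<Long> newer = new HashSet<>();
    public int counted = 0;
    public final boolean refreshOnSight; // true = always re-add to newest frame

    public FrameRotation(boolean refreshOnSight) {
        this.refreshOnSight = refreshOnSight;
    }

    public void see(long pid) {
        boolean seen = older.contains(pid) || newer.contains(pid);
        if (!seen) {
            counted++;        // quota rating metric incremented
            newer.add(pid);
        } else if (refreshOnSight) {
            newer.add(pid);   // the fix: keep active PIDs in the newest frame
        }
    }

    /** Drop the oldest frame, promote the newer one, start a fresh frame. */
    public void rotate() {
        older = newer;
        newer = new HashSet<>();
    }
}
```

With the as-written semantics PID0 is charged twice inside one nominal window; with refresh-on-sight it is charged once.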
Re: [ANNOUNCE] New Kafka PMC Member: Greg Harris
Congrats Greg! All the hard work paid off. On Mon, Apr 15, 2024 at 6:58 AM Ivan Yurchenko wrote: > Congrats Greg! > > On Sun, Apr 14, 2024, at 22:51, Sophie Blee-Goldman wrote: > > Congrats Greg! Happy to have you > > > > On Sun, Apr 14, 2024 at 9:26 AM Jorge Esteban Quilcate Otoya < > > quilcate.jo...@gmail.com> wrote: > > > > > Congrats, Greg!! > > > > > > On Sun 14. Apr 2024 at 15.05, Josep Prat > > > wrote: > > > > > > > Congrats Greg!!! > > > > > > > > > > > > Best, > > > > > > > > Josep Prat > > > > Open Source Engineering Director, aivenjosep.p...@aiven.io | > > > > +491715557497 | aiven.io > > > > Aiven Deutschland GmbH > > > > Alexanderufer 3-7, 10117 Berlin > > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen > > > > Amtsgericht Charlottenburg, HRB 209739 B > > > > > > > > On Sun, Apr 14, 2024, 12:30 Divij Vaidya > > > wrote: > > > > > > > > > Congratulations Greg! > > > > > > > > > > -- > > > > > Divij Vaidya > > > > > > > > > > > > > > > > > > > > On Sun, Apr 14, 2024 at 6:39 AM Kamal Chandraprakash < > > > > > kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > > > Congratulations, Greg! > > > > > > > > > > > > On Sun, Apr 14, 2024 at 8:57 AM Yash Mayya > > > > > wrote: > > > > > > > > > > > > > Congrats Greg! > > > > > > > > > > > > > > On Sun, 14 Apr, 2024, 05:56 Randall Hauch, > > > wrote: > > > > > > > > > > > > > > > Congratulations, Greg! > > > > > > > > > > > > > > > > On Sat, Apr 13, 2024 at 6:36 PM Luke Chen > > > > > wrote: > > > > > > > > > > > > > > > > > Congrats, Greg! > > > > > > > > > > > > > > > > > > On Sun, Apr 14, 2024 at 7:05 AM Viktor Somogyi-Vass > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Congrats Greg! :) > > > > > > > > > > > > > > > > > > > > On Sun, Apr 14, 2024, 00:35 Bill Bejeck < > bbej...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Congrats Greg! 
> > > > > > > > > > > > > > > > > > > > > > -Bill > > > > > > > > > > > > > > > > > > > > > > On Sat, Apr 13, 2024 at 4:25 PM Boudjelda Mohamed Said > < > > > > > > > > > > bmsc...@gmail.com> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Congratulations Greg > > > > > > > > > > > > > > > > > > > > > > > > On Sat 13 Apr 2024 at 20:42, Chris Egerton < > > > > > > ceger...@apache.org> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > Greg Harris has been a Kafka committer since July > 2023. > > > > He > > > > > > has > > > > > > > > > > remained > > > > > > > > > > > > > very active and instructive in the community since > > > > > becoming a > > > > > > > > > > > committer. > > > > > > > > > > > > > It's my pleasure to announce that Greg is now a > member > > > of > > > > > > Kafka > > > > > > > > > PMC. > > > > > > > > > > > > > > > > > > > > > > > > > > Congratulations, Greg! > > > > > > > > > > > > > > > > > > > > > > > > > > Chris, on behalf of the Apache Kafka PMC > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- LinkedIn: http://www.linkedin.com/in/claudewarren
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
After thinking about this KIP over the weekend I think that there is another lighter weight approach. I think the question is not whether or not we have seen a given PID before but rather how many unique PIDs did the principal create in the last hour. Perhaps more exactly it is: did the Principal create more than X PIDs in the last Y time units? This question can be quickly answered by a CPC datasketch [1]. The solution would be something like:

Break the Y time units into a set of Y' smaller partitions (e.g. 60 1-minute partitions for an hour). Create a circular queue of Y' CPC datasketches for each principal. Implement a queue entry selector based on the modulus of the system time by the resolution of the Y' partitions. On each call:

- On queue entry selector change, clear the CPC (makes it empty).
- Add the latest PID to the current queue entry.
- Sum up the CPCs and check if the max (or min) estimate of unique counts exceeds the limit for the user.

When the CPC returns a zero estimated count then the principal has gone away and the principal/CPC-queue pair can be removed from the tracking system. I believe that this code solution is smaller and faster than the Bloom filter implementation.

[1] https://datasketches.apache.org/docs/CPC/CPC.html

On Fri, Apr 12, 2024 at 3:10 PM Claude Warren wrote: > I think there is an issue in the KIP. > > Basically the kip says, if the PID is found in either of the Bloom filters > then no action is taken > If the PID is not found then it is added and the quota rating metrics are > incremented. > > In this case long running PIDs will be counted multiple times. > > Let's assume a 30 minute window with 2 15-minute frames. So for the first > 15 minutes all PIDs are placed in the first Bloom filter and for the 2nd 15 > minutes all new PIDs are placed in the second bloom filter. At the 3rd 15 > minutes the first filter is removed and a new empty one created. > > Let's denote Bloom filters as BFn{} and indicate the contained pids > between the braces. 
> > > So at t0 lets insert PID0 and increment the rating metrics. Thus we have > BF0{PID0} > at t0+5 let's insert PID1 and increment the rating metrics. Thus we have > BF0{PID0, PID1} > at t0+10 we see PID0 again but no changes occur. > at t0+15 we start t1 and we have BF0{PID0, PID1}, BF1{} > at t1+5 we see PID2, increment the rating metrics, and we have BF0{PID0, > PID1}, BF1{PID2} > at t1+6 we see PID0 again and no changes occur > at t1+7 we see PID1 again and no changes occur > at t1+15 we start a new window and dispose of BF0. Thus we have > BF1{PID2}, BF2{} > at t2+1 we see PID3, increment the rating metrics, and we have we have > BF1{PID2}, BF2{PID3} > at t2+6 we see PID0 again but now it is not in the list so we increment > the rating metrics and add it BF1{PID2}, BF2{PID3, PID0} > > But we just saw PID0 15 minutes ago. Well within the 30 minute window we > are trying to track. Or am I missing something? It seems like we need to > add each PID to the last bloom filter > > On Fri, Apr 12, 2024 at 2:45 PM Claude Warren wrote: > >> Initial code is available at >> https://github.com/Claudenw/kafka/blob/KIP-936/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerIDQuotaManager.java >> >> On Tue, Apr 9, 2024 at 2:37 PM Claude Warren wrote: >> >>> I should also note that the probability of false positives does not fall >>> below shape.P because as it approaches shape.P a new layer is created and >>> filters are added to that. So no layer in the LayeredBloomFilter exceeds >>> shape.P thus the entire filter does not exceed shape.P. >>> >>> Claude >>> >>> On Tue, Apr 9, 2024 at 2:26 PM Claude Warren wrote: >>> >>>> The overall design for KIP-936 seems sound to me. I would make the >>>> following changes: >>>> >>>> Replace the "TimedBloomFilter" with a "LayeredBloomFilter" from >>>> commons-collections v4.5 >>>> >>>> Define the producer.id.quota.window.size.seconds to be the length of >>>> time that a Bloom filter of PIDs will exist. 
>>>> Define a new configuration option "producer.id.quota.window.count" as >>>> the number of windows active in window.size.seconds. >>>> >>>> Define the "Shape" (See commons-collections bloomfilters v4.5) of the >>>> bloom filter from the average number of PIDs expected in >>>> window.size.seconds/window.count (call this N) and the probability of false >>>> positives (call this P). Due to the way the LayeredBloomFilter works the >>>> number of items can be a lower number t
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
Let's put aside the CPC datasketch idea and just discuss the Bloom filter approach.

I think the problem with the way the KIP is worded is that PIDs are only added if they are not seen in either of the Bloom filters. So an early PID is added to the first filter and the associated metric is updated. That PID is seen multiple times over the next 60 minutes, but is not added to the Bloom filters again. Once the 60 minutes elapses the first filter is cleared, or removed, and a new one started. In any case the PID is no longer recorded in any extant Bloom filter. The PID is seen again and is added to the newest bloom filter and the associated metric is updated. I believe at this point the metric is incorrect: the PID has been counted 2x, when it has been in use for the entire time.

The "track" method that I added solves this problem by ensuring that the PID is always seen in the latter half of the set of Bloom filters. In the case of 2 filters that is always the second one, but remember that the number of layers will grow as the filters become saturated. So if your filter is intended to hold 500 PIDs and the 501st PID is registered before the expiration a new layer (Bloom filter) is added for new PIDs to be added into.

On Mon, Apr 15, 2024 at 5:00 PM Omnia Ibrahim wrote: > Hi Claude, > Thanks for the implementation of the LayeredBloomFilter in apache commons. > > > Define a new configuration option "producer.id.quota.window.count" as > > the number of windows active in window.size.seconds. > What is the difference between “producer.id.quota.window.count” and > producer.id.quota.window.num > > > Basically the kip says, if the PID is found in either of the Bloom > filters > > then no action is taken > > If the PID is not found then it is added and the quota rating metrics are > > incremented. > > In this case long running PIDs will be counted multiple times. > > The PID is considered not encountered if both frames of the window don’t > have it. 
If you checked the diagram for `Caching layer to track active > PIDs per KafkaPrincipal` you will see that each window will have 2 bloom > layers and the first created one will be disposed only when we start the > next window. Which means window2 is starting from the 2nd bloom. Basically > the bloom filter in the KIP is trying to implement a sliding window > pattern. > > > think the question is not whether or not we have seen a given PID before > > but rather how many unique PIDs did the principal create in the last > hour. > > Perhaps more exactly it is: did the Principal create more than X PIDs in > > the last Y time units? > We don’t really care about the count of unique PIDs per user. The KIP is > trying to follow and build on top of ClientQuotaManager which already has > a pattern for throttling that the producer client is aware of, so we don’t > need to upgrade old clients for brokers to throttle them and they can > respect the throttling. > > The pattern for throttling is that we record the activities by > incrementing a metric sensor and only when we catch > `QuotaViolationException` from the quota sensor we will be sending a > throttleTimeMs to the client. > For bandwidth throttling for example we increment the sensor by the size > of the request. For PID the KIP is aiming to call > `QuotaManagers::producerIdQuotaManager::maybeRecordAndGetThrottleTimeMs` to > increment by +1 every time we encounter a new PID, and if > `Sensor::record` returned `QuotaViolationException` then we will send back > to the producer the throttling time that the client should wait for before > sending a new request with a new PID. > I hope this makes sense. > > > This question can be quickly answered by a CPC datasketch [1]. The > > solution would be something like: > > Break the Y time units into a set of Y' smaller partitions (e.g. 60 > > 1-minute partitions for an hour). Create a circular queue of Y' CPC > > datasketches for each principal. 
Implement a queue entry selector based > on > > the modulus of the system time by the resolution of the Y' partitions. On each > > call: > I didn’t evaluate the CPC datasketch or any counter solution because, as I explained > above, the aim is not to build a counter; the Kafka Sensor can be > enough to indicate if we are violating the quota or not. > > Thanks > Omnia > > > On 15 Apr 2024, at 10:35, Claude Warren wrote: > > > > After thinking about this KIP over the weekend I think that there is > another > > lighter weight approach. > > > > I think the question is not whether or not we have seen a given PID > before > > but rather how many unique PIDs did the principal create in the last > hour. > > Perhaps mor
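For reference, the circular-queue-of-sketches proposal quoted above can be sketched as follows. This is a minimal illustration with a HashSet standing in for each CPC sketch; a real implementation would use Apache DataSketches' CpcSketch and a union for the estimate. All class and method names here are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the circular-queue idea: split the Y-unit window into Y'
// partitions, keep one distinct-counting structure per partition, clear a
// slot when the window wraps back onto it, and answer the quota question
// from the combined estimate.
public class SlidingPidCounter {
    private final Set<Long>[] slots;
    private int current = -1;

    @SuppressWarnings("unchecked")
    public SlidingPidCounter(int partitions) {
        slots = new Set[partitions];
        for (int i = 0; i < partitions; i++) {
            slots[i] = new HashSet<>();
        }
    }

    // timeUnit would be e.g. systemTimeMillis / 60_000 for 1-minute slots.
    public void record(long timeUnit, long pid) {
        int slot = (int) (timeUnit % slots.length);
        if (slot != current) {    // the queue entry selector changed:
            slots[slot].clear();  // clear the slot before reusing it
            current = slot;
        }
        slots[slot].add(pid);
    }

    // Upper bound on distinct PIDs over the window; a CPC union would give
    // an estimate here instead of this exact per-slot sum.
    public long estimate() {
        long total = 0;
        for (Set<Long> s : slots) {
            total += s.size();
        }
        return total;
    }

    public static void main(String[] args) {
        SlidingPidCounter c = new SlidingPidCounter(3);
        c.record(0, 1L);
        c.record(0, 2L);
        c.record(1, 3L);
        System.out.println(c.estimate()); // 3
    }
}
```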
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
The difference between p.i.q.window.count and p.i.q.window.num: To be honest, I may have misunderstood your definition of window num. But here is what I have in mind:

1. p.i.q.window.size.seconds: the length of time that a window will exist. This is also the maximum time between PID uses where the PID is considered to be the same. Reuse of the PID after p.i.q.window.size.seconds triggers recording the PID as a new PID.

2. p.i.q.window.count (the new one): how many sections to break p.i.q.window.size.seconds into. In the initial KIP this is 2. This value gives us the timing for creating a new layer in the layered bloom filter. So every (p.i.q.window.size.seconds / p.i.q.window.count) seconds a new layer will be created and an old layer or layers will be removed. The layered bloom filter will add layers to keep the probability of false positives in range.

3. p.i.q.window.num, specified as 11 in the KIP: I thought this was how many PIDs were expected in each window. In the original KIP this means that we expect to see the PID 22 times (2 windows x 11). In the Layered Bloom filter this would be the N value for the Shape.

4. p.i.q.window.fpr (needs to be specified): the expected false positive rate. Not sure how to express this in the config in a way that makes sense but something like 0.06 or the like. This is the P value for the Shape.

See https://hur.st/bloomfilter for a Bloom filter calculator. Once we have the N and P for the Shape the shape can be instantiated as "Shape s = Shape.fromNP(n, p);" In the layered filter once N items have been added to the layer a new layer is created. Layers are removed after p.i.q.window.size.seconds so if there is a burst of PIDs the number of layers will expand and then shrink back as the PIDs expire. While running there is always at least 1 layer. 
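The N and P values above determine the filter's bit count and hash count via the standard Bloom filter sizing formulas, the same math behind the https://hur.st/bloomfilter calculator (and, as far as I can tell, behind commons-collections' Shape.fromNP). A quick sketch:

```java
// Sketch of the textbook Bloom filter sizing math for the Shape values
// discussed above (n = expected entries, p = false positive rate).
// Shape.fromNP in commons-collections is the authoritative implementation.
public class BloomSizing {
    // optimal number of bits m for n entries at false positive rate p
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-(n * Math.log(p)) / (Math.log(2) * Math.log(2)));
    }

    // optimal number of hash functions k for m bits and n entries
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long m = optimalBits(11, 0.06); // Shape.N = 11, Shape.P = 0.06
        int k = optimalHashes(m, 11);
        System.out.println(m + " bits, " + k + " hash functions");
        // prints "65 bits, 4 hash functions"
    }
}
```

Note that 65 bits is only about 9 bytes of raw bit storage, smaller than the 35 bytes quoted below; the difference is presumably object and array overhead in the concrete filter implementation.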
Some calculated points:

- No layer will span more than p.i.q.window.size.seconds / p.i.q.window.count seconds.
- Assuming Shape.N = 11, p.i.q.window.size.seconds = 60, p.i.q.window.count = 2 and a rate of 22 PIDs per minute there will be 2 layers.
- Assuming Shape.N = 11, p.i.q.window.size.seconds = 60, p.i.q.window.count = 2 and a rate of 10 PIDs per minute there will be 2 layers.
- Assuming Shape.N = 11, p.i.q.window.size.seconds = 60, p.i.q.window.count = 2 and that no PIDs have been seen in the last 30 seconds there will be 2 layers.
- Assuming Shape.N = 11, p.i.q.window.size.seconds = 60, p.i.q.window.count = 2 and that no PIDs have been seen in the last 60 seconds there will be 1 layer.
- Assuming Shape.N = 11, p.i.q.window.size.seconds = 60, p.i.q.window.count = 2 and a rate of 23 to 44 PIDs per minute there will be 4 layers.
- The false positive rate across the layers remains at or below Shape.P.
- Assuming Shape.N = 11 and Shape.P = 0.06 the Bloom filter at each layer will consume 35 bytes. https://hur.st/bloomfilter provides a quick calculator for other values.

Claude

On Tue, Apr 16, 2024 at 8:06 AM Claude Warren wrote: > Let's put aside the CPC datasketch idea and just discuss the Bloom filter > approach. > > I think the problem with the way the KIP is worded is that PIDs are only > added if they are not seen in either of the Bloom filters. > > So an early PID is added to the first filter and the associated metric is > updated. > That PID is seen multiple times over the next 60 minutes, but is not added > to the Bloom filters again. > Once the 60 minutes elapses the first filter is cleared, or removed and a > new one started. In any case the PID is no longer recorded in any extant > Bloom filter. > The PID is seen again and is added to the newest bloom filter and the > associated metric is updated. > > I believe at this point the metric is incorrect, the PID has been counted > 2x, when it has been in use for the entire time. 
> > The "track" method that I added solves this problem by ensuring that the > PID is always seen in the latter half of the set of Bloom filters. In the > case of 2 filters that is always the second one, but remember that the > number of layers will grow as the filters become saturated. So if your > filter is intended to hold 500 PIDs and the 501st PID is registered before > the expiration a new layer (Bloom filter) is added for new PIDS to be added > into. > > On Mon, Apr 15, 2024 at 5:00 PM Omnia Ibrahim > wrote: > >> Hi Claude, >> Thanks for the implementation of the LayeredBloomFilter in apache >> commons. >> >> > Define a new configuration option "producer.id.quota.window.count" as >> > the number of windows active in window.size.se
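The "track" behavior described above — re-recording a previously seen PID in the newest layer so that a long-lived PID is not charged twice — can be sketched with plain HashSets standing in for Bloom filter layers. All names here are hypothetical; the real code uses a LayeredBloomFilter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch of the "track" idea: a PID is charged against the quota only if
// no layer has seen it, and every sighting is re-recorded in the newest
// layer so a long-lived PID survives the eviction of old layers.
// HashSet stands in for a Bloom filter layer, rotate() for time expiry.
public class PidTracker {
    private final Deque<Set<Long>> layers = new ArrayDeque<>();
    private long newPidCount = 0; // stand-in for the quota sensor increment

    public PidTracker(int layerCount) {
        for (int i = 0; i < layerCount; i++) {
            layers.addLast(new HashSet<>());
        }
    }

    // Returns true if the PID was charged as new.
    public boolean track(long pid) {
        boolean seen = layers.stream().anyMatch(l -> l.contains(pid));
        layers.peekLast().add(pid); // always re-record in the newest layer
        if (!seen) {
            newPidCount++;
        }
        return !seen;
    }

    // Called when a time fragment expires: drop the oldest layer.
    public void rotate() {
        layers.removeFirst();
        layers.addLast(new HashSet<>());
    }

    public static void main(String[] args) {
        PidTracker t = new PidTracker(2);
        System.out.println(t.track(1L)); // true: first sighting is charged
        t.rotate();
        System.out.println(t.track(1L)); // false: re-recorded, not recharged
        t.rotate();
        System.out.println(t.track(1L)); // false: still only charged once
    }
}
```

Without the re-record step in track(), the third call would charge the PID a second time, which is exactly the double-counting described in the email above.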
Confluence edit access
I would like to get edit access to the Kafka confluence so that I can work on KIP-936. Can someone here do that or do I need to go through Infra? Claude
Re: Confluence edit access
My Confluence ID is "claude" On Thu, Apr 25, 2024 at 8:40 PM Matthias J. Sax wrote: > What's your wiki ID? We can grant write access on our side if you have > already an account. > > -Matthias > > On 4/25/24 4:06 AM, Claude Warren wrote: > > I would like to get edit access to the Kafka confluence so that I can > work > > on KIP-936. Can someone here do that or do I need to go through Infra? > > > > Claude > > > -- LinkedIn: http://www.linkedin.com/in/claudewarren
Possible bug in Authorize by ResourceTypeQue
*Setup:* Superuser = "User:superman" ACLs added to system new StandardAcl(TOPIC, "foo", PREFIXED, "User:alice", WILDCARD, READ, DENY) new StandardAcl(TOPIC, "foobar", LITERAL, "User:alice", WILDCARD, READ, ALLOW) new StandardAcl(TOPIC, "foo", PREFIXED, "User:bob", WILDCARD, READ, ALLOW) ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true" AuthorizerContext requestContext = MockAuthorizableRequestContext with principal = User:alice host = InetAddress.getLocalHost() *Method Call:* authorizer.authorizeByResourceType(requestContext, READ, TOPIC) *Question:* Should the result be true because there is a LITERAL READ ALLOW on "foobar" or should the result be false because there is an overriding PREFIXED READ DENY on "foo" ? -- LinkedIn: http://www.linkedin.com/in/claudewarren
Re: [DISCUSS] KIP-936: Throttle number of active PIDs
Have you considered using Stable Bloom Filters [1]. I think they do what you want without a lot of the overhead you propose for your solution. In addition, you may want to look at Commons-Collections v4.5 [2] (currently snapshot) for efficient Bloom filter code. I have a Stable Bloom filter implementation based on commons-collections somewhere. [1] Deng, Fan; Rafiei, Davood (2006), "Approximately Detecting Duplicates for Streaming Data using Stable Bloom Filters", Proceedings of the ACM SIGMOD Conference (PDF), pp. 25–36 [2] https://github.com/apache/commons-collections/tree/master/src/main/java/org/apache/commons/collections4/bloomfilter
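For illustration, a minimal Stable Bloom Filter in the spirit of Deng & Rafiei's paper: each insert first decrements a few randomly chosen counters (so stale entries decay over time) and then sets the item's counters to the maximum. This is a simplified sketch, not the commons-collections code; the class and parameter names are hypothetical.

```java
import java.util.Random;

// Sketch of a Stable Bloom Filter: numCells counters, k probes per item,
// p random decrements per insert, max as the counter ceiling. Double
// hashing stands in for the paper's independent hash functions.
public class StableBloomFilter {
    private final int[] cells;
    private final int k, p, max;
    private final Random random = new Random(42); // fixed seed for the sketch

    public StableBloomFilter(int numCells, int k, int p, int max) {
        this.cells = new int[numCells];
        this.k = k;
        this.p = p;
        this.max = max;
    }

    private int probe(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // force an odd second hash
        return Math.floorMod(h1 + i * h2, cells.length);
    }

    public void add(Object item) {
        for (int j = 0; j < p; j++) {
            int c = random.nextInt(cells.length);
            if (cells[c] > 0) {
                cells[c]--; // decay step: old entries fade out over time
            }
        }
        for (int i = 0; i < k; i++) {
            cells[probe(item, i)] = max;
        }
    }

    // True if the item has (probably) been seen recently.
    public boolean contains(Object item) {
        for (int i = 0; i < k; i++) {
            if (cells[probe(item, i)] == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StableBloomFilter f = new StableBloomFilter(1024, 3, 10, 3);
        f.add("pid-1");
        System.out.println(f.contains("pid-1")); // true right after insertion
    }
}
```

Because the decay is probabilistic, a stable filter never needs explicit window rotation, which is the overhead reduction suggested above; the trade-off is a small false negative rate for items that have not been refreshed recently.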
Re: [DISCUSS] KIP-936: Throttle number of active PIDs
The link I thought I included did not carry over in the last post. The paper can be found at: https://webdocs.cs.ualberta.ca/~drafiei/papers/DupDet06Sigmod.pdf On Thu, Jun 8, 2023 at 9:05 AM Claude Warren wrote: > > Have you considered using Stable Bloom Filters [1]. I think they do what > you want without a lot of the overhead you propose for your solution. In > addition, you may want to look at Commons-Collections v4.5 [2] (currently > snapshot) for efficient Bloom filter code. I have a Stable Bloom filter > implementation based on commons-collections somewhere. > > [1] Deng, Fan; Rafiei, Davood (2006), "Approximately Detecting Duplicates > for Streaming Data using Stable Bloom Filters", Proceedings of the ACM > SIGMOD Conference (PDF), pp. 25–36 > > [2] > https://github.com/apache/commons-collections/tree/master/src/main/java/org/apache/commons/collections4/bloomfilter >
Re: [DISCUSS] KIP-936: Throttle number of active PIDs
I have an implementation of a layered Bloom filter in [1] (note the layered branch). This should handle the layering of Bloom filters and allow for layers that:

1. Do not become over populated and thus yield too many false positives.
2. Expire and are removed automatically.

The layered Bloom filter also does not need another thread to remove the old filters as this is amortized across all the inserts. In addition, the number of Layered Bloom filter instances can be reduced by hashing the Kafka Principal and the ID together into the Bloom filter to look for.

[1] https://github.com/Claudenw/BloomFilters/tree/layered

On Sun, Jun 18, 2023 at 10:18 PM Omnia Ibrahim wrote: > Hi Haruki, Thanks for having a look at the KIP. > > 1. Do you have any memory-footprint estimation for > TimeControlledBloomFilter? > I don't at the moment have any estimate as I don't have a full > implementation of this one at the moment. I can work on one if it's > required. > > > * If I read the KIP correctly, TimeControlledBloomFilter will be > > allocated per KafkaPrincipal so the size should be reasonably small > > considering clusters which have a large number of users. > The Map stored in the cache has 2 dimensions one is vertical which is > KafkaPrincipal (producers only) and the second horizontal which is the time > of the windows. > - Horizontally we will add only PIDs to the TimeControlledBloomFilter only > if KafkaPrincipal didn't hit the quota and we control the bloom filter by > time to expire the oldest set at some point when it's not needed anymore. > - Vertically is the tricky one if the cluster has an insane number of > KafkaPrincipals used for producing. And if the number of KafkaPrincipals is > huge we can control the memory used by the cache by throttling more > aggressively and I would argue that there is never going to be an insane > number that could cause OOM. > > >* i.e. What false-positive rate do you plan to choose as the default? > Am planning on using 0.1 as default. 
> > > 2. What do you think about rotating windows on produce-requests arrival > instead of scheduler? > > * If we do rotation in scheduler threads, my concern is potential > > scheduler threads occupation which could make other background tasks to > > delay > This is a valid concern. We can consider disposing of the oldest bloom when > we add a new PID to the TimeControlledBloomFilter. However, I would still > need a scheduler to clean up any inactive KafkaPrincipal from the cache > layer `i.e. ProducerIdQuotaManagerCache`. Do you have the same concern > about this one too? > > > 3. Why the default producer.id.quota.window.size.seconds is 1 hour? > > * Unlike other quota types (1 second) > Mostly because 1 sec doesn't make sense for this type of quota. > Misconfigured or misbehaving producers usually don't allocate new PIDs on > the leader every sec but over a period of time. > > Thanks > > On Tue, Jun 6, 2023 at 5:21 PM Haruki Okada wrote: > > > Hi, Omnia. > > > > Thanks for the KIP. > > The feature sounds indeed helpful and the strategy to use bloom-filter > > looks good. > > > > I have three questions: > > > > 1. Do you have any memory-footprint estimation > > for TimeControlledBloomFilter? > > * If I read the KIP correctly, TimeControlledBloomFilter will be > > allocated per KafkaPrincipal so the size should be reasonably small > > considering clusters which have a large number of users. > > * i.e. What false-positive rate do you plan to choose as the default? > > 2. What do you think about rotating windows on produce-requests arrival > > instead of scheduler? > > * If we do rotation in scheduler threads, my concern is potential > > scheduler threads occupation which could make other background tasks to > > delay > > 3. Why the default producer.id.quota.window.size.seconds is 1 hour? 
> > * Unlike other quota types (1 second) > > > > Thanks, > > > > 2023年6月6日(火) 23:55 Omnia Ibrahim : > > > > > Hi everyone, > > > I want to start the discussion of the KIP-936 to throttle the number of > > > active PIDs per KafkaPrincipal. The proposal is here > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-936%3A+Throttle+number+of+active+PIDs > > > < > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-936%3A+Throttle+number+of+active+PIDs > > > > > > > > > > Thanks for your time and feedback. > > > Omnia > > > > > > > > > -- > > > > Okada Haruki > > ocadar...@gmail.com > > > > > -- LinkedIn: http://www.linkedin.com/in/claudewarren
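Haruki's suggestion above of rotating windows on produce-request arrival rather than from a scheduler can be sketched as follows, with a HashSet standing in for each timed Bloom filter frame. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch of on-arrival rotation: each call first drops frames older than
// the window and opens a fresh frame when the newest one has aged past the
// frame length, so no background scheduler thread is needed.
public class OnArrivalRotation {
    private static final class Frame {
        final long createdAtMs;
        final Set<Long> pids = new HashSet<>();
        Frame(long t) { createdAtMs = t; }
    }

    private final Deque<Frame> frames = new ArrayDeque<>();
    private final long windowMs, frameMs;

    public OnArrivalRotation(long windowMs, int frameCount) {
        this.windowMs = windowMs;
        this.frameMs = windowMs / frameCount;
    }

    // Returns true if the PID is new within the window.
    public boolean addIfNew(long nowMs, long pid) {
        while (!frames.isEmpty() && nowMs - frames.peekFirst().createdAtMs >= windowMs) {
            frames.removeFirst(); // amortized cleanup of expired frames
        }
        if (frames.isEmpty() || nowMs - frames.peekLast().createdAtMs >= frameMs) {
            frames.addLast(new Frame(nowMs)); // open the next frame
        }
        boolean seen = frames.stream().anyMatch(f -> f.pids.contains(pid));
        frames.peekLast().pids.add(pid);
        return !seen;
    }

    public static void main(String[] args) {
        OnArrivalRotation w = new OnArrivalRotation(100, 2); // 100ms window, 2 frames
        System.out.println(w.addIfNew(0, 1L));  // true: first sighting
        System.out.println(w.addIfNew(60, 1L)); // false: seen within the window
    }
}
```

As Omnia notes, a scheduled task may still be needed to clean up principals that stop producing entirely, since a principal's frames only expire when addIfNew is called for it.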
Re: [DISCUSS] KIP-936: Throttle number of active PIDs
I think that either a Stable Bloom filter or a Layered Bloom filter constructed as follows would work:

- Each layer is configured for the maximum number of principal-PID pairs expected in a single minute.
- Expect 60 layers (one for each minute).
- If the layer becomes fully populated add another layer.
- When the time to insert into the current layer expires, remove the layers that are older than an hour.

This will provide a sliding window of one hour with the ability to handle bursts above the expected rate of inserts without additional false positives. By hashing the principal and PID into the filter as a single hash only one Bloom filter is required.

The code I pointed to earlier uses the commons-collections4 Bloom filter implementation. So rough pseudo code for the implementation is:

Shape shape = Shape.fromNP(1000, 0.1); // 1000 entries, 0.1 false positive rate
LayeredBloomFilter filter = LayeredBloomFilter(shape, 60, evictFunc); // 60 initial layers, eviction function

(on PID)
long[] buff = MurmurHash3.hash128x64(String.format("%s%s", principal, PID).getBytes(StandardCharsets.UTF_8));
Hasher hasher = new EnhancedDoubleHasher(buff[0], buff[1]);
if (filter.contains(hasher)) {
    // handle case where principal-pid was already seen
}
filter.merge(hasher); // ensures that principal-pid remains seen for the next hour

(evict function)
The evict function will determine if it has been a minute since the last time we created a new layer; if so, create a new layer and evict all layers older than 1 hour. Since the layers are a time-ordered list this is simply removing the elderly layers from the front of the list. If it has not been an hour and the current filter is full (e.g. has 1000 entries), create a new layer.

This should be very fast and space efficient.

On Wed, Jun 21, 2023 at 11:13 AM Claude Warren wrote: > I have an implementation of a layered Bloom filter in [1] (note the > layered branch). 
This should handle the layering Bloom filter and allow > for layers that > >1. Do not become over populated and thus yield too many false >positives. >2. Expire and are removed automatically. > > The layered Bloom filter also does not need another thread to remove the > old filters as this is amortized across all the inserts. > > In addition, the number of Layered Bloom filter instances can be reduced > by hashing the Kafka Principal and the ID together into the Bloom filter to > look for. > > [1] https://github.com/Claudenw/BloomFilters/tree/layered > > On Sun, Jun 18, 2023 at 10:18 PM Omnia Ibrahim > wrote: > >> Hi Haruki, Thanks for having a look at the KIP. >> > 1. Do you have any memory-footprint estimation for >> TimeControlledBloomFilter? >> I don't at the moment have any estimate as I don't have a full >> implementation of this one at the moment. I can work on one if it's >> required. >> >> > * If I read the KIP correctly, TimeControlledBloomFilter will be >> > allocated per KafkaPrincipal so the size should be reasonably small >> > considering clusters which have a large number of users. >> The Map stored in the cache has 2 dimensions one is vertical which is >> KafkaPrincipal (producers only) and the second horizontal which is the >> time >> of the windows. >> - Horizontally we will add only PIDs to the TimeControlledBloomFilter only >> if KafkaPrincipal didn't hit the quota and we control the bloom filter by >> time to expire the oldest set at some point when it's not needed anymore. >> - Vertically is the tricky one if the cluster has an insane number of >> KafkaPrincipals used for producing. And if the number of KafkaPrincipals >> is >> huge we can control the memory used by the cache by throttling more >> aggressively and I would argue that they will never going to be an insane >> number that could cause OOM. >> >> >* i.e. What false-positive rate do you plan to choose as the default? >> Am planning on using 0.1 as default. >> >> > 2. 
What do you think about rotating windows on produce-requests arrival >> instead of scheduler? >> > * If we do rotation in scheduler threads, my concern is potential >> > scheduler threads occupation which could make other background tasks to >> > delay >> This is a valid concern. We can consider disposing of the oldest bloom >> when >> we add a new PID to the TimeControlledBloomFilter. However, I would still >> need a scheduler to clean up any inactive KafkaPrincipal from the cache >> layer `i.e. ProducerIdQuotaManagerCache`. Do you have the same concern >> about this one too? >> >> > 3. Why the default producer.id.quota.window.size.seconds is
Re: [DISCUSS] KIP-1198: implement a ConfigKey.Builder class
Greetings, I started this topic a while back and have had no comments, so I assume that there is no objection to the proposal. I have created an implementation and updated an earlier PR that requested the KIP process be followed. I would appreciate a review of the PR and/or comments on the KIP. KIP-1198 https://cwiki.apache.org/confluence/display/KAFKA/KIP-1198%3A+implement+a+ConfigKey.Builder+class KAFKA-19381 https://issues.apache.org/jira/browse/KAFKA-19381 PR https://github.com/apache/kafka/pull/19912 Thanks, Claude
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
Igor, thanks for taking the time to look and to review the code. I regret that I have not pushed the latest code, but I will do so and will see what I can do about answering your Bloom filter related questions here. How would an operator know or decide to change the configuration > for the number layers – producer.id.quota.cache.layer.count – > e.g. increasing from 4 to 5; and why? > Do we need a new metric to indicate that change could be useful? In our usage the layered Bloom filter [6] retains the record of a PID for producer.id.quota.window.size.seconds. It breaks that window down into multiple fragments, so 4 layers = 15 minute fragments. It "forgets" a fragment worth of data when the fragment has been around for window.size.seconds. The layers will determine how big a chunk of time is deleted at once. Changing the layers to 10 will yield 6 minute fragments, 60 will yield 1 minute fragments and so on. There are other idiosyncrasies that I will get into later. I would not set the value lower than 3. If you find that there are multiple reports of new PIDs because on average they only ping every 50 minutes it might make sense to use more layers. If you use too many layers then there will only be one PID in each layer, and at that point a simple list of Filters would be faster to search, but in reality does not make sense. If you have two layers then recurring PIDs will be recorded in both layers. Is producer.id.quota.cache.cleanup.scheduler.interval.ms a > guaranteed interval, or rather simply a delay between cleanups? > How did you decide on the default value of 10ms? In the code this is not used. Cleanups are amortized across inserts to keep the layers balanced. There is a thread that does a cleanup every producer.id.quota.window.size.seconds / producer.id.quota.cache.layer.count seconds to detect principals that are no longer sending data. This is a reasonable frequency as it will align well with when the layers actually expire. 
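The fragment sizes mentioned above follow directly from dividing the retention window by the layer count; a trivial check, assuming the default one-hour window:

```java
// Fragment length per layer, as described above: the window
// (producer.id.quota.window.size.seconds) is divided evenly among the
// layers. Values follow the examples in the text (4, 10, or 60 layers).
public class FragmentSize {
    static int fragmentSeconds(int windowSizeSeconds, int layerCount) {
        return windowSizeSeconds / layerCount;
    }

    public static void main(String[] args) {
        System.out.println(fragmentSeconds(3600, 4));  // 900s = 15 minutes
        System.out.println(fragmentSeconds(3600, 10)); // 360s = 6 minutes
        System.out.println(fragmentSeconds(3600, 60)); // 60s = 1 minute
    }
}
```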
Under "New ProducerIdQuotaManagerCache", the documentation for > the constructor params for ProducerIDQuotaManagerCache does not > match the constructor signature. Sorry, this is because I did not push the changes. The constructor is ProducerIDQuotaManagerCache(Double falsePositiveRate, long ttl, int layerCount). Where falsePositiveRate is the Bloom filter false positive rate, ttl is producer.id.quota.window.size.seconds in milliseconds, and the layerCount is the desired number of layers. Under "New ProducerIdQuotaManagerCache": > public boolean track(KafkaPrincipal principal, int producerIdRate, long > pid) > How is producerIdRate used? The reference implementation Claude shared > does not use it. > > https://github.com/Claudenw/kafka/blob/49b6eb0fb5cfaf19b072fd87986072a683ab976c/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerIDQuotaManager.java Again, sorry for not updating the code. The producer rate is used to build a Bloom filter of the proper size. The producer rate is the number of PIDs per hour expected to be created by the principal. The Bloom filter shape [1] is determined by the expected number of PIDs per layer (producerRate * seconds_per_hour / producer.id.quota.window.size.seconds / producer.id.quota.cache.layer.count) and the falsePositiveRate from the constructor. These values are used to call the Shape.fromNP() method. This is the Shape of the Bloom filters in the layer. It is only used when the principal is not found in the cache. Thomas Hurst has provided a web page [5] where you can explore the interaction between number of items and false positive rate. I could not find a description or definition for > TimestampedBloomFilter, could we add that to the KIP? I will add it. It is simply an implementation of WrappedBloomFilter [2] that adds the timestamp for when the filter was created. LayeredBloomFilter will have a fixed size (right?), but some > users (KafkaPrincipal) might only use a small number of PIDs. 
> Is it worth having a dual strategy, where we simply keep a Set of > PIDs until we reach a certain size where it pays off to use > the LayeredBloomFilter?

Each principal has its own layered Bloom filter. Here come the idiosyncrasies and benefits of the layered Bloom filter. The layered Bloom filter can be thought of as a collection of Bloom filters that are queried to see if the item being searched for (the target) has been seen. There are a bunch of ways that the layered filter could be used. You could have a layer for each storage location in a multi-location storage engine, for example. But in our case the layer signifies a starting time fragment. That fragment will be at most producer.id.quota.window.size.seconds / producer.id.quota.cache.layer.count seconds long. The earliest layers are at the lower indices, the latest one at the highest. In general, when an item is added to the layered Bloom filter the following processes take place: - old layer filters are removed using t
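The sizing formula described in this thread (producerRate * seconds_per_hour / producer.id.quota.window.size.seconds / producer.id.quota.cache.layer.count, then fed to Shape.fromNP with the false positive rate) can be illustrated with the standard Bloom filter sizing math that fromNP-style constructors are based on. The helper names below are made up for illustration and are not the commons-collections4 API:

```java
public class BloomShape {
    // Expected PIDs per layer, per the sizing formula in the message.
    static int pidsPerLayer(int producerRatePerHour, long windowSizeSeconds, int layerCount) {
        return (int) (producerRatePerHour * 3600L / windowSizeSeconds / layerCount);
    }

    // Standard optimal bit count for n items at false positive rate p:
    // m = ceil(-n * ln(p) / (ln 2)^2)
    static int optimalBits(int n, double p) {
        return (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Standard optimal number of hash functions: k = round(m/n * ln 2)
    static int optimalHashes(int n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

For example, a principal expected to create 1000 PIDs per hour, with a one-hour window and 4 layers, needs each layer shaped for 250 PIDs.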
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
Quick note: I renamed the example code. It is now at https://github.com/Claudenw/kafka/blob/KIP-936/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerIDQuotaManagerCache.java On Thu, May 2, 2024 at 10:47 AM Claude Warren, Jr wrote: > Igor, thanks for taking the time to look and to review the code. I > regret that I have not pushed the latest code, but I will do so and will > see what I can do about answering your Bloom filter related questions here. > > How would an operator know or decide to change the configuration >> for the number layers – producer.id.quota.cache.layer.count – >> e.g. increasing from 4 to 5; and why? >> Do we need a new metric to indicate that change could be useful? > > > In our usage the layered Bloom filter [6] retains the record of a PID for > producer.id.quota.window.size.seconds. It breaks that window down into > multiple fragments, so 4 layers = 15 minute fragments. It "forgets" a > fragment worth of data when the fragment has been around for > window.size.seconds. The layers will determine how big a chunk of time is > deleted at once. Changing the layers to 10 will yield 6 minute fragments, > 60 will yield 1 minute fragments and so on. There are other idiosyncrasies > that I will get into later. I would not set the value lower than 3. If > you find that there are multiple reports of new PIDs because on average > they only ping every 50 minutes it might make sense to use more layers. If > you use too many layers then there will only be one PID in each layer, and > at that point a simple list of Filters would be faster to search, but in > reality does not make sense. If you have two layers then recurring PIDs > will be recorded in both layers. > > Is producer.id.quota.cache.cleanup.scheduler.interval.ms a >> guaranteed interval, or rather simply a delay between cleanups? >> How did you decide on the default value of 10ms? > > > In the code this is not used. 
Cleanups are amortized across inserts to > keep the layers balanced. There is a thread that does a cleanup every > producer.id.quota.window.size.seconds / > producer.id.quota.cache.layer.count seconds to detect principals that are > no longer sending data. This is a reasonable frequency as it will align > well with when the layers actually expire. > > Under "New ProducerIdQuotaManagerCache", the documentation for >> the constructor params for ProducerIDQuotaManagerCache does not >> match the constructor signature. > > > Sorry, this is because I did not push the changes. The constructor is > ProducerIDQuotaManagerCache(Double falsePositiveRate, long ttl, int > layerCount). Where falsePositiveRate is the Bloom filter false positive > rate, ttl is producer.id.quota.window.size.seconds in milliseconds, and the > layerCount is the desired number of layers. > > Under "New ProducerIdQuotaManagerCache": >> public boolean track(KafkaPrincipal principal, int producerIdRate, long >> pid) >> How is producerIdRate used? The reference implementation Claude shared >> does not use it. >> >> https://github.com/Claudenw/kafka/blob/49b6eb0fb5cfaf19b072fd87986072a683ab976c/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerIDQuotaManager.java > > > Again, sorry for not updating the code. The producer rate is used to > build a Bloom filter of the proper size. The producer rate is the number > of PIDs per hour expected to be created by the principal. The Bloom filter > shape [1] is determined by the expected number of PIDs per layer > (producerRate * seconds_per_hour / producer.id.quota.window.size.seconds / > producer.id.quota.cache.layer.count) and the falsePositiveRate from the > constructor. These values are used to call the Shape.fromNP() method. > This is the Shape of the Bloom filters in the layer. It is only used when > the principal is not found in the cache. 
Thomas Hurst has provided a web > page [5] where you can explore the interaction between number of items and > false positive rate. > > I could not find a description or definition for >> TimestampedBloomFilter, could we add that to the KIP? > > > I will add it. It is simply an implementation of WrappedBloomFilter [2] > that adds the timestamp for when the filter was created. > > LayeredBloomFilter will have a fixed size (right?), but some >> users (KafkaPrincipal) might only use a small number of PIDs. >> It it worth having a dual strategy, where we simply keep a Set of >> PIDs until we reach certain size where it pays off to use >> the LayeredBloomFilter? > > > Each principal has its own Layered Bloom filter. > > Here come the idiosyncrasies and benefits of the layered Bloom f
Re: [DISCUSS] KIP-936 Throttle number of active PIDs
There is some question about whether or not we need the configuration options. My take on them is as follows:

producer.id.quota.window.num: No opinion. I don't know what this is used for, but I suspect that there is a good reason to have it. It is not used within the Bloom filter caching mechanism.

producer.id.quota.window.size.seconds: Leave it, as it is one of the most effective ways to tune the filter and determines how long a PID is recognized.

producer.id.quota.cache.cleanup.scheduler.interval.ms: Remove it unless there is another use for it. We can get a better calculation internally.

producer.id.quota.cache.layer.count: Leave it, as it is one of the most effective ways to tune the filter.

producer.id.quota.cache.false.positive.rate: Replace it with a constant; I don't think any other Bloom filter solution provides access to this knob for end users.

producer_ids_rate: Leave this one, it is critical for reasonable operation.
Re: Suggestion about support for wildcard when creating new acls
I think that if this is introduced (and perhaps even if it is not) we need a clear ACL evaluation process. I know we have both allow and deny, and that deny takes precedence over allow. But let's consider two scenarios.

1. Unintended access. Let's assume we start with the 6 topics Murali used in his example: nl-accounts-localtopic, nl-accounts-remotetopic, de-accounts-localtopic, de-accounts-remotetopic, cz-accounts-localtopic, and cz-accounts-remotetopic. Assume that the *-accounts-* pattern grants read access to anyone. I create a topic us-accounts-privatetopic and grant explicit access to some individuals. Because of the *-accounts-* ACL, everyone has access to my private topic, and I may not know that there is a leak until it is far too late. I don't have a good way to determine which ACLs will impact my topic. I cannot add a general us-accounts-privatetopic DENY and hope my explicit access works, because DENY takes precedence over ALLOW.

2. Unintended/hostile denial. Starting from the same 6 topics, assume that the *-accounts-* pattern grants read access to anyone. A bad or careless actor could create a *-* DENY, which would cause the system to cease functioning as expected. There is not a good way to determine which ACLs impacted the topic.

*Note* that both of these issues can occur with the PREFIXED pattern as well, so this is not an argument against the MATCH pattern. There is a fundamental issue with the current ACL implementation as it relates to wildcards. I think that the evaluation process should be:

1. Remove patterns that do not match
2. More specific patterns take precedence over less specific patterns
3.
for patterns of the same precedence DENY overrides ALLOW

*Determining specificity*: Specificity is generally based on the Levenshtein distance between the pattern and the text being evaluated. The lower the distance, the more specific the rule. Using the topic name nl-accounts-localtopic we can evaluate the Levenshtein distance for various patterns:

nl-accounts-localtopic = 0
*-accounts-localtopic = 2
nl-accounts-local* = 5
*-accounts-local* = 7
nl-accounts-* = 10
*-accounts-* = 12

In the special case of matching principals, User matches are more specific than Group matches.

*Usability* With the ACL system becoming a complex web of patterns, it is incumbent upon the development team to provide tools to assist in diagnosing permission problems.

1. There should be a tool that provides a list of all ACLs that impact the decision to allow or deny access for a principal to a topic, based on principal ID, host, and operation. This will assist operators in rapidly determining the reason for access-denied errors.
2. There should be a tool to show the effects of adding an ACL. Using the example from above, adding "*-accounts-*" should list that nl-accounts-localtopic, nl-accounts-remotetopic, de-accounts-localtopic, de-accounts-remotetopic, cz-accounts-localtopic, and cz-accounts-remotetopic are affected.
3. There should be a tool to show the effects of adding a topic. Using the example from above, adding "us-accounts-privatetopic" should list that "*-accounts-*" will influence the permissions calculations for the new topic.

*Summary* I leave determining whether or not adding MATCH as a pattern type is a good idea to others with more experience in Kafka. But in either case, I believe that we need to look at how we evaluate ACLs given that we already have a wildcard ACL pattern.

Claude

On Thu, May 2, 2024 at 3:56 PM Murali Basani wrote: > Hello, > > I'd like to propose a suggestion to our resource patterns in Kafka ACLs.
> > Currently, when adding new ACLs in Kafka, we have two types of resource > patterns for topics: > >- LITERAL >- PREFIXED > > However, when it comes to listing or removing ACLs, we have a couple more > options: > >- MATCH >- ANY (will match any pattern type) > > > If we can extend creating acls as well with 'MATCH' pattern type, it would > be very beneficial. Even though this kind of acl should be created with > utmost care, it will help organizations streamline their ACL management > processes. > > Example scenarios : > > Let's say we need to create ACLs for the following six topics: > nl-accounts-localtopic, nl-accounts-remotetopic, de-accounts-localtopic, > de-accounts-remotetopic, cz-accounts-localtopic, cz-accounts-remotetopic > > Currently, we achieve this using existing functionality by creating three > prefixed ACLs as shown below: > > kafka-acls --bootstrap-server localhost:9092 \ > > --add \ > > --allow-principal > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > \ > > --producer \ > > --topic nl-accounts- \ > > --resource-pattern-type prefixed > > > kafka-acls --bootstrap-server localhost:9092 \ > > --add \ > > --allow
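The Levenshtein-based specificity ordering proposed earlier in this thread can be checked with a textbook edit-distance implementation. This is a self-contained sketch; the class name is illustrative:

```java
public class PatternSpecificity {
    // Classic dynamic-programming edit distance (insert/delete/substitute,
    // each costing 1), using two rolling rows to keep memory at O(len(b)).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] t = prev; prev = cur; cur = t;
        }
        return prev[b.length()];
    }
}
```

Running this against "nl-accounts-localtopic" reproduces the distances listed in the thread (0, 2, 5, 7, 10, 12), so the proposed ordering is well defined.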
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
As I wrote in [1], the ACL evaluation algorithm needs to be specified with respect to the specificity of the pattern, so that we know exactly whether *-accounts-* takes precedence over nl-accounts-* or vice versa. I think that we should spell out that precedence is evaluated as follows:

1. Remove patterns that do not match
2. More specific patterns take precedence over less specific patterns
3. For patterns of the same precedence DENY overrides ALLOW

Determining specificity: Specificity is based on the Levenshtein distance between the pattern and the text being evaluated. The lower the distance, the more specific the rule. Using the topic name nl-accounts-localtopic we can evaluate the Levenshtein distance for various patterns:

nl-accounts-localtopic = 0
*-accounts-localtopic = 2
nl-accounts-local* = 5
*-accounts-local* = 7
nl-accounts-* = 10
*-accounts-* = 12

In the special case of matching principals, User matches are more specific than Group matches. I don't know if this should be added to KIP-1042 or presented as a new KIP.

Claude

[1] https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f

On Fri, May 3, 2024 at 12:18 PM Claude Warren wrote: > Took me awhile to find it but the link to the KIP is > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls > > On Fri, May 3, 2024 at 10:13 AM Murali Basani > wrote: > > > Hello, > > > > I'd like to propose a suggestion to our resource patterns in Kafka ACLs. > > > > Currently, when adding new ACLs in Kafka, we have two types of resource > > patterns for topics: > > > >- LITERAL > >- PREFIXED > > > > However, when it comes to listing or removing ACLs, we have a couple more > > options: > > > >- MATCH > >- ANY (will match any pattern type) > > > > > > If we can extend creating acls as well with 'MATCH' pattern type, it > would > > be very beneficial.
Even though this kind of acl should be created with > > utmost care, it will help organizations streamline their ACL management > > processes. > > > > Example scenarios : > > > > Let's say we need to create ACLs for the following six topics: > > nl-accounts-localtopic, nl-accounts-remotetopic, de-accounts-localtopic, > > de-accounts-remotetopic, cz-accounts-localtopic, cz-accounts-remotetopic > > > > Currently, we achieve this using existing functionality by creating three > > prefixed ACLs as shown below: > > > > kafka-acls --bootstrap-server localhost:9092 \ > > > --add \ > > > --allow-principal > > > > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > > \ > > > --producer \ > > > --topic nl-accounts- \ > > > --resource-pattern-type prefixed > > > > > > kafka-acls --bootstrap-server localhost:9092 \ > > > --add \ > > > --allow-principal > > > > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > > \ > > > --producer \ > > > --topic de-accounts- \ > > > --resource-pattern-type prefixed > > > > > > kafka-acls --bootstrap-server localhost:9092 \ > > > --add \ > > > --allow-principal > > > > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > > \ > > > --producer \ > > > --topic cz-accounts- \ > > > --resource-pattern-type prefixed > > > > > > However, if we had the 'MATCH' pattern type available, we could > accomplish > > this with a single ACL, as illustrated here: > > > > kafka-acls --bootstrap-server localhost:9092 \ > > > --add \ > > > --allow-principal > > > > > > User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown > > > \ > > > --producer \ > > > --topic *-accounts-* \ > > > --resource-pattern-type match > > > > > > > > This pattern closely resembles PREFIXED but offers broader allow/deny > > rules. > > > > Implementing this change could significantly reduce the effort in several > > acl management processes. 
> > > > I welcome your thoughts and any concerns you may have regarding this > > proposal. > > > > Thanks, > > Murali > > > > > -- > LinkedIn: http://www.linkedin.com/in/claudewarren >
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
I have updated KIP-1042 with a proposal for how to reduce the time spent looking for matching wildcard patterns. Experimentally I see a reduction of 66-75% execution time. On Mon, May 6, 2024 at 9:03 PM Greg Harris wrote: > Hi Murali, > > Thanks for the KIP! > > I think I understand the motivation for this KIP in situations where > there are a "cross product" of topics for two or more variables X and > Y, and want to write ACLs for each of the variable axes. > If you format your topics "X-Y-suffix", it's not easy to write rules > that apply to all "Y" topics, because you need to enumerate all of the > "X" values, and the problem persists even if you reorder the topic > name. > > In my recent work on KIP-986 I found it necessary to introduce > "namespaces" to group topics together, and I was going to replicate > the ACL system to specify those namespaces. This change to the ACL > system could increase the expressiveness and complexity of that > feature, if it is ever implemented. > One of the primitives I needed when specifying namespaces was the > ability to tell when two namespaces overlapped (i.e. does there exist > any topic which is present in both namespaces). This is trivial to do > with the current PREFIX and LITERAL system, as we can find the > maximum-length common prefix with just some length comparisons and > equality checks. > I considered specifying namespaces via regular expressions, and found > that it was computationally much more difficult. Computing the > intersection of two regexes appears to be exponential in the length of > the regexes, leading me to avoid adding it. > > I understand that you're not suggesting full REGEX support, and that > "namespaces" don't need to support MATCH, but I think MATCH may run > into related difficulties. Any MATCH can overlap with any other MATCH > or PREFIX if it includes a sufficient number of wildcards. 
For > example: > MATCH *-accounts-* has overlap with PREFIX nl as they can both match > "nl-accounts-localtopic", but that isn't sensitive to the contents > "nl", it is true for any PREFIX. > MATCH *-accounts-* has overlap with MATCH *localtopic, as they can > both match "nl-accounts-localtopic", but that isn't actually sensitive > to the contents "localtopic", it's true for any MATCH which includes a > wildcard at the beginning. > > This has implications for execution complexity: If we can't compute > whether two patterns overlap, then we need to run both of them on each > piece of input to test if they both match. Under the current > LITERAL/PREFIX system, we can optimize execution with a trie, but that > option wouldn't be available to us with MATCH. > > The current system makes users evaluate a trade-off: > 1. Optimize the number of ACLs by organizing topics according to > prefixes (for example, "accounts-localtopic-nl" and PREFIX "accounts", > PREFIX "accounts-localtopic") > 2. Use less-structured topic names, with a corresponding ACL scheme > that has more individual rules. > The system currently informs users of this tradeoff by making them > write multiple ACLs, and making them think "there has got to be a > better way!". Perhaps we can find a better way to surface this best > practice, or better inform users about it. > > I understand that there are going to be situations more complex than > your example, where multiple individual rules will always be necessary > with only PREFIX evaluation. I think even in those situations, a > number of efficient-to-evaluate rules is preferable to just one > expensive-to-evaluate rule. > > One alternative that I thought of could be "PARAMETERIZED" ACLs which > are like PREFIXED, but allow some parameter substitution. For example > PARAMETERIZED "(nl|de|cz)-accounts-". I'm lifting regex syntax here, > but this isn't actually a regex, and wouldn't allow arbitrary numbers > of characters, or the * or + operators. 
> In the background it could evaluate exactly like the 3 individual > PREFIX rules, but be easier to evaluate on the backend, and support > the intersection query I mentioned earlier. It could also support > [a-zA-Z] notation in case the parameter values aren't known ahead of > time, but have a fixed length. > > Thanks, > Greg > > On Mon, May 6, 2024 at 11:17 AM Claude Warren wrote: > > > > I have an idea for how to reduce the time for ACL lookups in general and > > particularly where wildcards are involved using sequence > > characterization techniques from bioinformatics. But I need a set of ACL > > patterns and as
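Greg's observation that overlap between two PREFIX namespaces is trivial to compute ("just some length comparisons and equality checks") can be shown in a few lines. This is an illustrative sketch, not Kafka code:

```java
public class PrefixOverlap {
    // Two PREFIX patterns denote overlapping sets of topic names
    // if and only if one pattern is a prefix of the other: any topic
    // starting with the longer prefix also starts with the shorter one.
    static boolean overlaps(String p1, String p2) {
        return p1.startsWith(p2) || p2.startsWith(p1);
    }
}
```

By contrast, as noted above, two MATCH patterns with wildcards can overlap regardless of their literal content, which is why no comparably cheap test exists for them.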
Re: [DISCUSS] KIP-1044: A proposal to change idempotent producer -- server implementation
First, a "mea culpa". In the KIP I state that several other KIPs are rejected alternatives. That is not the case; they are addressing different issues and would still work with KIP-1044. I will address this with an update to the KIP.

Addressing the case where a PID has not been seen before: the KIP adds a Bloom filter on disk for each snapshot file. This is simply to quickly load the Bloom filter at startup; it is not read again once the system is running. The snapshot file names and Bloom filters are paired and placed in a sorted in-memory list with the most recent snapshot listed first: [(snapshot, filter), (snapshot, filter), ...]

When a PID that has not been seen is presented, the following would occur:

1. The cache is searched and fails.
2. The cache tries to find the PID in the snapshot files and fails.
3. The PID is added to the cache.

So in step 2 the actions taken are effectively (some hand waving here to gloss over some optimizations):

1. A Bloom filter for the PID is created.
2. Each of the Bloom filters in the list is checked to see if it contains the PID's Bloom filter. This operation is extremely fast (on the order of 4 machine instructions per 64 bits). If we have properly configured the Bloom filters, there should be a false positive 1 in 1K or 1 in 10K times (this is adjustable).
3. If there is a match on a Bloom filter, the snapshot file is scanned looking for the PID. If found, the PID is returned.
4. If we get here, none of the filters match, or none of the snapshots associated with matching filters contains the PID, so "no match" is reported.

Some thoughts: using the commons-collections4 Bloom filter implementation and a sample of snapshots I should be able to give you some idea of how fast the solution is. However, I have a couple of engagements (Community over Code EU, for one) in the next few weeks that may delay that a bit. I do recognize your point that the scanning of the snapshots may be too slow for the new PID case, but I think with tuning we can adjust that.
Claude On Tue, May 21, 2024 at 5:50 PM Justine Olshan wrote: > Can you clarify the intended behavior? If we encounter a producer ID we've > not seen before, we are supposed to read from disk and try to find it? I > see the proposal mentions bloom filters, but it seems like it would not be > cheap to search for the producer ID. I would expect the typical case to be > that there is a new producer and we don't need to search state. > > And we intend to keep all producers we've ever seen on the cluster? I > didn't see a mechanism to delete any of the information in the snapshots. > Currently the snapshot logic is decoupled from the log retention as of > KIP-360. > > Justine > > On Mon, May 20, 2024 at 11:20 PM Claude Warren wrote: > > > The LRU cache is just that: a cache, so yes things expire from the cache > > but they are not gone. As long as a snapshot containing the PID is > > available the PID can be found and reloaded into the cache (which is > > exactly what I would expect it to do). > > > > The question of how long a PID is resolvable then becomes a question of > how > > long are snapshots retained. > > > > There are, in my mind, several advantages: > > > >1. The in-memory cache can be smaller, reducing the memory footprint. > >This is not required but is possible. > >2. PIDs are never discarded because they are produced by slow > >producers. They are discarded when the snapshots containing them > > expire. > >3. The length of time between when a PID is received by the server and > >when it is recorded to a snapshot is significantly reduced. > > Significantly > >reducing the window where PIDs can be lost. > >4. Throttling and other changes you wish to make to the cache are > still > >possible. > > > > > > On Mon, May 20, 2024 at 7:32 PM Justine Olshan > > > > wrote: > > > > > My team has looked at it from a high level, but we haven't had the time > > to > > > come up with a full proposal. > > > > > > I'm not aware if others have worked on it. 
> > > > > > Justine > > > > > > On Mon, May 20, 2024 at 10:21 AM Omnia Ibrahim < > o.g.h.ibra...@gmail.com> > > > wrote: > > > > > > > Hi Justine are you aware of anyone looking into such new protocol at > > the > > > > moment? > > > > > > > > > On 20 May 2024, at 18:00, Justine Olshan > > > > > > > > > wrote: > > > > > > > > > > I would say I have first hand knowledge of this issue as someone > who > > > > > respon
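The snapshot lookup steps Claude describes in this thread (check each snapshot's filter newest-first, scan a snapshot file only on a filter hit) can be sketched as below. The toy Bloom filter here merely stands in for the commons-collections4 implementation the KIP proposes; all names and parameters are illustrative, and the PID sets stand in for scanning actual snapshot files:

```java
import java.util.*;

public class SnapshotPidLookup {
    static final int BITS = 4096, K = 3;  // toy filter size and hash count

    static long[] filterFor(Collection<Long> pids) {
        long[] f = new long[BITS / 64];
        for (long pid : pids)
            for (int i = 0; i < K; i++) {
                int b = hash(pid, i);
                f[b >> 6] |= 1L << (b & 63);
            }
        return f;
    }

    static boolean mightContain(long[] f, long pid) {
        for (int i = 0; i < K; i++) {
            int b = hash(pid, i);
            if ((f[b >> 6] & (1L << (b & 63))) == 0) return false;
        }
        return true;
    }

    static int hash(long pid, int i) {
        long h = (pid + i) * 0x9E3779B97F4A7C15L;  // simple mixing constant
        h ^= h >>> 31;
        return (int) (h & (BITS - 1));
    }

    static class Snapshot {
        final String name; final long[] filter; final Set<Long> pids;
        Snapshot(String name, Set<Long> pids) {
            this.name = name; this.pids = pids; this.filter = filterFor(pids);
        }
    }

    // Check each filter, most recent snapshot first; scan ("contains")
    // only when the filter reports a possible hit.
    static Optional<String> find(List<Snapshot> newestFirst, long pid) {
        for (Snapshot s : newestFirst)
            if (mightContain(s.filter, pid) && s.pids.contains(pid))
                return Optional.of(s.name);
        return Optional.empty();
    }
}
```

A filter miss rules a snapshot out without touching disk; only false positives (tunable via the filter shape) cost a wasted scan.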
Re: [DISCUSS] KIP-1044: A proposal to change idempotent producer -- server implementation
Igor, Thanks for the well thought out comment. Do you have a suggestion for a fast way to write to disk? Since the design requires random access, perhaps just a random access file? Claude On Thu, May 23, 2024 at 1:17 PM Igor Soarez wrote: > Hi Claude, > > Thanks for writing this KIP. This issue seems particularly > thorny, and I appreciate everyone's effort to address this. > > I want to share my concern with the KIP's proposal of the > use of memory mapped files – mmap is Java's Achilles heel, > Kafka should make less use of it, not more. > > The JVM often needs to stop all application threads (aka > mutator threads) before some operations, such as GC, > optimizations, redefinitions, internal cleanups and various > other internal reasons. This is known as Safepointing. > > Because the JVM cannot forcefully stop threads, it must instead > wait for each thread to observe the Safepointing request, > mark themselves as safe and stop. > A single delayed thread can leave the whole JVM hanging, waiting. > > Reads and writes to memory mapped files can trigger system interrupts, > which can block on IO for prolonged amounts of time. > One particularly bad example is hitting the page cache dirty ratio, > and having to flush all of the page cache, in a potentially large > (high RAM) system, into a potentially slow filesystem. > I have seen pauses as extreme as 1 minute, and there are other > public reports on this. [1][2] > > Safepointing in the JVM is designed with mechanisms to prevent having > to wait for a single busy thread: threads mark themselves as safe before > waiting on locks, before system calls, before doing JNI, etc, and upon > returning they check if a Safepoint is ongoing. > So if a read or write syscall takes a bit longer, that's fine, the JVM > won't halt for Safepointing; it will proceed knowing that any thread stuck > on a syscall will stop if necessary when it returns. > But there's no protection against long system interrupts.
> From the JVM's perspective the use of mmap is just a simple memory access, > so there's no Safepointing protection around that. > The kernel does not know nor care about Java's Safepointing, and does not > treat halting a single unsuspecting thread for a longer period of time with > the severity that it implies during a JVM Safepoint. > > So for this reason, I urge you to consider alternatives to the use > of memory mapped files. > > Best, > > -- > Igor > > https://groups.google.com/g/mechanical-sympathy/c/LFrJPhyVOJ4 > > https://groups.google.com/g/mechanical-sympathy/c/tepoA7PRFRU/m/7HbSINaFBgAJ > >
Re: [VOTE] KIP-1042: Support for wildcard when creating new acls
I give this a cautious +1 (non-binding), as development may yield better head-wildcard results. I think the adoption criteria for the ACL search need to be specified in the KIP. We do not have a good handle on how long the current searches take. If the wildcard tests can be merged into a trie search (see the KIP testing section) then the overall speed may be increased. On Mon, Jun 17, 2024 at 10:27 AM Muralidhar Basani wrote: > Hi all, > > I would like to call a vote on KIP-1042 which extends creation of acls with > MATCH pattern type. > > KIP - > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls > > Discussion thread - > https://lists.apache.org/thread/xx3lcg60kp4v34x0j9p6xobby8l4cfq2 > > Thanks, > Murali >
Re: [VOTE] KIP-1042: Support for wildcard when creating new acls
I think that if we put in a trie based system we should be able to halve the normal search times and still be able to locate wildcard matches very quickly. Users should be warned that "head wildcard" matches are slow and to use them sparingly. I am going to see if I can work out how to do wildcard matches within the trie. But in all cases I can show that the trie is faster than the current implementation. Claude On Wed, Jun 19, 2024 at 7:53 PM Muralidhar Basani wrote: > There are some test results mentioned in the Test Plan section of the Kip, > but we need to do more testing with various patterns and permission types. > As mentioned in the discuss thread, the trie implementation could > potentially surpass the current speed of ACL match. > > However, we can only accurately assess the results after updating the > actual classes and analysing them with AuthorizerBenchmark. > > Thanks, > > Murali > > On Mon, 17 Jun 2024 at 20:39, Colin McCabe wrote: > > > My concern is that the extra complexity may actually slow us down. In > > general people already complain about the speed of ACL matches, and > adding > > another "degree of freedom" seems likely to make things worse. > > > > It would be useful to understand how much faster or slower the code is > > with the proposed changes, versus without them. > > > > best, > > Colin > > > > > > On Mon, Jun 17, 2024, at 01:26, Muralidhar Basani wrote: > > > Hi all, > > > > > > I would like to call a vote on KIP-1042 which extends creation of acls > > with > > > MATCH pattern type. > > > > > > KIP - > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls > > > > > > Discussion thread - > > > https://lists.apache.org/thread/xx3lcg60kp4v34x0j9p6xobby8l4cfq2 > > > > > > Thanks, > > > Murali > > >
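A minimal sketch of the trie idea discussed in this thread, for LITERAL/PREFIX lookup. This is illustrative only, not the proposed Kafka implementation: PREFIX rules are stored at the node where their prefix ends, so one walk over the topic name collects every applicable rule without scanning all ACLs:

```java
import java.util.*;

public class AclPrefixTrie {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> rules = new ArrayList<>();  // rules whose prefix ends here
    }

    final Node root = new Node();

    void addPrefixRule(String prefix, String rule) {
        Node n = root;
        for (char c : prefix.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new Node());
        n.rules.add(rule);
    }

    // Walk the topic name once, collecting rules along the path.
    List<String> rulesFor(String topic) {
        List<String> out = new ArrayList<>(root.rules);  // empty-prefix rules
        Node n = root;
        for (char c : topic.toCharArray()) {
            n = n.children.get(c);
            if (n == null) break;  // no longer prefixes can match
            out.addAll(n.rules);
        }
        return out;
    }
}
```

Lookup cost is bounded by the topic name's length rather than by the number of ACLs, which is the source of the speedups discussed here; head-wildcard patterns cannot use this walk, which is why they remain the slow case.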
Re: [VOTE] KIP-1042: Support for wildcard when creating new acls
I have updated the KIP with results from the Trie implementation and they are dramatic to say the least. For most searches they are at least an order of magnitude faster and use less memory. The wildcard search is not a regular expression but rather a file type wild card (*=1 or more character, ?=1 character). Code is available on my personal branch [1]. It is not fully documented and does not meet the checkstyle requirements yet. I also modified the jmh tests to run the Trie tests and limit the test to the single case mentioned in the KIP documentation. If there are no issues with this code, I will complete the documentation and fix the checkstyle and then open a pull request. Claude [1] https://github.com/Claudenw/kafka/pull/new/KIP-1042_Trie_Implementation On Wed, Jul 3, 2024 at 2:21 PM Claude Warren, Jr wrote: > I think that if we put in a trie based system we should be able to halve > the normal searhc times and still be able to locate wild card matches very > quickly. Users should be warned that "head wildcard" matches are slow and > to use them sparingly. I am going to see if I can work out how to do > wildcard matches within the trie. > > But in all cases can show that the trie is faster than the current > implementation. > > Claude > > > > On Wed, Jun 19, 2024 at 7:53 PM Muralidhar Basani > wrote: > >> There are some test results mentioned in the Test Plan section of the Kip, >> but we need to do more testing with various patterns and permission types. >> As mentioned in the discuss thread, the trie implementation could >> potentially surpass the current speed of ACL match. >> >> However, we can only accurately assess the results after updating the >> actual classes and analysing them with AuthorizerBenchmark. >> >> Thanks, >> >> Murali >> >> On Mon, 17 Jun 2024 at 20:39, Colin McCabe wrote: >> >> > My concern is that the extra complexity may actually slow us down. 
In >> > general people already complain about the speed of ACL matches, and >> adding >> > another "degree of freedom" seems likely to make things worse. >> > >> > It would be useful to understand how much faster or slower the code is >> > with the proposed changes, versus without them. >> > >> > best, >> > Colin >> > >> > >> > On Mon, Jun 17, 2024, at 01:26, Muralidhar Basani wrote: >> > > Hi all, >> > > >> > > I would like to call a vote on KIP-1042 which extends creation of acls >> > with >> > > MATCH pattern type. >> > > >> > > KIP - >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls >> > > >> > > Discussion thread - >> > > https://lists.apache.org/thread/xx3lcg60kp4v34x0j9p6xobby8l4cfq2 >> > > >> > > Thanks, >> > > Murali >> > >> >
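For illustration only (this is not the branch's code): the file-type wildcard semantics described above, where '*' matches one or more characters and '?' matches exactly one, can be sketched as a tiny recursive matcher.

```java
// Minimal sketch of the "file type" wildcard semantics described in the
// message: '*' = one or more characters, '?' = exactly one character.
// Illustrative only; not the KIP-1042 branch code.
public class WildcardSketch {
    static boolean matches(String pattern, String text) {
        return match(pattern, 0, text, 0);
    }

    private static boolean match(String p, int pi, String t, int ti) {
        if (pi == p.length()) return ti == t.length();
        char pc = p.charAt(pi);
        if (pc == '*') {
            // '*' must consume at least one character, then may consume more.
            for (int skip = ti + 1; skip <= t.length(); skip++) {
                if (match(p, pi + 1, t, skip)) return true;
            }
            return false;
        }
        if (ti == t.length()) return false;
        if (pc == '?' || pc == t.charAt(ti)) return match(p, pi + 1, t, ti + 1);
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("topic-*", "topic-a"));   // true
        System.out.println(matches("topic-*", "topic-"));    // false: '*' needs 1+
        System.out.println(matches("topic-?", "topic-a"));   // true
        System.out.println(matches("topic-?", "topic-ab"));  // false
    }
}
```

Note the "1 or more" semantics for '*' (per the email) differ from the common "0 or more" glob convention, so `topic-*` does not match `topic-` itself.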
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
I have updated the KIP with results from the Trie implementation and they are dramatic to say the least. For most searches they are at least an order of magnitude faster and use less memory. The wildcard search is not a regular expression but rather a file type wild card (*=1 or more character, ?=1 character). Code is available on my personal branch [1]. It is not fully documented and does not meet the checkstyle requirements yet. I also modified the jmh tests to run the Trie tests and limit the test to the single case mentioned in the KIP documentation. If there are no issues with this code, I will complete the documentation and fix the checkstyle and then open a pull request. Claude [1] https://github.com/Claudenw/kafka/pull/new/KIP-1042_Trie_Implementation On Mon, Jun 3, 2024 at 9:31 PM Muralidhar Basani wrote: > Hi, > > Just bumping this thread again. It seems no concerns have been raised so > far. > > I'll request votes in 2 weeks. > > Thanks, > Murali > > On Tue, May 28, 2024 at 1:24 PM Muralidhar Basani < > muralidhar.bas...@aiven.io> wrote: > > > Hi all, > > > > Any other suggestions or objections to the proposal? > > > > Thanks, > > Murali > > > > On Thu, May 23, 2024 at 10:18 AM Muralidhar Basani < > > muralidhar.bas...@aiven.io> wrote: > > > >> Thanks Greg. > >> I have updated KIP with details on optimisation of LITERAL too. > >> > >> Regarding limitations in performance by introducing MATCH is definitely > a > >> question. > >> - By optimizing LITERAL and PREFIX we are gaining a few nano secs I > >> think. (described in kip) > >> - MATCH can be introduced only with a configuration. So by default, we > >> can disable the MATCH check (ex : acls.pattern.match.enable=false), and > if > >> customer enables the config, only then add the lookup in the > matchingAcls() > >> - With the proposal already described in KIP for MATCH, we may not have > >> any limitations., rather will be efficient. 
> >> > >> Maybe we are in good shape with these propositions > >> > >> Thanks, > >> Murali > >> > >> > >> > >> On Tue, May 21, 2024 at 6:15 PM Greg Harris > > >> wrote: > >> > >>> Hi Murali, > >>> > >>> I don't have a trie library in mind. I looked at the current > >>> implementation of the StandardAuthorizer and found that we are already > >>> benefiting from the prefix structure in the implementation [1]. The > >>> current implementation appears to be a TreePSet [2]. > >>> > >>> Now, we've already made this tradeoff once with PREFIX: Prefixes are > >>> less structured than literals, because with literals you can use a > >>> hashing algorithm to jump directly to your relevant ACLs in O(1), but > >>> with a prefix you either need to do multiple lookups, or some sort of > >>> O(log(n)) lookup. And we determined that the ultimate limitation in > >>> performance was worth it for the expressiveness. > >>> We're making this tradeoff again with MATCH acls: wildcards are less > >>> structured than prefixes or literals, because of the reasons I > >>> mentioned earlier. We need to judge now if the ultimate limitation in > >>> performance is worth it. > >>> > >>> I think your strategy for using the optimized layout for prefix and > >>> literal matches is smart, because there does seem to be a gap in > >>> performance possible. It makes me wonder why the optimized layout for > >>> literals was not used when prefixes were added. Literal lookups still > >>> go through the tree lookup, when they could be moved to a hash-lookup > >>> instead. > >>> That would allow users to "choose" for themselves on a convenience vs > >>> performance scale: Smaller use-cases can add a single convenient > >>> MATCH, and larger use-cases can add the multiple optimized PREFIXes. 
> >>> > >>> [1] > >>> > https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L319-L339 > >>> [2] > >>> > https://github.com/apache/kafka/blob/9fe3932e5c110443f7fa545fcf0b8f78574f2f73/server-common/src/main/java/org/apache/kafka/server/immutable/pcollections/PCollectionsImmutableNavigableSet.java#L34 > >>> > >>> Thanks, > >>> Greg > >>> > >>> On Tue,
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
I do not think we need to add the additional MATCH and ALL into the Authorizer. Currently we have - Literal - matches exactly - Prefix - Will match patterns that are longer as long as the part specified matches exactly. The Trie proposal works well with just these two. The wildcard is introduced as a pivot point in the trie, so any wildcard is on a node in the trie by itself. Ignoring the DENY and wildcard processing for a moment, when searching we descend the trie looking for the best matching node (LITERAL or PREFIX) and then check if there are ACLs on that node that match the request. If not we move to the parent node and check there. We continue up the tree until we find a match or reach the root (NO MATCH). For wildcard processing we insert an extra step: before each move up to the parent we check for a wildcard match. For the wildcard to match at this point it must be a PREFIX type. By requiring a PREFIX type we can allow "*" and "?" as valid characters in a LITERAL match. DENY processing is performed during the descent of the trie and matches PREFIX or LITERAL depending on whether or not there is an exact match. Claude On Mon, Jul 29, 2024 at 8:36 AM Claude Warren, Jr wrote: > I have updated the KIP with results from the Trie implementation and they > are dramatic to say the least. For most searches they are at least an > order of magnitude faster and use less memory. The wildcard search is not > a regular expression but rather a file type wild card (*=1 or more > character, ?=1 character). > > Code is available on my personal branch [1]. It is not fully documented > and does not meet the checkstyle requirements yet. I also modified the jmh > tests to run the Trie tests and limit the test to the single case mentioned > in the KIP documentation. > > If there are no issues with this code, I will complete the documentation > and fix the checkstyle and then open a pull request.
> > Claude > > [1] https://github.com/Claudenw/kafka/pull/new/KIP > -1042_Trie_Implementation > > On Mon, Jun 3, 2024 at 9:31 PM Muralidhar Basani > wrote: > >> Hi, >> >> Just bumping this thread again. It seems no concerns have been raised so >> far. >> >> I'll request votes in 2 weeks. >> >> Thanks, >> Murali >> >> On Tue, May 28, 2024 at 1:24 PM Muralidhar Basani < >> muralidhar.bas...@aiven.io> wrote: >> >> > Hi all, >> > >> > Any other suggestions or objections to the proposal? >> > >> > Thanks, >> > Murali >> > >> > On Thu, May 23, 2024 at 10:18 AM Muralidhar Basani < >> > muralidhar.bas...@aiven.io> wrote: >> > >> >> Thanks Greg. >> >> I have updated KIP with details on optimisation of LITERAL too. >> >> >> >> Regarding limitations in performance by introducing MATCH is >> definitely a >> >> question. >> >> - By optimizing LITERAL and PREFIX we are gaining a few nano secs I >> >> think. (described in kip) >> >> - MATCH can be introduced only with a configuration. So by default, we >> >> can disable the MATCH check (ex : acls.pattern.match.enable=false), >> and if >> >> customer enables the config, only then add the lookup in the >> matchingAcls() >> >> - With the proposal already described in KIP for MATCH, we may not have >> >> any limitations., rather will be efficient. >> >> >> >> Maybe we are in good shape with these propositions >> >> >> >> Thanks, >> >> Murali >> >> >> >> >> >> >> >> On Tue, May 21, 2024 at 6:15 PM Greg Harris >> >> >> wrote: >> >> >> >>> Hi Murali, >> >>> >> >>> I don't have a trie library in mind. I looked at the current >> >>> implementation of the StandardAuthorizer and found that we are already >> >>> benefiting from the prefix structure in the implementation [1]. The >> >>> current implementation appears to be a TreePSet [2]. 
>> >>> >> >>> Now, we've already made this tradeoff once with PREFIX: Prefixes are >> >>> less structured than literals, because with literals you can use a >> >>> hashing algorithm to jump directly to your relevant ACLs in O(1), but >> >>> with a prefix you either need to do multiple lookups, or some sort of >> >>> O(log(n)) lookup. And we determined that the ultimate limitation in >> >>> performance was worth it for the expressiveness. >> >>> We're making this t
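The descend-then-ascend search described earlier in this thread (descend to the best matching node, check its ACLs, then check for a wildcard pivot before each step up toward the root) can be sketched as a toy model. This is a simplification under stated assumptions: it ignores DENY handling and the LITERAL/PREFIX distinction, and lets a '*' pivot match any remainder.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposed search (hypothetical structure, not the KIP
// branch): descend the trie as far as the resource name allows, then walk
// back up; at each level, check the node's own rule first, then a '*'
// pivot child, before moving to the parent.
public class TrieSearchSketch {
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String acl;          // rule attached at this node, if any
        String wildcardAcl;  // rule attached at a '*' pivot child, if any
    }

    private final Node root = new Node();

    void add(String pattern, String rule) {
        Node n = root;
        for (char c : pattern.toCharArray()) {
            if (c == '*') { n.wildcardAcl = rule; return; } // wildcard pivot
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.acl = rule;
    }

    String search(String resource) {
        // Descend, remembering the path for the walk back up.
        Deque<Node> path = new ArrayDeque<>();
        Node n = root;
        path.push(n);
        for (char c : resource.toCharArray()) {
            n = n.children.get(c);
            if (n == null) break;
            path.push(n);
        }
        // Ascend: node rule first, then wildcard pivot, then parent.
        while (!path.isEmpty()) {
            Node cur = path.pop();
            if (cur.acl != null) return cur.acl;
            if (cur.wildcardAcl != null) return cur.wildcardAcl;
        }
        return null; // NO MATCH
    }

    public static void main(String[] args) {
        TrieSearchSketch trie = new TrieSearchSketch();
        trie.add("payments-topic", "ALLOW:alice"); // literal-style rule
        trie.add("pay*", "ALLOW:wild");            // wildcard pivot rule
        System.out.println(trie.search("payments-topic")); // deeper rule wins
        System.out.println(trie.search("payroll"));        // wildcard pivot
        System.out.println(trie.search("orders"));         // no match
    }
}
```

The key property the message relies on is visible here: a more specific (deeper) node is always consulted before any wildcard pivot higher up the path.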
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
> > Proposed Changes > This KIP suggests supporting the *MATCH* resource pattern type when creating > a new kafka acl. > I do not think we need the MATCH support within the Authorizer as noted in my earlier message. *Main changes include :* > >- Adding support for MATCH when creating new acl in above and below >classes > > Strike this ^. > >- Updating Authorizer > > >- AdminClient changes > > >- Updating cli > > *Detailed changes also include * > >- Modification of the org.apache.kafka.server.authorizer.Authorizer >class to update authorizeByResourceType method > > Authorizer is an interface; while we can update the default implementation, the Trie changes have moved the StandardAuthorizer implementation down into the StandardAuthorizerData class. I do not think that there are any changes to this functionality required. What do you think needs to be done? >- Modification of the kafka.security.authorizer.AclAuthorizer class to > > >- update authorizeByResourceType method and other methods > > >- update matchingAcls (this is performance sensitive, as it impacts >latency of every producer and consumer client to get authorization. Verify >AuthorizerBenchmark) > > >- Modification of the kafka.admin.AclCommand class to update multiple >methods like getResourceFilter and objects for parsing arguments >AclCommandOptions > > >- Modification of the kafka.zk.ZkData class to update multiple objects >like ZkAclStore > > Let's not support ZK with this change.
> >- Modification of kafka.server.AuthHelper class to update authorize >method > > >- Modification of the org.apache.kafka.jmh.acl.AuthorizerBenchmark >class to update multiple methods like setup and prepareAclCache > > >- Modification of >org.apache.kafka.jmh.acl.StandardAuthorizerUpdateBenchmark class to update >prepareAcls method > > >- Modification of >org.apache.kafka.metadata.authorizer.StandardAuthorizerData class to update >authorize method > > >- Modification of org.apache.kafka.controller.AclControlManager class >to update validateNewAcl method > > >- Updating tests > > >- Similar to prefixed, match ACLs will be stored under the ZK path: >'/kafka-acl-extended/' and change events to such ACLs will be >stored under: '/kafka-acl-extended-changes', where pattern type will be >prefixed or match > > > *Optimization* for existing LITERAL/PREFIX > >- Currently LITERAL lookups still go through the tree lookup, while >they can be moved to *hash-lookup* > > >- For PREFIX : with a Trie-based solution. Refactor matchingAcls >method in AclAuthorizer.scala, to match LITERAL and PREFIXED acls. > > >- Define the Trie structure > > >- Populate the Trie with ACLs > > >- Retrieve ACLs using the Trie > > With this optimization, we hope to have a drastically reduced latency in the > matchingAcls method, and it's much more efficient. The entire Optimization section can be removed as the Trie based solution solves the LITERAL, PREFIX and wildcard lookup problems. On Fri, Aug 2, 2024 at 9:55 AM Claude Warren, Jr wrote: > I do not think we need to add the additional MATCH and ALL into the > Authorizer. Currently we have > >- Literal - matches exactly >- Prefix - Will match patterns that are longer as long as the part >specified matches exactly. > > The Trie proposal works well with just these two. The wildcard is > introduced as a pivot point in the trie, so any wildcard is on a node in > the trie by itself.
Ignoring the DENY and wildcard processing for a > moment, when searching we descend the trie looking for the best matching > node (LITERAL or PREFIX) and then check if there are ACLs on that node that > match the request. If not we move to the parent node and check there. We > continue up the tree until we find a match or reach the root (NO MATCH). > > For wildcard processing we insert an extra step: before each move up to the > parent we check for a wildcard match. For the wildcard to match at this > point it must be a PREFIX type. By requiring a PREFIX type we can allow > "*" and "?" as valid characters in a LITERAL match. > > DENY processing is performed during the descent of the trie and matches > PREFIX or LITERAL depending on whether or not there is an exact match. > > Claude > > On Mon, Jul 29, 2024 at 8:36 AM Claude Warren, Jr > wrote: > >> I have updated the KIP with results from the Trie implementation and >> they are dramatic to say the least. For mo
Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls
I have created 2 branches in my repository to implement the Trie changes. - https://github.com/Claudenw/kafka/compare/trunk...StandardAuthorizer_refactor refactors the authorizer code to make it easy to implement a different Authorizer data block that plugs into the StandardAuthorizer. - https://github.com/Claudenw/kafka/compare/StandardAuthorizer_refactor...KIP-1042_trie_simplification applies the Trie changes to the authorizer refactor. On Fri, Aug 2, 2024 at 10:07 AM Claude Warren, Jr wrote: > Proposed Changes >> This KIP suggests supporting the *MATCH* resource pattern type when creating >> a new kafka acl. >> > > I do not think we need the MATCH support within the Authorizer as noted in > my earlier message. > > *Main changes include :* >> >>- Adding support for MATCH when creating new acl in above and below >>classes >> >> > Strike this ^. > >> >>- Updating Authorizer >> >> >>- AdminClient changes >> >> >>- Updating cli >> >> *Detailed changes also include * >> >>- Modification of the org.apache.kafka.server.authorizer.Authorizer >>class to update authorizeByResourceType method >> >> > Authorizer is an interface; while we can update the default > implementation, the Trie changes have moved the StandardAuthorizer > implementation down into the StandardAuthorizerData class. I do not think > that there are any changes to this functionality required. What do you > think needs to be done? > > >>- Modification of the kafka.security.authorizer.AclAuthorizer class to >> >> >>- update authorizeByResourceType method and other methods >> >> >>- update matchingAcls (this is performance sensitive, as it impacts >>latency of every producer and consumer client to get authorization.
Verify >>AuthorizerBenchmark) >> >> >>- Modification of the kafka.admin.AclCommand class to update multiple >>methods like getResourceFilter and objects for parsing arguments >>AclCommandOptions >> >> >>- Modification of the kafka.zk.ZkData class to update multiple >>objects like ZkAclStore >> >> > Let's not support ZK with this change. > >> >>- Modification of kafka.server.AuthHelper class to update authorize >>method >> >> >>- Modification of the org.apache.kafka.jmh.acl.AuthorizerBenchmark >>class to update multiple methods like setup and prepareAclCache >> >> >>- Modification of >>org.apache.kafka.jmh.acl.StandardAuthorizerUpdateBenchmark class to update >>prepareAcls method >> >> >>- Modification of >>org.apache.kafka.metadata.authorizer.StandardAuthorizerData class to >> update >>authorize method >> >> >>- Modification of org.apache.kafka.controller.AclControlManager class >>to update validateNewAcl method >> >> >>- Updating tests >> >> >>- Similar to prefixed, match ACLs will be stored under the ZK path: >>'/kafka-acl-extended/' and change events to such ACLs will >> be >>stored under: '/kafka-acl-extended-changes', where pattern type will be >>prefixed or match >> >> >> *Optimization* for existing LITERAL/PREFIX >> >>- Currently LITERAL lookups still go through the tree lookup, while >>they can be moved to *hash-lookup* >> >> >> - For PREFIX : with a Trie-based solution. Refactor matchingAcls >>method in AclAuthorizer.scala, to match LITERAL and PREFIXED acls. >> >> >>- Define the Trie structure >> >> >>- Populate the Trie with ACLs >> >> >>- Retrieve ACLs using the Trie >> >> With this optimization, we hope to have a drastically reduced latency in the >> matchingAcls method, and it's much more efficient. > > The entire Optimization section can be removed as the Trie based solution > solves the LITERAL, PREFIX and wildcard lookup problems.
> > > On Fri, Aug 2, 2024 at 9:55 AM Claude Warren, Jr > wrote: > >> I do not think we need to add the additional MATCH and ALL into the >> Authorizer. Currently we have >> >>- Literal - matches exactly >>- Prefix - Will match patterns that are longer as long as the part >>specified matches exactly. >> >> The Trie proposal works well with just these two. The wildcard is >> introduced as a pivot point in the trie, so any wildcard is on a node in >> the t
[jira] [Created] (KAFKA-17316) Refactor StandardAuthorizer for easier extension
Claude Warren created KAFKA-17316:
-
Summary: Refactor StandardAuthorizer for easier extension
Key: KAFKA-17316
URL: https://issues.apache.org/jira/browse/KAFKA-17316
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 3.8.0
Reporter: Claude Warren

Refactor of the Standard Authorizer, associated classes and tests to make it easier to create new AuthorizerData implementations. The goal of this change is to refactor the existing code so that new implementations of the StandardAuthorizer may be created by creating new implementations of the StandardAuthorizerData class.

Changes include:
* creation of an AbstractAuthorizerData class for a consistent logging implementation.
* Moving the MatchingRule interface and implementations from StandardAuthorizerData.java to AuthorizerData.java
* Adding an authorizeByResourceType method to AuthorizerData.
* Implementing authorizeByResourceType in StandardAuthorizerData by reimplementing the default Authorizer.authorizeByResourceType method.

This is a precursor to the Trie implementation described in KIP-1042.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
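The extension seam this refactor aims for can be sketched roughly as follows (illustrative names only, not the actual Kafka interfaces): the authorizer delegates every lookup to a swappable data implementation, so a Trie-backed variant only needs to supply a new data class.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the delegation pattern the ticket describes. All names here
// are hypothetical stand-ins for the real Kafka classes.
interface AuthorizerData {
    boolean authorize(String principal, String resource);
}

// One possible backing store; after the refactor, a TrieAuthorizerData
// could be dropped in without touching the authorizer itself.
class MapAuthorizerData implements AuthorizerData {
    private final Map<String, String> allow = new HashMap<>();

    void addAllow(String principal, String resource) {
        allow.put(resource, principal);
    }

    @Override
    public boolean authorize(String principal, String resource) {
        return principal.equals(allow.get(resource));
    }
}

public class ExtensibleAuthorizerSketch {
    private final AuthorizerData data;

    ExtensibleAuthorizerSketch(AuthorizerData data) {
        this.data = data;
    }

    boolean authorize(String principal, String resource) {
        return data.authorize(principal, resource); // all lookups delegate
    }

    public static void main(String[] args) {
        MapAuthorizerData data = new MapAuthorizerData();
        data.addAllow("alice", "payments");
        ExtensibleAuthorizerSketch auth = new ExtensibleAuthorizerSketch(data);
        System.out.println(auth.authorize("alice", "payments")); // true
        System.out.println(auth.authorize("bob", "payments"));   // false
    }
}
```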
ACL authorization question
If there is an authorizer with no ACLs and authorizeByResourceType(AuthorizableRequestContext requestContext, AclOperation op, ResourceType resourceType) is called with op = UNKNOWN or ANY, or resourceType = UNKNOWN or ANY, should an IllegalArgumentException be thrown as it is when there are ACLs? I think it should, and I think this is a bug in the default implementation in the Authorizer interface. Thoughts?
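A minimal sketch of the guard being argued for, using toy enums rather than the real Kafka types: the ANY/UNKNOWN check runs unconditionally, so behaviour is the same whether or not any ACLs exist.

```java
// Hedged sketch of the argument validation discussed above. The enums and
// method are illustrative stand-ins, not the Kafka Authorizer interface.
public class AuthorizeByTypeCheck {
    enum AclOperation { READ, WRITE, ANY, UNKNOWN }
    enum ResourceType { TOPIC, GROUP, ANY, UNKNOWN }

    static void validate(AclOperation op, ResourceType type) {
        // Runs before any "no ACLs defined" shortcut, so an empty
        // authorizer rejects ANY/UNKNOWN the same way a populated one does.
        if (op == AclOperation.ANY || op == AclOperation.UNKNOWN) {
            throw new IllegalArgumentException("Must specify a concrete operation, got: " + op);
        }
        if (type == ResourceType.ANY || type == ResourceType.UNKNOWN) {
            throw new IllegalArgumentException("Must specify a concrete resource type, got: " + type);
        }
    }

    public static void main(String[] args) {
        validate(AclOperation.READ, ResourceType.TOPIC);     // accepted
        try {
            validate(AclOperation.ANY, ResourceType.TOPIC);  // rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```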
Re: ACL authorization question
:DOH: Nevermind. Problem between keyboard and seat. On Thu, Aug 15, 2024 at 8:36 AM Claude Warren, Jr wrote: > If there is an authorizer with no ACLs and > authorizeByResourceType(AuthorizableRequestContext > requestContext, AclOperation op, ResourceType resourceType) is called > with op = UNKNOWN or ANY, or resourceType = UNKNOWN or ANY should an > IllegalArgumentException be thrown as it is when there are ACLs? > > I think it should, and I think this is a bug in the default implementation > in the Authorizer interface. > > Thoughts? >
Contribution to RAT
Greetings, I have been working on Apache RAT recently and I noticed that Kafka has a very nice XSLT to convert the Rat output to an HTML document. I know there is no legal or licensing issue, but I am asking whether there are any objections to my taking the .gradle/resources/rat-output-to-html.xsl file into the Apache RAT project so that we can have a nice HTML output available to all users? Thanks, Claude
Question about ResourcePatternFilter
Should a ResourcePatternFilter that has a PatternType of ANY and a name of WILDCARD_RESOURCE not match any Acls? I think this is a bug. I am writing a series of tests to ensure that I have implemented everything correctly in the Trie implementation and this has come up.

    public boolean matches(ResourcePattern pattern) {
        if (!resourceType.equals(ResourceType.ANY) && !resourceType.equals(pattern.resourceType())) {
            return false;
        }
        if (!patternType.equals(PatternType.ANY) && !patternType.equals(PatternType.MATCH) && !patternType.equals(pattern.patternType())) {
            return false;
        }
        if (name == null) {
            return true;
        }
        if (patternType.equals(PatternType.ANY) || patternType.equals(pattern.patternType())) {
            return name.equals(pattern.name());
        }
        switch (pattern.patternType()) {
            case LITERAL:
                return name.equals(pattern.name()) || pattern.name().equals(WILDCARD_RESOURCE);
            case PREFIXED:
                return name.startsWith(pattern.name());
            default:
                throw new IllegalArgumentException("Unsupported PatternType: " + pattern.patternType());
        }
    }

The above code is from the ResourcePatternFilter. The first if checks for the resource type not matching. The second if checks for the pattern type not matching. The third if will return true if the name is null. The fourth if will return true if the name matches in the case where the pattern type is ANY or has equality. The only possible conditions here are ANY, MATCH or equality. The switch handles the case where the pattern type does not match.

I think this can be simplified as:

    public boolean matches(ResourcePattern pattern) {
        if (!resourceType.equals(ResourceType.ANY) && !resourceType.equals(pattern.resourceType())) {
            return false;
        }
        switch (pattern.patternType()) {
            case UNKNOWN:
                return patternType.equals(pattern.patternType());
            case ANY:
            case MATCH:
                return name == null || name.equals(pattern.name()) || pattern.name().equals(WILDCARD_RESOURCE) || name.equals(WILDCARD_RESOURCE);
            case LITERAL:
                return patternType.equals(pattern.patternType()) || name == null || name.equals(pattern.name()) || pattern.name().equals(WILDCARD_RESOURCE);
            case PREFIXED:
                return patternType.equals(pattern.patternType()) || name == null || name.startsWith(pattern.name()) || pattern.name().equals(WILDCARD_RESOURCE);
            default:
                throw new IllegalArgumentException("Unsupported PatternType: " + pattern.patternType());
        }
    }

The above version still handles the null name and handles the WILDCARD_RESOURCE as a valid pattern name. I also think it is simpler to read. Am I missing something?
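One way to probe the closing question is a differential harness over a self-contained transcription of the two versions (toy PT enum, WILDCARD as "*", the UNKNOWN case and the resource-type check omitted since they are handled identically). Running it suggests the two versions are not equivalent: for example, a LITERAL filter named "a" does not match a LITERAL pattern named "b" under the original, but does under the simplified version, which returns true for any LITERAL filter against a LITERAL pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Differential harness for the two matches() versions quoted in the
// message, transcribed into a toy model. fType/fName are the filter's
// pattern type and name; pType/pName are the stored pattern's.
public class FilterEquivalenceCheck {
    enum PT { ANY, MATCH, LITERAL, PREFIXED }
    static final String WILDCARD = "*";

    // Transcription of the original ResourcePatternFilter logic.
    static boolean original(PT fType, String fName, PT pType, String pName) {
        if (fType != PT.ANY && fType != PT.MATCH && fType != pType) return false;
        if (fName == null) return true;
        if (fType == PT.ANY || fType == pType) return fName.equals(pName);
        switch (pType) {
            case LITERAL: return fName.equals(pName) || pName.equals(WILDCARD);
            case PREFIXED: return fName.startsWith(pName);
            default: throw new IllegalArgumentException();
        }
    }

    // Transcription of the proposed simplified logic.
    static boolean simplified(PT fType, String fName, PT pType, String pName) {
        switch (pType) {
            case ANY:
            case MATCH:
                return fName == null || fName.equals(pName) || pName.equals(WILDCARD) || fName.equals(WILDCARD);
            case LITERAL:
                return fType == pType || fName == null || fName.equals(pName) || pName.equals(WILDCARD);
            case PREFIXED:
                return fType == pType || fName == null || fName.startsWith(pName) || pName.equals(WILDCARD);
            default: throw new IllegalArgumentException();
        }
    }

    static List<String> disagreements() {
        List<String> out = new ArrayList<>();
        for (PT fType : PT.values())
            for (String fName : Arrays.asList(null, "a", "ab", WILDCARD))
                for (PT pType : Arrays.asList(PT.LITERAL, PT.PREFIXED)) // stored ACL types
                    for (String pName : Arrays.asList("a", "b", WILDCARD)) {
                        boolean o = original(fType, fName, pType, pName);
                        boolean s = simplified(fType, fName, pType, pName);
                        if (o != s) out.add(fType + "/" + fName + " vs " + pType + "/" + pName + ": " + o + " != " + s);
                    }
        return out;
    }

    public static void main(String[] args) {
        disagreements().forEach(System.out::println);
    }
}
```

So the simplification appears to change behaviour rather than merely restructure it; a harness like this, run against the real classes, would pin down exactly which cases diverge.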
Re: [DISCUSS] KIP-1034: Dead letter queue in Kafka Streams
I am new to the Kafka codebase so please excuse any ignorance on my part. When a dead letter queue is established is there a process to ensure that it at least is defined with the same ACL as the original queue? Without such a guarantee at the start it seems that managing dead letter queues will be fraught with security issues. On Wed, Apr 10, 2024 at 10:34 AM Damien Gasparina wrote: > Hi everyone, > > To continue on our effort to improve Kafka Streams error handling, we > propose a new KIP to add out of the box support for Dead Letter Queue. > The goal of this KIP is to provide a default implementation that > should be suitable for most applications and allow users to override > it if they have specific requirements. > > In order to build a suitable payload, some additional changes are > included in this KIP: > 1. extend the ProcessingContext to hold, when available, the source > node raw key/value byte[] > 2. expose the ProcessingContext to the ProductionExceptionHandler, > it is currently not available in the handle parameters. > > Regarding point 2., to expose the ProcessingContext to the > ProductionExceptionHandler, we considered two choices: > 1. exposing the ProcessingContext as a parameter in the handle() > method. That's the cleanest way IMHO, but we would need to deprecate > the old method. > 2. exposing the ProcessingContext as an attribute in the interface. > This way, no method is deprecated, but we would not be consistent with > the other ExceptionHandler. > > In the KIP, we chose the 1. solution (new handle signature with old > one deprecated), but we could use other opinions on this part. > More information is available directly on the KIP. > > KIP link: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1034%3A+Dead+letter+queue+in+Kafka+Streams > > Feedbacks and suggestions are welcome, > > Cheers, > Damien, Sebastien and Loic >
[jira] [Created] (KAFKA-17423) Replace StandardAuthorizer with Trie implementation
Claude Warren created KAFKA-17423:
-
Summary: Replace StandardAuthorizer with Trie implementation
Key: KAFKA-17423
URL: https://issues.apache.org/jira/browse/KAFKA-17423
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 3.8.0, 0.9.0.2
Reporter: Claude Warren

KAFKA-17316 introduces an extensible StandardAuthorizer. This change is to provide a Trie based authorizer that extends the StandardAuthorizer. Tests indicate that such an authorizer is 2 orders of magnitude faster than the current authorizer.

h2. Trie vs KRAFT Standard Search times

h3. Evaluation of Head wildcard

I developed some quick tests using random words and creating literal ACLs by combining three words with hyphens. Prefixed ACLs were created by removing the last word from the literal acl. Head wildcard ACLs were created by removing the first word from the literal ACLs and replacing it with an asterisk "*". All literal ACLs were searched for in each test. Timing was recorded in nanoseconds and converted to seconds for this table. The results here show that the Trie search beats the Standard search for both literal and prefix searches. In addition the new head wildcard search is approximately as fast as the current literal search.

||Number of Acls||Standard literal||Standard prefix||Trie literal||Trie prefix||Trie head wildcard||
|1000|0.0057 ± 0.0011|0.0032 ± 0.0006|0.0052 ± 0.001|0.0044 ± 0.0016|0.0117 ± 0.0029|
|8000|0.0178 ± 0.0011|0.0085 ± 0.0006|0.012 ± 0.0009|0.0076 ± 0.0009|0.0213 ± 0.0023|
|27000|0.0614 ± 0.0009|0.0299 ± 0.0005|0.0402 ± 0.0006|0.0254 ± 0.0004|0.0793 ± 0.0053|
|64000|0.1625 ± 0.0021|0.0771 ± 0.0022|0.098 ± 0.0026|0.0645 ± 0.0013|0.1794 ± 0.0091|
|125000|0.3591 ± 0.0032|0.1632 ± 0.0019|0.1942 ± 0.0037|0.1304 ± 0.0068|0.3484 ± 0.0022|

!https://cwiki.apache.org/confluence/download/attachments/303794855/head-tail.png?version=1&modificationDate=1722351326000&api=v2|height=250!
h3.
JMH Test Suite

All tests were run using the standard JMH benchmark suite from the Kafka test library. All values are for runs comprising 50 ACLs with 100K Resources. Each test was executed 15 times and the median score and error calculated. The maximum memory consumption for each test is also presented. Both implementations pass all the Authorizer and AuthorizerProperty tests. Tests were executed on a Thinkpad with Ryzen pro 7 running Ubuntu 22.04.4 LTS with a total of 28544904 Kb memory. The test system was unable to run the Standard tests for 200K resources as it ran out of memory, though it was able to do so for the Trie tests. Tests do not include any head wildcard tests as they are not supported by the Standard implementation.

h4. Acls Iterator

This test retrieves an iterator over the collection of ACLs that is filtered by an AclBindingFilter. This is a measure of how fast the system can scan all the ACLs looking for specific data. ACLs are not searched for by resource name.

|Deny % | |0|20|50|90|99|99.9|99.99|100|
|Standard|ms/op|636.370 ± 8.419|744.872 ± 10.324|1168.908 ± 221.970|1790.758 ± 312.487|2039.684 ± 371.359|1915.952 ± 248.867|2094.022 ± 346.507|2154.379 ± 245.848|
| |total KiB|6,993,926.242|7,315,873.742|9,935,234.141|9,884,250.906|9,867,064.727|9,837,963.148|9,901,205.375|9,863,042.500|
|Trie|ms/op|779.097 ± 16.420|931.984 ± 22.113|1218.173 ± 18.023|1571.095 ± 40.815|1603.855 ± 16.527|1659.850 ± 17.646|1688.720 ± 53.368|1720.753 ± 38.237|
| |total KiB|5,756,430.383|7,048,136.438|7,136,180.156|8,626,626.211|9,839,865.086|8,495,973.211|9,954,063.266|8,602,073.469|

!https://cwiki.apache.org/confluence/download/attachments/303794855/ITER_Ex.png?version=1&modificationDate=1722333121000&api=v2|height=250!!https://cwiki.apache.org/confluence/download/attachments/303794855/ITER_Mem.png?version=1&modificationDate=1722333131000&api=v2|height=250!
h4.
Authorize by Resource Type

This tests a case where we check if the caller is authorized to perform a given operation on at least one resource of the given type. This is a case of looking for resources of a specific type that the principal can access. It is similar to the ACL iterator test but stops on the first approval.

|Deny % | |0|20|50|90|99|99.9|99.99|100|
|Standard|ms/op|1186.324 ± 42.475|1360.158 ± 81.720|2004.596 ± 51.584|2411.931 ± 104.194|2718.558 ± 77.745|2627.366 ± 91.740|2466.940 ± 160.395|2420.297 ± 75.351|
| |total KiB|6,331,528.313|6,971,241.883|7,622,133.336|9,905,097.813|10,048,529.578|10,122,265.617|9,679,931.570|10,532,133.234|
|Trie|ms/op|1.090 ± 0.014|1.319 ± 0.009|1.787 ± 0.026|2.296 ± 0.049|2.167 ± 0.082|2.340 ± 0.065|2.373 ± 0.072|2.004 ± 0.049|
| |total KiB|5,862,343.477|7,046,550.586|5,869,397.102|5,872,297.258|7,487,485.984|3,550,240.320|3,239,351.586|5,416,103.469|

!https://cw
[DISCUSS] KAFKA-17423 Replace StandardAuthorizer with Trie implementation
URL: https://issues.apache.org/jira/browse/KAFKA-17423 The above is an improvement to Kafka to replace the sorted list ACL implementation with a Trie based implementation. I have an implementation that passes all the tests, including the new ones in KAFKA-17316 (pull request https://github.com/apache/kafka/pull/16779). Data from the JMH benchmark suite indicates that the Trie implementation is at least an order of magnitude faster than the existing implementation. The change does not modify any API; it simply replaces the StandardAuthorizerData class with a Trie structure based class. Is there any reason not to implement this change? Claude
Re: [VOTE] KIP-1042: Support for wildcard when creating new acls
Colin, Thanks for your insightful comments. I came to the same conclusion. I do have 2 Jira tickets to simplify some of this. 1) KAFKA-17316 <https://issues.apache.org/jira/browse/KAFKA-17316> - Makes developing a new Authorizer by creating a new implementation of the StandardAuthorizerData easier. Basically adds interfaces, abstract classes and lots of tests. 2) KAFKA-17423 <https://issues.apache.org/jira/browse/KAFKA-17423> - Builds on KAFKA-17316 by creating a Trie implementation (without the new wildcards from KIP-1042). This provides an order of magnitude faster processing of ACL requests. My plan is to get these through the process, so I have a better understanding of the Kafka development process, and then rework KIP-1042 to create a new PatternType with wildcard processing. I would appreciate your feedback on the above tickets if you have the time. Claude On Fri, Aug 23, 2024 at 9:43 PM Colin McCabe wrote: > On Sat, Jul 27, 2024, at 04:20, Claude Warren, Jr wrote: > > I have updated the KIP with results from the Trie implementation and they > > are dramatic to say the least. For most searches they are at least an > > order of magnitude faster and use less memory. The wildcard search is > not > > a regular expression but rather a file type wild card (*=1 or more > > character, ?=1 character). > > > > Hi Claude, > > One issue here is that your change is incompatible. For example, if I have > a consumer group named '?', and an ALLOW ACL for '?', this change alters > the meaning of that ACL. Previously it just ALLOWed '?' Now it allows any > single-character group. This obviously could be a major security issue for > people doing upgrades. > > (This wouldn't be an issue if we restricted consumer group names to a > sensible set of characters, like we did with topics. But that ship has > sailed...) > > If you want to add new behavior in a backwards-compatible fashion, your > best bet probably is to create a new PatternType. 
Unfortunately this makes > the complexity issue worse, but I don't see a way around it. Perhaps we can > deprecate LITERAL and PREFIXED at some future date, if this wildcard thing > makes them unnecessary. > > Another issue I see, somewhat related, is that Authorizers are pluggable, > and probably won't be updated all at once to support this new type. That > should be fine, but we should document the error that is returned when > someone tries to create your new-style ACLs with an authorizer that does > not support them. > > > Code is available on my personal branch [1]. It is not fully documented > > and does not meet the checkstyle requirements yet. I also modified the jmh > > tests to run the Trie tests and limit the test to the single case mentioned > > in the KIP documentation. > > > > If there are no issues with this code, I will complete the documentation > > and fix the checkstyle and then open a pull request. > > It's fine to open a pull request. You can link it on the KIP page. As > always, we don't want to commit it until the KIP is accepted. > > best, > Colin > > > > > Claude > > > > [1] > https://github.com/Claudenw/kafka/pull/new/KIP-1042_Trie_Implementation > > > > > > On Wed, Jul 3, 2024 at 2:21 PM Claude Warren, Jr > > > wrote: > > > >> I think that if we put in a trie based system we should be able to halve > >> the normal search times and still be able to locate wild card matches very > >> quickly. Users should be warned that "head wildcard" matches are slow and > >> to use them sparingly. I am going to see if I can work out how to do > >> wildcard matches within the trie. > >> > >> But in all cases I can show that the trie is faster than the current > >> implementation. > >> > >> Claude > >> > >> > >> > >> On Wed, Jun 19, 2024 at 7:53 PM Muralidhar Basani > >> wrote: > >> > >>> There are some test results mentioned in the Test Plan section of the Kip, > >>> but we need to do more testing with various patterns and permission > types.
> >>> As mentioned in the discuss thread, the trie implementation could > >>> potentially surpass the current speed of ACL match. > >>> > >>> However, we can only accurately assess the results after updating the > >>> actual classes and analysing them with AuthorizerBenchmark. > >>> > >>> Thanks, > >>> > >>> Murali > >>> > >>> On Mon, 17 Jun 2024 at 20:39, Colin McCabe
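The wildcard semantics described in this thread ('*' matches one or more characters, '?' matches exactly one) can be sketched by translating the pattern into a regular expression. The `GlobMatch` class and `toRegex` helper below are hypothetical names for illustration, not Kafka code:

```java
import java.util.regex.Pattern;

public class GlobMatch {
    // Translate a file-type wildcard pattern to a regex, using the
    // semantics quoted in this thread: '*' matches one or more characters,
    // '?' matches exactly one, everything else is matched literally.
    static Pattern toRegex(String glob) {
        StringBuilder sb = new StringBuilder("^");
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*': sb.append(".+"); break;  // one or more characters
                case '?': sb.append(".");  break;  // exactly one character
                default:  sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.append('$').toString());
    }
}
```

Note that under these semantics `foo*` does not match `foo` itself, since '*' requires at least one character; that is the detail that makes a pre-existing literal resource named `?` change meaning, as discussed above.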
Re: [VOTE] KIP-1042: Support for wildcard when creating new acls
I made it easier to replace the existing StandardAuthorizerData with a different implementation in order to show the Trie implementation met all the requirements of the StandardAuthorizerData and could be replaced without changing the StandardAuthorizer implementation. Replacing the current StandardAuthorizerData with a Trie implementation makes sense because it is an order of magnitude faster. This means that when we go to implement the Wildcard types we can perform the search in times that are equivalent to the literal search times in the current StandardAuthorizerData implementation. The changes for the first pull request are to create an interface for AuthorizerData and create an "authorizeByResourceType" method within that interface. This adds an initial implementation in StandardAuthorizerData that mirrors the default implementation found in the Authorizer interface. The Trie implementation is not dependent upon other classes local to StandardAuthorizerData other than the MatchingRule and some of its implementations which are now found in the AuthorizerData interface instead of the StandardAuthorizerData implementation. Obviously it is dependent upon AuthorizationResult, AclBinding, KafkaPrincipal and other classes that are defined elsewhere. In addition, abstract test cases are implemented to validate that any replacement implementations yield the same results as the original StandardAuthorizerData. I hope that this assuages any concerns that you may have had, Claude On Thu, Aug 29, 2024 at 6:51 PM Colin McCabe wrote: > On Thu, Aug 29, 2024, at 01:34, Claude Warren, Jr wrote: > > Colin, > > Thanks for your insightful comments. I came to the same conclusion. > > I do have 2 Jira tickets to simplify some of this. > > 1) KAFKA-17316 <https://issues.apache.org/jira/browse/KAFKA-17316> - > Makes developing a new Authorizer by creating a new implementation of the > StandardAuthorizerData easier. Basically adds interfaces, abstract classes > and lots of tests. 
> > 2) KAFKA-17423 <https://issues.apache.org/jira/browse/KAFKA-17423> - > Builds on KAFKA-17316 by creating a Trie implementation (without the new > wildcards from KIP-1042). This provides an order of magnitude faster > processing of ACL requests. > > My plan is to get these through the process, so I have a better > understanding of the Kafka development process, and then rework KIP-1042 to > create a new PatternType with wildcard processing. > > I would appreciate your feedback on the above tickets if you have the time. > > > Hi Claude, > > Thanks for the response. However, I don't want to fork the > StandardAuthorizer. I would be -1 on that since it would greatly enlarge > our maintenance burden, for no good reason. And if we did fork it, I > wouldn't want the forked version to depend on the internal classes of the > original one, since that would make evolving either one much more difficult. > > We should be able to add wildcards to the existing authorizer without too > much trouble, just by adding a new type alongside LITERAL, PREFIXED. > Perhaps GLOB ? We always planned on adding more types, to cover exactly > this scenario. > > best, > Colin > > Claude > > On Fri, Aug 23, 2024 at 9:43 PM Colin McCabe wrote: > > On Sat, Jul 27, 2024, at 04:20, Claude Warren, Jr wrote: > > I have updated the KIP with results from the Trie implementation and they > > are dramatic to say the least. For most searches they are at least an > > order of magnitude faster and use less memory. The wildcard search is > not > > a regular expression but rather a file type wild card (*=1 or more > > character, ?=1 character). > > > > Hi Claude, > > One issue here is that your change is incompatible. For example, if I have > a consumer group named '?', and an ALLOW ACL for '?', this change alters > the meaning of that ACL. Previously it just ALLOWed '?' Now it allows any > single-character group. This obviously could be a major security issue for > people doing upgrades. 
> > (This wouldn't be an issue if we restricted consumer group names to a > sensible set of characters, like we did with topics. But that ship has > sailed...) > > If you want to add new behavior in a backwards-compatible fashion, your > best bet probably is to create a new PatternType. Unfortunately this makes > the complexity issue worse, but I don't see a way around it. Perhaps we can > deprecate LITERAL and PREFIXED at some future date, if this wildcard thing > makes them unecessary. > > Another issue I see, somewhat related, is that Authorizers are pluggable, > and probably won't be updated all at once to support this new type. That > should be fine, but we should document the
Re: [VOTE] KIP-1042: Support for wildcard when creating new acls
Colin, I would like to leave the framework that is in KAFKA-17316 as it makes testing the new implementations easier. But let's discuss that elsewhere (I will start another discussion for 17316 and 17423 together). PatternType.GLOB makes sense; I will adjust the KIP to use that terminology. I understand from some of my compatriots that there is a desire to allow GLOBs in resource names, Kafka Principals and Host names. (Basically all the non-enum based ACL data). With the Trie implementation for the resource names we can force the trie to put the '*' and '?' on individual nodes when it encounters them. This makes them a natural pivot point when traversing the trie; it is how I implemented it initially. The idea is that the more "specific" an ACL is the better a match it is assumed to be. So the searching algorithm is: start descending the trie until a matching DENY ACL, a matching LITERAL ACL, or leaf node is found. A DENY ACL yields a DENY, a matching LITERAL ACL yields its specified result. A non-matching leaf node requires further searching. We start back up the path starting with the leaf node, looking for matching PREFIXED nodes. If a GLOB symbol ('*' or '?') is located as the child of the current node, check for a GLOB match. Continue until a PREFIXED ACL match, GLOB ACL match or the root node is encountered. PREFIXED and GLOB return their specified value; the root returns a NOT FOUND state. With respect to the Principal and Host names, I have been recently working on Apache Rat, moving Maven-based file exclusion to the core, and have found that the Plexus pattern match and selector code is very fast and would probably not be a significant change in performance from straight check to GLOB checking. In these cases we would create what the plexus code calls a "MatchPattern" and for collections of patterns "MatchPatterns". We can then simply check if the requested ACL matches the pattern. 
The MatchPattern will return the original text so we can perform the standard string match we do today if desired. I think that the KIP will need updating and that this will require changes in the Client, Server and Metadata components. I also believe that we should prohibit the use of "java.util.stream" classes in the metadata/authorizer package. Currently it is only used in a couple of test cases, but the overhead of streams would kill the authorizer performance, so it is best to restrict it before someone brings it in. Thoughts? Claude On Fri, Aug 30, 2024 at 4:38 PM Colin McCabe wrote: > Hi Claude, > > I think this is great work. Speeding up the Authorizer will be a big win > for us. > > I don't think we need to add additional interfaces for this, though. Just > get rid of the old slower implementation that I wrote, and replace it with > your newer, faster one. > > Also, I think we should continue the discussion about globs... after all, > it is the topic of this KIP! (You are calling it "wildcards" but I don't > think that is good terminology since it will cause confusion with the > existing "*" wildcard). Anyway, earlier I requested that you add > PatternType.GLOB and use it for these things. Does that make sense? I don't > see another path to doing it compatibly. I certainly wouldn't want to > create a "Python 2 vs. Python 3" type situation where people get stuck on > an older authorizer fork because the new one requires globs and they can't > handle that. No authorizer forks (or at least, not for things like this.) > > best, > Colin > > > On Fri, Aug 30, 2024, at 01:32, Claude Warren, Jr wrote: > > I made it easier to replace the existing StandardAuthorizerData with a > > different implementation in order to show the Trie implementation met all > the > > requirements of the StandardAuthorizerData and could be replaced without > > changing the StandardAuthorizer implementation. 
> > > > Replacing the current StandardAuthorizerData with a Trie > > implementation makes sense because it is an order of magnitude faster. > > This means that when we go to implement the Wildcard types we can perform > > the search in times that are equivalent to the literal search times in > the > > current StandardAuthorizerData implementation. > > > > The changes for the first pull request is to create an interface for > > AuthorizerData and create an "authorizeByResourceType" method within that > > interface. This adds an initial implementation in StandardAuthorizerData > > that mirrors the default implementation found in the Authorizer > interface. > > > > The Trie implementation is not dependent upon other classes local to > > StandardAuthorizerData other than the MatchingRule and
[DISCUSS] KAFKA-17316 and KAFKA-17423
I have been working on implementing a Trie structure to store ACLs and improve the performance in the metadata/authorization code. The upshot of this was that I found it very difficult to determine if the implementation was correctly reimplementing the current implementation. My goal was to simply change the StandardAuthorizerData implementation and leave the rest of the code as is. However, there were no abstracted tests that would allow me to test this. KAFKA-17316 addresses this issue by creating some internal interfaces for the "metadata/authorizer" package. The one change to the StandardAuthorizer was to implement the "authorizeByResourceType" defined in the "org.apache.kafka.server.authorizer.Authorizer" interface by passing the request down to the AuthorizerData implementation. This change allowed me to create three test implementations. One that implemented "authorizeByResourceType" as it is in the released code base, one that verified that the StandardAuthorizerData implementation did not change the expected results, and one that showed the Trie implementation in KAFKA-17423 was also correct. I think that retaining the work in KAFKA-17316 makes sense, as when the next faster implementation comes along we can drop in the replacement and verify that it works correctly. KAFKA-17423 builds on KAFKA-17316 by implementing a Trie based AuthorizerData implementation. By splitting the data into a Trie format the search for matching ACLs is improved by an order of magnitude. The trie implementation allows us to quickly locate the candidate ACLs by splitting them into groups based upon the similarity of the resource name. In addition since we are moving through the trie based on resource name we have several advantages: 1. If we encounter a matching DENY while descending the Trie we can stop as it overrides anything that may be found at lower levels. 2. We only look for LITERAL matches on the descent. 
If we reach a matching resource name or a leaf node we know there are no LITERAL matches. 3. If we don't have a DENY or a LITERAL match we walk back up the path checking the nodes from the descent looking for a PREFIX match. The advantage here is that we don't have to search again but can simply retrace the path using the Trie structure. I believe that #1 and #2 above account for a significant portion of the speed increase as we do not have to reposition within a sorted list of all ACLs using a binary search. Finally, I think that we should prohibit the use of the java.util.stream classes within the authorizer due to hot path speed considerations. The only existing code that uses streams within that package was in test cases. We can prohibit the use by a simple checkstyle prohibition. Doing so will short-circuit any misguided potential changes. Thoughts? Claude
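The three-step search described above (a matching DENY short-circuits the descent, LITERAL applies only at the exact name, PREFIX rules are checked while retracing the path) could look roughly like the following. `AclTrie` and its members are invented names, not the KAFKA-17423 code, and the default-deny result for NOT FOUND is an assumption of this sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the descend-then-retrace ACL lookup; not the
// KAFKA-17423 implementation. One trie node per character of the name.
class AclTrie {
    enum Rule { NONE, LITERAL_ALLOW, LITERAL_DENY, PREFIX_ALLOW, PREFIX_DENY }

    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        Rule literal = Rule.NONE; // rule bound to this exact resource name
        Rule prefix = Rule.NONE;  // rule bound to this name as a prefix
    }

    private final Node root = new Node();

    void add(String name, Rule rule) {
        Node n = root;
        for (char c : name.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new Node());
        if (rule == Rule.LITERAL_ALLOW || rule == Rule.LITERAL_DENY) n.literal = rule;
        else n.prefix = rule;
    }

    /** true = ALLOW; false = DENY or NOT FOUND (default deny in this sketch). */
    boolean authorize(String resource) {
        Deque<Node> path = new ArrayDeque<>();
        Node n = root;
        path.push(n);
        for (char c : resource.toCharArray()) {
            if (n.prefix == Rule.PREFIX_DENY) return false; // point 1: DENY stops descent
            n = n.children.get(c);
            if (n == null) break;                           // leaf: no LITERAL possible
            path.push(n);
        }
        if (path.size() == resource.length() + 1) {         // full name was consumed
            Node exact = path.peek();
            if (exact.prefix == Rule.PREFIX_DENY) return false;
            if (exact.literal != Rule.NONE)                 // point 2: LITERAL at exact name
                return exact.literal == Rule.LITERAL_ALLOW;
        }
        while (!path.isEmpty()) {                           // point 3: retrace for PREFIX rules
            Node p = path.pop();
            if (p.prefix == Rule.PREFIX_DENY) return false;
            if (p.prefix == Rule.PREFIX_ALLOW) return true;
        }
        return false; // root reached: NOT FOUND
    }
}
```

The key property is the one claimed in point 3: the retrace reuses the nodes already visited on the way down, so no second search over a sorted ACL list is needed.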
[DISCUSS] KIP-1042: Support for GLOBs when creating new acls
After some discussion about the earlier KIP-1042 we have rewritten it to focus on the implementation of a GLOB pattern type. Please review and comment. We have removed all discussion of the Trie implementation and focus on what is required for the GLOB implementation. The KIP does assume that the Trie implementation will be available. Thank you, Claude
Re: [DISCUSS] KAFKA-17316 and KAFKA-17423
Colin, I can see that it makes sense to replace the StandardAuthorizer so that there are not 2 implementations. However, I think the testing framework should remain so that in future new improved implementations of StandardAuthorizerData can be easily implemented and tested. I will put forward a pull request to satisfy KAFKA-17423 that contains only the new implementation. Claude On Mon, Sep 2, 2024 at 9:09 AM Claude Warren, Jr wrote: > I have been working on implementing a Trie structure to store ACLs and > improve the performance in the metadata/authorization code. The upshot of > this was that I found it very difficult to determine if the implementation > was correctly reimplementing the current implementation. > > My goal was to simply change the StandardAuthorizerData implementation and > leave the rest of the code as is. However, there were no abstracted tests > that would allow me to test this. > > KAFKA-17316 addresses this issue by creating some internal interfaces for > the "metadata/authorizer" package. The one change to the > StandardAuthorizer was to implement the"authorizeByResourceType" defined in > the "org.apache.kafka.server.authorizer.Authorizer" interface by passing > the request down to the AuthorizerData implementation. > > This change allowed me to create three test implementations. One that > implemented "authorizeByResourceType" as it is in the released code base, > one that verified that the StandardAuthorizerData implementation did not > change the expected results, and one that showed the Trie implementation in > KAFKA-17423 was also correct. > > I think that retaining the work in KAFKA-17316 makes sense as when the > next faster implementation comes along we can drop in the replacement and > verify that it works correctly. > > KAFKA-17423 builds on KAFKA-17316 by implementing a Trie based > AuthorizerData implementation. By splitting the data into a Trie format > the search for matching ACLs is improved by an order of magnitude. 
The > trie implementation allows us to quickly locate the candidate ACLs by > splitting them into groups based upon the similarity of the resource name. > In addition since we are moving through the trie based on resource name we > have several advantages: > >1. If we encounter a matching DENY while descending the Trie we can >stop as it overrides anything that may be found at lower levels. >2. We only look for LITERAL matches on the descent. If we reach a >matching resource name or a leaf node we know there are no LITERAL matches. >3. If we don't have a DENY or a LITERAL match we walk back up the path >checking the nodes from the descent looking for a PREFIX match. The >advantage here is that we don't have to search again but can simply retrace >the path using the Trie structure. > > I believe that #1 and #2 above account for a significant portion of the > speed increase as we do not have to reposition within a sorted list of all > ACLs using a binary search. > > Finally, I think that we should prohibit the use of the java.util.stream > classes within the authorizer due to hot path speed considerations. The > only existing code that uses streams within that package was in test cases. > We can prohibit the use by a simple checkstyle prohibition. Doing so will > short-circuit any misguided potential changes. > > Thoughts? > Claude > >
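One way to enforce the java.util.stream prohibition discussed in this thread, assuming the project's existing Checkstyle setup, is Checkstyle's IllegalImport module. This is a sketch; restricting the check to only the metadata/authorizer sources would additionally need a suppressions file or a dedicated config:

```xml
<!-- Sketch: add inside the TreeWalker module of checkstyle.xml.
     Flags any import from the java.util.stream package. Scoping it to
     the metadata/authorizer package would use a suppressions file. -->
<module name="IllegalImport">
  <property name="illegalPkgs" value="java.util.stream"/>
</module>
```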
Re: Possible bug in Authorize by ResourceTypeQue
I am working on a replacement for the StandardAuthorizer and my implementation DENIED while the standard implementation ALLOWED. In reading the specs I thought it should be DENIED. But your statement makes it clear that I misread. Thank you, Claude On Tue, Sep 3, 2024 at 1:14 PM Rajini Sivaram wrote: > Hi Claude, > > `authorizeByResourceType` doesn't grant access to any specific topic, it > grants access to idempotent write if the user has access to write to any > topic (which may or may not exist). In this case, > ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true", so `User:alice` can > write to a topic that doesn't start with `foo` and hence > `authorizeByResourceType` should be ALLOWED. What was the behaviour you > observed? > > Regards, > > Rajini > > > On Tue, Sep 3, 2024 at 12:22 PM Claude Warren wrote: > > > *Setup:* > > Superuser = "User:superman" > > > > ACLs added to system > > new StandardAcl(TOPIC, "foo", PREFIXED, "User:alice", WILDCARD, READ, > DENY) > > new StandardAcl(TOPIC, "foobar", LITERAL, "User:alice", WILDCARD, READ, > > ALLOW) > > new StandardAcl(TOPIC, "foo", PREFIXED, "User:bob", WILDCARD, READ, > ALLOW) > > > > ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true" > > > > AuthorizerContext requestContext = MockAuthorizableRequestContext with > > principal = User:alice > > host = InetAddress.getLocalHost() > > > > > > *Method Call:* > > > > authorizer.authorizeByResourceType(requestContext, READ, TOPIC) > > > > *Question:* > > > > Should the result be true because there is a LITERAL READ ALLOW on > "foobar" > > or should the result be false because there is an overriding PREFIXED > READ > > DENY on "foo" ? > > > > > > > > -- > > LinkedIn: http://www.linkedin.com/in/claudewarren > > >
Re: Possible bug in Authorize by ResourceTypeQue
Followup: If ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true" then authorizeByResourceType should return true in all cases since the user would have access for any operation on any undefined topic? On Tue, Sep 3, 2024 at 2:08 PM Claude Warren, Jr wrote: > I am working on a replacement for the StandardAuthorizer and my > implementation DENIED while the standard implementation ALLOWED. In > reading the specs I thought it should be DENIED. But your statement makes > it clear that I misread. > > Thank you, > Claude > > On Tue, Sep 3, 2024 at 1:14 PM Rajini Sivaram > wrote: > >> Hi Claude, >> >> `authorizeByResourceType` doesn't grant access to any specific topic, it >> grants access to idempotent write if the user has access to write to any >> topic (which may or may not exist). In this case, >> ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true", so `User:alice` can >> write to a topic that doesn't start with `foo` and hence >> `authorizeByResourceType` should be ALLOWED. What was the behaviour you >> observed? >> >> Regards, >> >> Rajini >> >> >> On Tue, Sep 3, 2024 at 12:22 PM Claude Warren wrote: >> >> > *Setup:* >> > Superuser = "User:superman" >> > >> > ACLs added to system >> > new StandardAcl(TOPIC, "foo", PREFIXED, "User:alice", WILDCARD, READ, >> DENY) >> > new StandardAcl(TOPIC, "foobar", LITERAL, "User:alice", WILDCARD, READ, >> > ALLOW) >> > new StandardAcl(TOPIC, "foo", PREFIXED, "User:bob", WILDCARD, READ, >> ALLOW) >> > >> > ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true" >> > >> > AuthorizerContext requestContext = MockAuthorizableRequestContext with >> > principal = User:alice >> > host = InetAddress.getLocalHost() >> > >> > >> > *Method Call:* >> > >> > authorizer.authorizeByResourceType(requestContext, READ, TOPIC) >> > >> > *Question:* >> > >> > Should the result be true because there is a LITERAL READ ALLOW on >> "foobar" >> > or should the result be false because there is an overriding PREFIXED >> READ >> > DENY on "foo" ? 
>> > >> > >> > >> > -- >> > LinkedIn: http://www.linkedin.com/in/claudewarren >> > >> >
Re: Possible bug in Authorize by ResourceTypeQue
Followup2: Your answer speaks directly to "WRITE" access. My example was READ access. So the question the method is answering is: does the user have access to READ any TOPIC? And that is further restricted by the requestContext host, is it not? On Tue, Sep 3, 2024 at 2:10 PM Claude Warren, Jr wrote: > Followup: If ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true" then > authorizeByResourceType should return true in all cases since the user > would have access for any operation on any undefined topic? > > > On Tue, Sep 3, 2024 at 2:08 PM Claude Warren, Jr > wrote: > >> I am working on a replacement for the StandardAuthorizer and my >> implementation DENIED while the standard implementation ALLOWED. In >> reading the specs I thought it should be DENIED. But your statement makes >> it clear that I misread. >> >> Thank you, >> Claude >> >> On Tue, Sep 3, 2024 at 1:14 PM Rajini Sivaram >> wrote: >> >>> Hi Claude, >>> >>> `authorizeByResourceType` doesn't grant access to any specific topic, it >>> grants access to idempotent write if the user has access to write to any >>> topic (which may or may not exist). In this case, >>> ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true", so `User:alice` can >>> write to a topic that doesn't start with `foo` and hence >>> `authorizeByResourceType` should be ALLOWED. What was the behaviour you >>> observed? 
>>> >>> Regards, >>> >>> Rajini >>> >>> >>> On Tue, Sep 3, 2024 at 12:22 PM Claude Warren wrote: >>> >>> > *Setup:* >>> > Superuser = "User:superman" >>> > >>> > ACLs added to system >>> > new StandardAcl(TOPIC, "foo", PREFIXED, "User:alice", WILDCARD, READ, >>> DENY) >>> > new StandardAcl(TOPIC, "foobar", LITERAL, "User:alice", WILDCARD, READ, >>> > ALLOW) >>> > new StandardAcl(TOPIC, "foo", PREFIXED, "User:bob", WILDCARD, READ, >>> ALLOW) >>> > >>> > ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG = "true" >>> > >>> > AuthorizerContext requestContext = MockAuthorizableRequestContext with >>> > principal = User:alice >>> > host = InetAddress.getLocalHost() >>> > >>> > >>> > *Method Call:* >>> > >>> > authorizer.authorizeByResourceType(requestContext, READ, TOPIC) >>> > >>> > *Question:* >>> > >>> > Should the result be true because there is a LITERAL READ ALLOW on >>> "foobar" >>> > or should the result be false because there is an overriding PREFIXED >>> READ >>> > DENY on "foo" ? >>> > >>> > >>> > >>> > -- >>> > LinkedIn: http://www.linkedin.com/in/claudewarren >>> > >>> >>
Re: Contribution to RAT
The development of RAT is moving along and the format of the XML has changed. We now detect multiple licenses within a single file, so this report will need to change too. I have put it on the list of things to change. It might make it into 0.17 which has massive changes already, mostly around including and excluding files for processing. Claude On Sat, Sep 14, 2024 at 9:48 PM Colin McCabe wrote: > Hi Claude, > > Well, as you said, there is no licensing issue, since they're both Apache > :) > > I don't see why anyone in Apache Kafka would object to moving or > duplicating this file to be within the RAT project. The place to ask is > probably in the RAT project itself -- I don't know if this is something > they'd like to include or not (hopefully yes.) > > cheers, > Colin > > > On Thu, Aug 15, 2024, at 04:54, Claude Warren, Jr wrote: > > Greetings, > > > > I have been working on Apache RAT recently and I noticed that Kafka has a > > very nice XSLT to convert the Rat output to an HTML document. > > > > I know there is not a legal or licensing issue but I am asking if there > are > > any objections to my taking the .gradle/resources/rat-output-to-html.xsl > > file into the Apache RAT project so that we can have a nice HTML output > > available to all users? > > > > Thanks, > > Claude >
[jira] [Created] (KAFKA-14924) Kafka DOAP file has an error
Claude Warren created KAFKA-14924:
-------------------------------------
             Summary: Kafka DOAP file has an error
                 Key: KAFKA-14924
                 URL: https://issues.apache.org/jira/browse/KAFKA-14924
             Project: Kafka
          Issue Type: Bug
          Components: documentation
    Affects Versions: 3.3.2
            Reporter: Claude Warren

The DOAP file [1] as listed in [2] has the error:

[line: 25, col: 6] {E201} The attributes on this property element, are not permitted with any content; expecting end element tag.

[1] https://gitbox.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=doap_Kafka.rdf;hb=HEAD
[2] https://svn.apache.org/repos/asf/comdev/projects.apache.org/trunk/data/projects.xml

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [DISCUSS] Require KIPs to include "How to teach this section"
I like this idea. I'm not sure what the section should be called but it should spell out what changes from a customer (I don't like the term user, drug dealers have users -- we should have customers) point of view and from a developer point of view. I can see cases where the change is not visible to the customer but impacts how developers interact with the system. I can also see cases where the change is almost 100% customer focused with little change to the developer's perception. Whatever changes are noted in the section should be accounted for in the PR before it is accepted. Keeping in mind that as the change evolves through testing to documentation, the requirements may change too. I think we need more focus on documenting how to use and configure Kafka in various environments, but I also perceive that we do not have the people to do that, so let's at least collect the information in some reasonable form. Claude On Thu, Oct 31, 2024 at 12:21 PM Anton Agestam wrote: > Thanks for your response here Colin, > > > Perhaps there should be a "documentation" section in the KIP template? > > I think that would do the trick. The nice idea behind formulating the > section as "How to teach this?", is that it leaves it to the KIP author how > to answer it. In most cases I would expect the section to be filled in like > "We will update documentation section X and Y", but there might be cases > where the answer is different. I'm not going to die on this hill, and > would be very happy with an added "Documentation" section if that option > has more traction 👍 > > > the KIPs themselves are part of the documentation > > I understand this is how things currently work for many parts of Kafka > documentation, but it's an idea I want to question and I am proposing to > work to phase this out over time. KIPs are by definition documenting a > proposed change. 
It is necessary to make assumptions about the current > state of the Kafka code base at the time of writing, and those assumptions > do not necessarily hold true at any arbitrary time later, when the KIP is > being read. And I think this is what you are saying too, that for instance > an implemented KIP that touches the protocol should also result in changes > to the documentation. > > tl;dr; It's useful to also be able to read KIPs in hind-sight, but it > shouldn't be required to do in-brain materialization of a series of KIPs to > understand what the expected state of some feature currently is. > > What I am hoping with the proposed change to the KIP template is that there > will be less chance for documentation to be an after-thought for new > changes, and also for documentation changes to be scrutinized and reviewed > during the KIP process, and that this will produce higher quality > documentation over time. > > I'm curious if there are more opinions about this issue. > > BR, > Anton > > Den tis 29 okt. 2024 kl 20:50 skrev Colin McCabe : > > > Hi Anton, > > > > Perhaps there should be a "documentation" section in the KIP template? > > That might help raise awareness of the need to document these changes. > > > > I want to add, in the case of the wire protocol, the KIPs themselves are > > part of the documentation. But they shouldn't be all of the > documentation. > > We should update protocol.html and possibly other docs in cases where > we're > > changing the protocol. I don't think it necessary needs to be done before > > the change itself, but it should be done so that the release that > includes > > the protocol changes also includes their docs... > > > > best, > > Colin > > > > > > On Sat, Oct 26, 2024, at 10:53, Anton Agestam wrote: > > > Hello Kafka devs 👋 > > > > > > Colin encouraged me in the 3.9.0 RC2 thread to contribute ideas around > > how > > > the protocol documentation can be improved. 
While I have concrete ideas > > on > > > this, the current biggest issue as I see it is that new changes are not > > > making it into documentation, and there seems to be a bit of a general > > lack > > > of process with regards to this issue. > > > > > > KIP-893 was a very poignant example of this. It introduces a new > concept > > in > > > the protocol's byte serialization format, none of which made it into > > > documentation. This was extremely subtle and very time consuming to > debug > > > for me as an author of a third-party protocol implementation in Python > > > that must remain compatible with Apache Kafka. Based on the view of the > > > existing documentation, this flat out looked like a bug. > > > > > > The Python ecosystem solves this issue by requiring PEPs to have a "How > > to > > > teach this section", forcing documentation to not be an afterthought. I > > am > > > proposing to introduce the exact same concept for KIPs. I believe this > > will > > > be useful for all KIPs, not just those of the sort mentioned above. > > > > > > For changes to the protocol, I will also suggest that it should be > > required > > > for specification to be updated
Extending SinkConnectorConfig
Greetings, Please excuse any ignorance but my reading of the code indicates that SinkConnectorConfig should be used for the configuration of sink connectors. However, there is no mechanism to add additional items to the configuration as the ConfigDef is created during construction and passed as an argument to ConnectorConfig where it is "enriched". The enriched version makes a copy of the original config. The enriched config is then used to resolve the key in "Object get(String key)" calls. What is the proposed mechanism to extend the SinkConnectorConfig? It almost feels like there should be a ConfigDef builder that is passed into the constructor so that each derived configuration can add its components before passing on to the root connector config where the ConfigDef would be built. As a secondary question: when developing connectors it is often desirable to build a sink/source pair to support disaster recovery and other similar operations where you want to recreate the data. This means that two connectors XSink and XSource may need to share common configuration options; call it XConfig. Is there a recommended pattern for adding XConfig options to XSinkConfig and XSourceConfig and ensuring that those options are added to the base SinkConnectorConfig and SourceConnectorConfig? Any guidance is appreciated, Claude
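The builder idea raised above, with options shared by a sink/source pair, might be sketched in plain Java as follows. `ConfigDefBuilder`, `XConfig`, and `XSinkConfig` are hypothetical names; this deliberately avoids the real Kafka Connect classes and only illustrates the proposed pattern:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the proposed pattern: each layer adds its options to a shared
// builder before the base class would freeze ("enrich") the result.
// All names here are hypothetical, not Kafka Connect API.
class ConfigDefBuilder {
    private final Map<String, String> defs = new LinkedHashMap<>(); // name -> doc

    ConfigDefBuilder define(String name, String doc) {
        defs.put(name, doc);
        return this;
    }

    Map<String, String> build() {
        return Map.copyOf(defs); // frozen view, like the enriched ConfigDef
    }
}

// Options shared by the XSink/XSource pair, added in one place.
class XConfig {
    static ConfigDefBuilder addOptions(ConfigDefBuilder b) {
        return b.define("x.endpoint", "Endpoint both connectors talk to")
                .define("x.format", "Record serialization format");
    }
}

class XSinkConfig {
    final Map<String, String> configDef;

    XSinkConfig() {
        // Sink-specific options plus the shared XConfig options, assembled
        // before handing the builder to the (hypothetical) base config.
        configDef = XConfig.addOptions(
                new ConfigDefBuilder().define("x.sink.flush.size", "Records per flush"))
            .build();
    }
}
```

An XSourceConfig would call the same `XConfig.addOptions` with its own source-specific options, so the shared keys are defined once and cannot drift between the pair.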
EmbeddedConnectCluster with multiple Connectors
Greetings, I am looking to run the EmbeddedConnectCluster with 2 different connectors in an effort to prove round-trip processing across a sink and a source connector. However, I don't see any way to create 2 different connectors within the cluster. Am I missing something, or has this just not been considered in the past? I can't find any mention of it on the mailing list. If there is a good reason not to run 2 different connectors I would like to hear the reasoning. Otherwise I will propose a change to EmbeddedConnectCluster to handle such a case. Thank you for your time, Claude
Re: [DISCUSS] KIP-1133: AK Documentation and Website in Markdown
I have been working on a similar process for Apache RAT and Commons CLI (help module). The help option in RAT 0.17-SNAPSHOT generates the text for the help options partly from the new commons-cli help module and partly from enums and similar constructs in the code. For the RAT documentation we produce fragments, for example a markdown fragment that lists every value in an enum as well as its description (a variable in the enum). We then use the Maven site plugin so it can run Velocity to combine fragments together using an `insert` statement in a master document. In this way the master markdown says something like:

# All about X

The options for X are:

--insert X.txt

and the result is a document that specifies all the options and their descriptions. Add another enum value and it appears in the documentation. The commons-cli 1.10-SNAPSHOT help system uses the concept of a "scribe" that writes a document in an agnostic way. Basically it has an interface with methods for inserting section titles, paragraphs, lists, and tables; specific implementations write specific formats. So the same scribe can write markdown, HTML, APT, or any number of formats. I have spent the last week or so working out how to do this for the Aiven Kafka Connectors. The work is still at an early stage but is currently visible at https://claudenw.github.io/cloud-storage-connectors-for-apache-kafka/ The S3 Source -> Configuration link on that page is created by wrapping a markdown file generated by writing the ConfigDef values. There is also a YAML version ( https://claudenw.github.io/cloud-storage-connectors-for-apache-kafka/s3-source-connector/S3SourceConfig.yml) that Aiven's in-house documentation team is going to use to generate internal documentation. In any case I support this goal and am willing to rework my contributions for RAT, Commons CLI, and the Aiven Kafka Connectors in support of it, or do whatever else needs to be done.
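The "scribe" concept described above can be sketched as one format-agnostic interface with one implementation per output format. This is an illustrative sketch only, not the actual commons-cli 1.10-SNAPSHOT API; the interface and method names are invented for the example.

```java
import java.util.List;

// Hypothetical scribe interface: callers describe document structure,
// implementations decide the concrete syntax.
interface Scribe {
    String title(String text);
    String paragraph(String text);
    String list(List<String> items);
}

// One implementation per output format; the same calling code can
// therefore emit markdown, HTML, APT, etc.
class MarkdownScribe implements Scribe {
    public String title(String text) { return "# " + text + "\n"; }
    public String paragraph(String text) { return text + "\n\n"; }
    public String list(List<String> items) {
        StringBuilder sb = new StringBuilder();
        for (String item : items) sb.append("- ").append(item).append("\n");
        return sb.toString();
    }
}

class HtmlScribe implements Scribe {
    public String title(String text) { return "<h1>" + text + "</h1>"; }
    public String paragraph(String text) { return "<p>" + text + "</p>"; }
    public String list(List<String> items) {
        StringBuilder sb = new StringBuilder("<ul>");
        for (String item : items) sb.append("<li>").append(item).append("</li>");
        return sb.append("</ul>").toString();
    }
}
```

A documentation generator then walks something like a ConfigDef once and hands each element to whichever scribe matches the requested output format.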
Given that the 3 projects mentioned already do something similar, and I did come across another package that did something similar last week, perhaps we should think about a component for Apache Commons that handles generating files from code. Claude On Mon, Feb 24, 2025 at 3:19 PM Mickael Maison wrote: > Hi Harish, > > This is definitely something we want to do! > > MM1. One aspect to keep in mind in terms of compatibility is whether > existing links will still work. Thousands of online resources link to > the Apache Kafka website. Is it possible to keep all links working? > > MM2. A bunch of sections are generated. To do so we have a number of > classes in the public API (for example 0, 1) with toHtml() or main() > methods that output HTML. Should we update these to output Markdown > directly instead? > > Converting the website has been discussed several times, I'd suggest > checking at least https://issues.apache.org/jira/browse/KAFKA-2967 to > see what was previously said. > > 0: > https://kafka.apache.org/39/javadoc/org/apache/kafka/common/config/ConfigDef.html#toHtml() > 1: > https://kafka.apache.org/39/javadoc/org/apache/kafka/clients/consumer/ConsumerConfig.html#main(java.lang.String%5B%5D) > > Thanks, > Mickael > > On Mon, Feb 24, 2025 at 3:33 PM Bruno Cadonna wrote: > > > > Hi Harish, > > > > Thanks for the KIP! Great initiative! > > > > Do you plan to have a development version of the website online in > > parallel to the old HTML version? > > > > I understand that locally with the markdown files and a markdown > > editor, we see the rough structure of the docs (which is already an > > improvement), but we will not see the end product, because for that we > > again need a web server. > > > > I assume that after you migrated all docs to markdown, we want to see > > the complete website, review the content, and make some refinements. > > > > Best, > > Bruno > > > > On 23.02.25 07:52, Harish Vishwanath wrote: > > > Thank you David.
Appreciate your feedback. My comments below: > > > > > > DA1) I think this is a very good suggestion. While building the > prototype, > > > I did spend additional effort for older versions of the doc since > there are > > > fairly significant differences in the structure and format of the > documents > > > as compared to 3.x versions. I have written the automation which can do > > > this, but I do think hosting the older versions (<3.x) in its current > > > format and moving the rest to markdown will reduce the overall > > > testing/validation effort. That said, if we foresee making relatively > > > frequent updates to older versions, it might be worth converting > everything > > > to markdown. > > > > > > DA2) My hunch is that this can be quite involved. Rendering a usable > HTML > > > version requires all the supporting artifacts (css, js, images, hugo > > > shortcodes, partials etc.) which will only be present in the website > > > documentation. Further, the hyperlinks themselves will not work > without > > > a web server. To serve a directory structure
[DISCUSS] ConfigDef.ConfigKey builder and addition of deprecated information.
I would like to implement a ConfigDef.ConfigKey builder. The goal is to have a fluent builder that will build the ConfigKey and provide the same defaults that the current set of constructors does. In addition, I would like to add the ability to mark a ConfigKey as deprecated with some optional information, like the version it was deprecated in, a description that can be used to specify the replacement, and a flag to indicate that it will be removed soon. My goal is to make it easier to construct the ConfigKey and to make it easier to deprecate options as connectors and similar components evolve. Does anyone have any thoughts on this? Are there any objections? Claude
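A minimal sketch of what such a fluent builder might look like, including the optional deprecation fields proposed above. This is illustrative stand-in code, not the real ConfigDef.ConfigKey (whose constructor takes many more parameters); the field names and defaults are assumptions for the example.

```java
// Stand-in ConfigKey: the builder supplies defaults, so callers set
// only the fields they care about.
class ConfigKey {
    final String name;
    final String doc;
    final Object defaultValue;
    final String deprecatedSince; // null when the key is not deprecated

    private ConfigKey(Builder b) {
        this.name = b.name;
        this.doc = b.doc;
        this.defaultValue = b.defaultValue;
        this.deprecatedSince = b.deprecatedSince;
    }

    static Builder builder(String name) {
        return new Builder(name);
    }

    static class Builder {
        private final String name;
        private String doc = "";          // sensible defaults, as the
        private Object defaultValue = null; // current constructors provide
        private String deprecatedSince = null;

        Builder(String name) { this.name = name; }
        Builder doc(String doc) { this.doc = doc; return this; }
        Builder defaultValue(Object v) { this.defaultValue = v; return this; }
        Builder deprecatedSince(String version) { this.deprecatedSince = version; return this; }

        // build() hands the whole Builder to a private ConfigKey
        // constructor, so adding a field later touches only the Builder.
        ConfigKey build() { return new ConfigKey(this); }
    }
}
```

Usage would read as a single fluent chain, e.g. `ConfigKey.builder("batch.size").doc("Batch size").defaultValue(100).deprecatedSince("4.1").build()`, rather than a long positional argument list.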
[jira] [Created] (KAFKA-19381) Add a ConfigDef.ConfigKey builder
Claude Warren created KAFKA-19381:
-------------------------------------
Summary: Add a ConfigDef.ConfigKey builder
Key: KAFKA-19381
URL: https://issues.apache.org/jira/browse/KAFKA-19381
Project: Kafka
Issue Type: Improvement
Components: clients
Affects Versions: 4.0.0
Reporter: Claude Warren

The ConfigDef.define() methods have grown so many parameters that they are difficult to maintain. This proposal is to add a ConfigKey.Builder. The builder must:

* Be extensible, since ConfigKey is.
* Be created by calling ConfigKey.builder(name), where name is the name of the config parameter.
* Provide reasonable defaults, as found in the current ConfigDef.define() methods.
* During its build() method, call a new constructor in ConfigKey that takes the Builder as an argument.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[DISCUSS] KIP-1198: implement a ConfigKey.Builder class
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1198%3A+implement+a+ConfigKey.Builder+class I have a case where I want to develop a ConfigKey that has deprecation data. Having a ConfigKey.Builder that can be extended would make developing and testing the additional feature much easier. Claude
Re: [DISCUSS] ConfigDef.ConfigKey builder and addition of deprecated information.
Hello Chris, Thanks for the input. I can see the issue with the deprecation field as a potential breaking change to the API; my thought was that it would be used to generate help and log entries if it was present, but I will leave that out for now. I don't see that the builder will create a problem for older versions, as it is used to produce the ConfigKeys that are currently produced by the ConfigKey constructor(s). Since ConfigKey is not a final class, it can be extended, and perhaps this is a way forward to verify and work out the deprecation before bringing it into the mainline Kafka code. In my opinion there is no need for a KIP since the builder does not impact any current implementations. Do you concur? Thanks again, Claude On Thu, May 29, 2025 at 12:36 PM Chris Egerton wrote: > Hi Claude, > > As a general improvement to our config API, this sounds fine (I'm perhaps a > little iffy on first-class support for deprecation instead of just adding a > note to the docstring, but that's low-level enough that it can and should > wait for a KIP before being discussed). > > However, if we're talking about doing this for connectors, one of the > challenges that arises is cross-version compatibility between workers and > connectors. Specifically, what happens when a connector built against this > new API is deployed onto a worker running an older version? > > Cheers, > > Chris > > On Thu, May 29, 2025, 07:31 Claude Warren, Jr > wrote: > > > I would like to implement a ConfigDef.ConfigKey builder. The goal is to > > have a fluent builder that will build the ConfigKey and provide the same > > defaults that the current constructor set does. > > > > In addition, I would like to add the ability to mark a ConfigKey as > > deprecated with some optional information like the version it was > > deprecated in, a description that can be used to specify the replacement > > and a flag to indicate that it will be removed soon. 
> > > > My goal is to make it easier to construct the ConfigKey and to make it > > easier to deprecate options as connectors and similar components evolve. > > > > Does anyone have any thoughts on this? Are there any objections? > > > > Claude > > >
Why did this build fail?
I submitted a PR and the build indicates that it failed on Java 24. All of the tests passed, but Gradle seems to be reporting a failure. Can someone point me to the reason why? https://github.com/apache/kafka/actions/runs/17437178638/job/49511325798 Thx, Claude
ConfigDef.ConfigKey.dependents question.
I see from the javadocs that "ConfigKey.dependents" specifies the dependents of a configuration, but no indication of what that means. I assume that it is an abstract sort of idea that is intended to indicate that the use of property "B" is dependent upon the presence of property "A". In which case "A" would list "B" as a dependent. If that is correct, is there a way to locate all the properties that a property is dependent upon? So given "B" can I find "A"? Claude
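Under the interpretation above ("A" lists "B" as a dependent), a reverse lookup can be computed by inverting the forward dependents lists. The sketch below is a hypothetical illustration: a plain map stands in for what ConfigDef.configKeys() would provide, and DependentsIndex is an invented helper, not part of Kafka's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: given a forward map of key -> dependents
// (so "A" -> ["B"] means B's use depends on A), produce the inverse
// map answering "which keys does B depend on?".
class DependentsIndex {
    static Map<String, Set<String>> invert(Map<String, List<String>> dependents) {
        Map<String, Set<String>> dependsOn = new HashMap<>();
        for (Map.Entry<String, List<String>> e : dependents.entrySet()) {
            for (String dep : e.getValue()) {
                // dep is listed as a dependent of e.getKey(),
                // so dep depends on e.getKey()
                dependsOn.computeIfAbsent(dep, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return dependsOn;
    }
}
```

So given "B", looking it up in the inverted map yields "A" (and any other keys that list "B" as a dependent); a single pass over all config keys suffices.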