Hey Everyone, Tried out a new format to get some attention and also to make understanding easier, so I recorded a 15 min long video about this KIP. https://www.youtube.com/watch?v=uOJTyAEJmB8&feature=youtu.be
Sorry for the sound quality but recording a video isn't a thing for me and also I look like someone who is being interrogated but again, I don't often do this. :) If there are no further objections I will start a voting in a few days. Viktor On Tue, Feb 2, 2021 at 1:54 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote: > Hi all, > > I have updated the interfaces. I managed to shrink the required number of > entities. Basically I store the event type with the event, therefore we can > cover all topic related events (create, delete, change) with one event type. > > I think if on-one has objections then I'll start a vote soon. > > Viktor > > On Thu, Oct 29, 2020 at 5:15 PM Viktor Somogyi-Vass < > viktorsomo...@gmail.com> wrote: > >> Hi Tom. >> >> Sorry for the delay. >> Answering your points: >> >> > Why is it necessary to introduce this interface to produce the audit >> trail >> > when there is logging that can already record a lot of the same >> > information, albeit in less structured form? If logging isn't adequate >> it >> > would be good to explain why not in the Motivation or Rejected >> Alternatives >> > section. It's worth pointing out that even the "less structured" part >> would >> > be helped by KIP-673, which proposes to change the RequestChannel's >> logging >> > to include a JSON representation of the request. >> >> We will need authorization details as would an auditor normally have them >> but a request logger doesn't as you correctly pointed out later in your >> reply. They would also appear at different lifecycle points I imagine, like >> the request logger is probably when the request enters Kafka and the >> auditor catches them before sending the response, so it can obtain all >> information (authorization, execution). >> Furthermore this auditing API would specifically target other JVM based >> components that depend on Kafka (like Ranger or Atlas) and from both side's >> perspective it's much better to expose Java level classes rather than a >> lower level (JSON) implementation. If a Java level object is exposed then >> we need to create them once during request processing which is fairly >> low-fat since we're parsing the request most of the time anyways as opposed >> to JSON which would need to be serialized first and then deserialized for >> the consumer of the API. >> >> > I'm guessing what you gain from the proposed interface is the fact that >> > it's called after the authorizer (perhaps after the request has been >> > handled: I'm unclear about the purpose of AuditInfo.error), so you could >> > generate a single record in the audit trail. That could still be >> achieved >> > using logging, either by correlating existing log messages or by >> proposing >> > some new logging just for this auditing purpose (perhaps with a logger >> per >> > API key so people could avoid the performance hit on the produce and >> fetch >> > paths if they weren't interested in auditing those things). Again, if >> this >> > doesn't work it would be great for the KIP to explain why. >> >> AuditInfo.error serves for capturing the possible errors that could >> happen during the authorization and execution of the request. For instance >> a partition creation request could be authorized and then rejected >> with INVALID_TOPIC_EXCEPTION because the topic is queued for deletion. In >> this case the AuditInfo.error would contain this API error thus emitting >> information about the failure of the request. With normal auditing that >> looks at only the authorization information we wouldn't detect it. >> Regarding the produce and fetch performance: for these kinds of requests >> I don't think we should enable parsing the batches themselves, just only >> pass some meta information like which topics and partitions are affected. >> These are parsed anyways for log reading and writing. Also similarly to the >> authorizer we need to require implementations to run the auditing logic on >> a different thread to minimize the performance impact. >> >> > To me there were parallels with previous discussions about broker-side >> > interceptors ( >> > https://www.mail-archive.com/dev@kafka.apache.org/msg103310.html if >> you've >> > not seen it before), those seemed to founder on the unwillingness to >> make >> > the request internal classes into a supported API. You've tried to >> address >> > this by proposing a parallel set of classes implementing AuditEvent for >> > exposing selective details about the request. It's not really clear that >> > you really _need_ to access all that information about each request, >> rather >> > than simply recording it all, and it would also come with a significant >> > implementation and maintenance cost. If it's simply about recording all >> the >> > information in the request, then it would likely be enough to pass a >> > suitably formatted String rather than an AuditEvent, which basically >> brings >> > us back to point 1, but with some justification for not using logging. >> >> Thanks for this email thread, I haven't seen it but now I see it's a much >> bigger tree that I'm chopping :). But the point is that everyone basically >> faces a similar issue, that we need server side interceptors and >> observables. Indeed auditing can be part of such an interceptor chain and >> I've been thinking of it like this too sometimes but as it has been >> correctly assessed in the thread "we're doing the one-offs". I also admit >> that maintaining all the implementation of AuditEvent could be cumbersome >> and maybe this isn't a way. However I think we should expose more >> structured forms. If we maintain suitably formatted Strings and if the >> protocol changes for some requests it could be much harder to trace the >> needed changes back to these Strings. >> One idea that I had while reading the email is we could generate these >> classes similarly to the *Data classes (like CreateTopicsData). There would >> be a flag called "useForAuditing=true" in the JSON definition of the >> protocol that would cause the given field to be generated into a class that >> would be the implementation of AuditEvent and would be a public API. It >> would be instantiated when a request is deserialized. In my opinion it has >> the advantage that it's tightly coupled with the protocol from the >> maintenance point of view and still provides an efficient and structured >> way of accessing certain information of the request. >> >> Best, >> Viktor >> >> On Thu, Oct 1, 2020 at 4:16 PM Tom Bentley <tbent...@redhat.com> wrote: >> >>> Hi Viktor, >>> >>> Like Mickael, I can see that there's value in having an audit trail. For >>> me >>> the KIP raises a number of questions in its current form: >>> >>> Why is it necessary to introduce this interface to produce the audit >>> trail >>> when there is logging that can already record a lot of the same >>> information, albeit in less structured form? If logging isn't adequate it >>> would be good to explain why not in the Motivation or Rejected >>> Alternatives >>> section. It's worth pointing out that even the "less structured" part >>> would >>> be helped by KIP-673, which proposes to change the RequestChannel's >>> logging >>> to include a JSON representation of the request. >>> >>> I'm guessing what you gain from the proposed interface is the fact that >>> it's called after the authorizer (perhaps after the request has been >>> handled: I'm unclear about the purpose of AuditInfo.error), so you could >>> generate a single record in the audit trail. That could still be achieved >>> using logging, either by correlating existing log messages or by >>> proposing >>> some new logging just for this auditing purpose (perhaps with a logger >>> per >>> API key so people could avoid the performance hit on the produce and >>> fetch >>> paths if they weren't interested in auditing those things). Again, if >>> this >>> doesn't work it would be great for the KIP to explain why. >>> >>> To me there were parallels with previous discussions about broker-side >>> interceptors ( >>> https://www.mail-archive.com/dev@kafka.apache.org/msg103310.html if >>> you've >>> not seen it before), those seemed to founder on the unwillingness to make >>> the request internal classes into a supported API. You've tried to >>> address >>> this by proposing a parallel set of classes implementing AuditEvent for >>> exposing selective details about the request. It's not really clear that >>> you really _need_ to access all that information about each request, >>> rather >>> than simply recording it all, and it would also come with a significant >>> implementation and maintenance cost. If it's simply about recording all >>> the >>> information in the request, then it would likely be enough to pass a >>> suitably formatted String rather than an AuditEvent, which basically >>> brings >>> us back to point 1, but with some justification for not using logging. >>> >>> Kind regards, >>> >>> Tom >>> >>> On Thu, Oct 1, 2020 at 11:30 AM Dániel Urbán <urb.dani...@gmail.com> >>> wrote: >>> >>> > Hi Viktor, >>> > >>> > I think the current state of the proposal is flexible enough to support >>> > use-cases where the response data is of interest to the auditor. >>> > This part ensures that: "... doing the auditing before sending the >>> response >>> > back ...". Additionally, event classes could be extended with >>> additional >>> > data if needed. >>> > >>> > Overall, the KIP looks good, thanks! >>> > >>> > Daniel >>> > >>> > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta (időpont: 2020. >>> > szept. 30., Sze, 17:24): >>> > >>> > > Hi Daniel, >>> > > >>> > > I think in this sense we can use the precedence set with the >>> > > KAfkaAdminClient. It has *Result and *Options classes which in this >>> > > interpretation are similar in versioning and usage as they transform >>> and >>> > > convey the responses of the protocol in a minimalistic API. >>> > > I've modified the KIP a bit and created some examples for these event >>> > > classes. For now as the implementation I think we can treat this >>> > similarly >>> > > to KIP-4 (AdminClient) which didn't push implementation for >>> everything >>> > but >>> > > rather pushed implementing everything to subsequent KIPs as the >>> > > requirements become important. In this first KIP we can create the >>> more >>> > > important ones (listed in the "Default Implementation") section if >>> that >>> > is >>> > > fine. >>> > > >>> > > Regarding the response passing: to be honest I feel like that it's >>> not >>> > that >>> > > strictly related to auditing but I think it's a good idea and could >>> fit >>> > > into this API. I think that we should design this current API with >>> this >>> > in >>> > > mind. Did you have any specific ideas about the implementation? >>> > > >>> > > Viktor >>> > > >>> > > On Tue, Sep 22, 2020 at 9:05 AM Dániel Urbán <urb.dani...@gmail.com> >>> > > wrote: >>> > > >>> > > > An example I had in mind was the ProduceResponse - the auditor >>> might >>> > need >>> > > > access to the new end offset of the partitions. >>> > > > The event-based approach sounds good - new events and fields can be >>> > added >>> > > > on-demand. Do we need the same versioning strategy we use with the >>> > > > requests/responses? >>> > > > >>> > > > Daniel >>> > > > >>> > > > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta (időpont: >>> 2020. >>> > > > szept. 21., H, 14:08): >>> > > > >>> > > > > Hi Daniel, >>> > > > > >>> > > > > > If the auditor needs access to the details of the action, one >>> could >>> > > > argue >>> > > > > that even the response should be passed down to the auditor. >>> > > > > At this point I don't think we need to include responses into the >>> > > > interface >>> > > > > but if you have a use-case we can consider doing that. >>> > > > > >>> > > > > > Is it feasible to convert the Java requests and responses to >>> public >>> > > > API? >>> > > > > Well I think that in this case we would need to actually >>> transform a >>> > > lot >>> > > > of >>> > > > > classes and that might be a bit too invasive. Although since the >>> > > protocol >>> > > > > itself *is* a public API it might make sense to have some kind of >>> > Java >>> > > > > representation as a public API as well. >>> > > > > >>> > > > > > If not, do we have another option to access this info in the >>> > auditor? >>> > > > > I think one option would be to do what the original KIP-567 was >>> > > > > implemented. Basically we could have an AuditEvent interface that >>> > would >>> > > > > contain request specific data. Its obvious drawback is that it >>> has to >>> > > be >>> > > > > implemented for most of the 40 something protocols but on the >>> upside >>> > > > these >>> > > > > classes shouldn't be complicated. I can try to do a PoC with >>> this to >>> > > see >>> > > > > how it looks like and whether it solves the problem. To be >>> honest I >>> > > think >>> > > > > it would be better than publishing the request classes as an API >>> > > because >>> > > > > here we're restricting access to only what is necessary. >>> > > > > >>> > > > > Thanks, >>> > > > > Viktor >>> > > > > >>> > > > > >>> > > > > >>> > > > > On Fri, Sep 18, 2020 at 8:37 AM Dániel Urbán < >>> urb.dani...@gmail.com> >>> > > > > wrote: >>> > > > > >>> > > > > > Hi, >>> > > > > > >>> > > > > > Thanks for the KIP. >>> > > > > > >>> > > > > > If the auditor needs access to the details of the action, one >>> could >>> > > > argue >>> > > > > > that even the response should be passed down to the auditor. >>> > > > > > Is it feasible to convert the Java requests and responses to >>> public >>> > > > API? >>> > > > > > If not, do we have another option to access this info in the >>> > auditor? >>> > > > > > I know that the auditor could just send proper requests >>> through the >>> > > API >>> > > > > to >>> > > > > > the brokers, but that seems like an awful lot of overhead, and >>> > could >>> > > > > > introduce timing issues as well. >>> > > > > > >>> > > > > > Daniel >>> > > > > > >>> > > > > > >>> > > > > > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta >>> (időpont: >>> > > 2020. >>> > > > > > szept. 16., Sze, 17:17): >>> > > > > > >>> > > > > > > One more after-thought on your second point >>> (AbstractRequest): >>> > the >>> > > > > > reason I >>> > > > > > > introduced it in the first place was that this way >>> implementers >>> > can >>> > > > > > access >>> > > > > > > request data. A use case can be if they want to audit a >>> change in >>> > > > > > > configuration or client quotas but not just acknowledge the >>> fact >>> > > that >>> > > > > > such >>> > > > > > > an event happened but also capture the change itself by >>> peeking >>> > > into >>> > > > > the >>> > > > > > > request. Sometimes it can be useful especially when people >>> want >>> > to >>> > > > > trace >>> > > > > > > back the order of events and what happened when and not just >>> > > > > acknowledge >>> > > > > > > that there was an event of a certain kind. I also recognize >>> that >>> > > this >>> > > > > > might >>> > > > > > > be a very loose interpretation of auditing as it's not >>> strictly >>> > > > related >>> > > > > > to >>> > > > > > > authorization but rather a way of tracing the admin actions >>> > within >>> > > > the >>> > > > > > > cluster. It even could be a different API therefore but >>> because >>> > of >>> > > > the >>> > > > > > > variety of the Kafka APIs it's very hard to give a method >>> that >>> > fits >>> > > > > all, >>> > > > > > so >>> > > > > > > it's easier to pass down the AbstractRequest and the >>> > implementation >>> > > > can >>> > > > > > do >>> > > > > > > the extraction of valuable info. So that's why I added this >>> in >>> > the >>> > > > > first >>> > > > > > > place and I'm interested in your thoughts. >>> > > > > > > >>> > > > > > > On Wed, Sep 16, 2020 at 4:41 PM Viktor Somogyi-Vass < >>> > > > > > > viktorsomo...@gmail.com> >>> > > > > > > wrote: >>> > > > > > > >>> > > > > > > > Hi Mickael, >>> > > > > > > > >>> > > > > > > > Thanks for reviewing the KIP. >>> > > > > > > > >>> > > > > > > > 1.) I just wanted to follow the conventions used with the >>> > > > Authorizer >>> > > > > as >>> > > > > > > it >>> > > > > > > > is built in a similar fashion, although it's true that in >>> > > > KafkaServer >>> > > > > > we >>> > > > > > > > call the configure() method and the start() in the next >>> line. >>> > > This >>> > > > > > would >>> > > > > > > be >>> > > > > > > > the same in Auditor and even simpler as there aren't any >>> > > parameters >>> > > > > to >>> > > > > > > > start(), so I can remove it. If it turns out there is a >>> need >>> > for >>> > > > it, >>> > > > > we >>> > > > > > > can >>> > > > > > > > add it later. >>> > > > > > > > >>> > > > > > > > 2.) Yes, this is a very good point, I will remove it, >>> however >>> > in >>> > > > this >>> > > > > > > case >>> > > > > > > > I don't think we need to add the ApiKey as it is already >>> > > available >>> > > > in >>> > > > > > > > AuthorizableRequestContext.requestType(). One less >>> parameter >>> > :). >>> > > > > > > > >>> > > > > > > > 3.) I'll add it. It will simply log important changes in >>> the >>> > > > cluster >>> > > > > > like >>> > > > > > > > topic events (create, update, delete, partition or >>> replication >>> > > > factor >>> > > > > > > > change), ACL events, config changes, reassignment, >>> altering log >>> > > > dirs, >>> > > > > > > > offset delete, group delete with the authorization info >>> like >>> > who >>> > > > > > > initiated >>> > > > > > > > the call, was it authorized, were there any errors. Let me >>> know >>> > > if >>> > > > > you >>> > > > > > > > think there are other APIs I should include. >>> > > > > > > > >>> > > > > > > > 4.) The builder is there mostly for easier usability but >>> > actually >>> > > > > > > thinking >>> > > > > > > > of it it doesn't help much so I removed it. The AuditInfo >>> is >>> > > also a >>> > > > > > > helper >>> > > > > > > > class so I don't see any value in transforming it into an >>> > > interface >>> > > > > but >>> > > > > > > if >>> > > > > > > > I simplify it (by removing the builder) it will be cleaner. >>> > Would >>> > > > > that >>> > > > > > > work? >>> > > > > > > > >>> > > > > > > > I'll update the KIP to reflect my answers. >>> > > > > > > > >>> > > > > > > > Viktor >>> > > > > > > > >>> > > > > > > > >>> > > > > > > > On Mon, Sep 14, 2020 at 6:02 PM Mickael Maison < >>> > > > > > mickael.mai...@gmail.com >>> > > > > > > > >>> > > > > > > > wrote: >>> > > > > > > > >>> > > > > > > >> Hi Viktor, >>> > > > > > > >> >>> > > > > > > >> Thanks for restarting the discussion on this KIP. Being >>> able >>> > to >>> > > > > easily >>> > > > > > > >> audit usage of a Kafka cluster is a very valuable feature. >>> > > > > > > >> >>> > > > > > > >> Regarding the API, I have a few of questions: >>> > > > > > > >> 1) You introduced a start() method. I don't think any >>> other >>> > > > > interfaces >>> > > > > > > >> have such a method. Users can do any setup they want in >>> > > > configure() >>> > > > > > > >> >>> > > > > > > >> 2) The first argument of audit is an AbstractRequest. >>> > > > Unfortunately >>> > > > > > > >> this type is not part of the public API. But actually I'm >>> not >>> > > sure >>> > > > > > > >> having the full request is really needed here. Maybe just >>> > > passing >>> > > > > the >>> > > > > > > >> Apikey would be enough as we already have all the >>> resources >>> > from >>> > > > the >>> > > > > > > >> auditInfos field. >>> > > > > > > >> >>> > > > > > > >> 3) The KIP mentions a "LoggingAuditor" default >>> implementation. >>> > > > What >>> > > > > is >>> > > > > > > >> it doing? Can you add more details about it? >>> > > > > > > >> >>> > > > > > > >> 4) Can fields of AuditInfo be null? I can see there's a >>> > > > constructor >>> > > > > > > >> without an Errors and that sets the error field to None. >>> > > However, >>> > > > > with >>> > > > > > > >> the builder pattern, if error is not set it's null. >>> > > > > > > >> >>> > > > > > > >> 5) Should AuditInfo be an interface? >>> > > > > > > >> >>> > > > > > > >> On Mon, Sep 14, 2020 at 3:26 PM Viktor Somogyi-Vass >>> > > > > > > >> <viktorsomo...@gmail.com> wrote: >>> > > > > > > >> > >>> > > > > > > >> > Hi everyone, >>> > > > > > > >> > >>> > > > > > > >> > Changed the interface a little bit to accommodate >>> methods >>> > > better >>> > > > > > where >>> > > > > > > >> > authorization happens for multiple operations so the >>> > > implementer >>> > > > > of >>> > > > > > > the >>> > > > > > > >> > audit interface will receive all authorizations >>> together. >>> > > > > > > >> > I'll wait a few more days to allow people to react or >>> give >>> > > > > feedback >>> > > > > > > but >>> > > > > > > >> if >>> > > > > > > >> > there are no objections until then, I'll start a vote. >>> > > > > > > >> > >>> > > > > > > >> > Viktor >>> > > > > > > >> > >>> > > > > > > >> > On Tue, Sep 8, 2020 at 9:49 AM Viktor Somogyi-Vass < >>> > > > > > > >> viktorsomo...@gmail.com> >>> > > > > > > >> > wrote: >>> > > > > > > >> > >>> > > > > > > >> > > Hi Everyone, >>> > > > > > > >> > > >>> > > > > > > >> > > I'd like to restart the discussion on this. Since the >>> KIP >>> > > has >>> > > > > been >>> > > > > > > >> > > revamped I thought I'd start a new discussion thread. >>> > > > > > > >> > > >>> > > > > > > >> > > Link: >>> > > > > > > >> > > >>> > > > > > > >> >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-567%3A+Kafka+Cluster+Audit >>> > > > > > > >> > > >>> > > > > > > >> > > Short summary: >>> > > > > > > >> > > - Would like to introduce a new interface similar to >>> the >>> > > > > > Authorizer >>> > > > > > > >> called >>> > > > > > > >> > > Auditor as follows: >>> > > > > > > >> > > public interface Auditor { >>> > > > > > > >> > > audit(Request r, AuthorizableRequestContext c, >>> > > > > > AclOperation >>> > > > > > > >> > > o, Map<ResourcePattern, Boolean> isAllowed, >>> > > > Map<ResourcePattern, >>> > > > > > > >> Errors> >>> > > > > > > >> > > errors); >>> > > > > > > >> > > } >>> > > > > > > >> > > - Basically it would pass down the request and the >>> > > > authorization >>> > > > > > > >> > > information to the auditor implementation where >>> various >>> > kind >>> > > > of >>> > > > > > > >> reporting >>> > > > > > > >> > > can be done based on the request. >>> > > > > > > >> > > - A new config would be added called "auditor" which >>> is >>> > > > similar >>> > > > > to >>> > > > > > > the >>> > > > > > > >> > > "authorizer" config, but users can pass a list of >>> auditor >>> > > > class >>> > > > > > > names. >>> > > > > > > >> > > - The implementation is expected to be low latency >>> > similarly >>> > > > to >>> > > > > > the >>> > > > > > > >> > > Authorizer. >>> > > > > > > >> > > - A default implementation will be added that logs >>> into a >>> > > > file. >>> > > > > > > >> > > >>> > > > > > > >> > > I appreciate any feedback on this. >>> > > > > > > >> > > >>> > > > > > > >> > > Best, >>> > > > > > > >> > > Viktor >>> > > > > > > >> > > >>> > > > > > > >> >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> >>