Re: [DISCUSS] KIP-567: Kafka Cluster Audit (new discussion)

Viktor Somogyi-Vass Mon, 01 Mar 2021 08:33:59 -0800

Hey Everyone,

Tried out a new format to get some attention and also to make understanding
easier, so I recorded a 15 min long video about this KIP.
https://www.youtube.com/watch?v=uOJTyAEJmB8&feature=youtu.be


Sorry for the sound quality but recording a video isn't a thing for me and
also I look like someone who is being interrogated but again, I don't often
do this. :)

If there are no further objections I will start a voting in a few days.

Viktor

On Tue, Feb 2, 2021 at 1:54 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
wrote:

> Hi all,
>
> I have updated the interfaces. I managed to shrink the required number of
> entities. Basically I store the event type with the event, therefore we can
> cover all topic related events (create, delete, change) with one event type.
>
> I think if on-one has objections then I'll start a vote soon.
>
> Viktor
>
> On Thu, Oct 29, 2020 at 5:15 PM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com> wrote:
>
>> Hi Tom.
>>
>> Sorry for the delay.
>> Answering your points:
>>
>> > Why is it necessary to introduce this interface to produce the audit
>> trail
>> > when there is logging that can already record a lot of the same
>> > information, albeit in less structured form? If logging isn't adequate
>> it
>> > would be good to explain why not in the Motivation or Rejected
>> Alternatives
>> > section. It's worth pointing out that even the "less structured" part
>> would
>> > be helped by KIP-673, which proposes to change the RequestChannel's
>> logging
>> > to include a JSON representation of the request.
>>
>> We will need authorization details as would an auditor normally have them
>> but a request logger doesn't as you correctly pointed out later in your
>> reply. They would also appear at different lifecycle points I imagine, like
>> the request logger is probably when the request enters Kafka and the
>> auditor catches them before sending the response, so it can obtain all
>> information (authorization, execution).
>> Furthermore this auditing API would specifically target other JVM based
>> components that depend on Kafka (like Ranger or Atlas) and from both side's
>> perspective it's much better to expose Java level classes rather than a
>> lower level (JSON) implementation. If a Java level object is exposed then
>> we need to create them once during request processing which is fairly
>> low-fat since we're parsing the request most of the time anyways as opposed
>> to JSON which would need to be serialized first and then deserialized for
>> the consumer of the API.
>>
>> > I'm guessing what you gain from the proposed interface is the fact that
>> > it's called after the authorizer (perhaps after the request has been
>> > handled: I'm unclear about the purpose of AuditInfo.error), so you could
>> > generate a single record in the audit trail. That could still be
>> achieved
>> > using logging, either by correlating existing log messages or by
>> proposing
>> > some new logging just for this auditing purpose (perhaps with a logger
>> per
>> > API key so people could avoid the performance hit on the produce and
>> fetch
>> > paths if they weren't interested in auditing those things). Again, if
>> this
>> > doesn't work it would be great for the KIP to explain why.
>>
>> AuditInfo.error serves for capturing the possible errors that could
>> happen during the authorization and execution of the request. For instance
>> a partition creation request could be authorized and then rejected
>> with INVALID_TOPIC_EXCEPTION because the topic is queued for deletion. In
>> this case the AuditInfo.error would contain this API error thus emitting
>> information about the failure of the request. With normal auditing that
>> looks at only the authorization information we wouldn't detect it.
>> Regarding the produce and fetch performance: for these kinds of requests
>> I don't think we should enable parsing the batches themselves, just only
>> pass some meta information like which topics and partitions are affected.
>> These are parsed anyways for log reading and writing. Also similarly to the
>> authorizer we need to require implementations to run the auditing logic on
>> a different thread to minimize the performance impact.
>>
>> > To me there were parallels with previous discussions about broker-side
>> > interceptors (
>> > https://www.mail-archive.com/dev@kafka.apache.org/msg103310.html if
>> you've
>> > not seen it before), those seemed to founder on the unwillingness to
>> make
>> > the request internal classes into a supported API. You've tried to
>> address
>> > this by proposing a parallel set of classes implementing AuditEvent for
>> > exposing selective details about the request. It's not really clear that
>> > you really _need_ to access all that information about each request,
>> rather
>> > than simply recording it all, and it would also come with a significant
>> > implementation and maintenance cost. If it's simply about recording all
>> the
>> > information in the request, then it would likely be enough to pass a
>> > suitably formatted String rather than an AuditEvent, which basically
>> brings
>> > us back to point 1, but with some justification for not using logging.
>>
>> Thanks for this email thread, I haven't seen it but now I see it's a much
>> bigger tree that I'm chopping :). But the point is that everyone basically
>> faces a similar issue, that we need server side interceptors and
>> observables. Indeed auditing can be part of such an interceptor chain and
>> I've been thinking of it like this too sometimes but as it has been
>> correctly assessed in the thread "we're doing the one-offs". I also admit
>> that maintaining all the implementation of AuditEvent could be cumbersome
>> and maybe this isn't a way. However I think we should expose more
>> structured forms. If we maintain suitably formatted Strings and if the
>> protocol changes for some requests it could be much harder to trace the
>> needed changes back to these Strings.
>> One idea that I had while reading the email is we could generate these
>> classes similarly to the *Data classes (like CreateTopicsData). There would
>> be a flag called "useForAuditing=true" in the JSON definition of the
>> protocol that would cause the given field to be generated into a class that
>> would be the implementation of AuditEvent and would be a public API. It
>> would be instantiated when a request is deserialized. In my opinion it has
>> the advantage that it's tightly coupled with the protocol from the
>> maintenance point of view and still provides an efficient and structured
>> way of accessing certain information of the request.
>>
>> Best,
>> Viktor
>>
>> On Thu, Oct 1, 2020 at 4:16 PM Tom Bentley <tbent...@redhat.com> wrote:
>>
>>> Hi Viktor,
>>>
>>> Like Mickael, I can see that there's value in having an audit trail. For
>>> me
>>> the KIP raises a number of questions in its current form:
>>>
>>> Why is it necessary to introduce this interface to produce the audit
>>> trail
>>> when there is logging that can already record a lot of the same
>>> information, albeit in less structured form? If logging isn't adequate it
>>> would be good to explain why not in the Motivation or Rejected
>>> Alternatives
>>> section. It's worth pointing out that even the "less structured" part
>>> would
>>> be helped by KIP-673, which proposes to change the RequestChannel's
>>> logging
>>> to include a JSON representation of the request.
>>>
>>> I'm guessing what you gain from the proposed interface is the fact that
>>> it's called after the authorizer (perhaps after the request has been
>>> handled: I'm unclear about the purpose of AuditInfo.error), so you could
>>> generate a single record in the audit trail. That could still be achieved
>>> using logging, either by correlating existing log messages or by
>>> proposing
>>> some new logging just for this auditing purpose (perhaps with a logger
>>> per
>>> API key so people could avoid the performance hit on the produce and
>>> fetch
>>> paths if they weren't interested in auditing those things). Again, if
>>> this
>>> doesn't work it would be great for the KIP to explain why.
>>>
>>> To me there were parallels with previous discussions about broker-side
>>> interceptors (
>>> https://www.mail-archive.com/dev@kafka.apache.org/msg103310.html if
>>> you've
>>> not seen it before), those seemed to founder on the unwillingness to make
>>> the request internal classes into a supported API. You've tried to
>>> address
>>> this by proposing a parallel set of classes implementing AuditEvent for
>>> exposing selective details about the request. It's not really clear that
>>> you really _need_ to access all that information about each request,
>>> rather
>>> than simply recording it all, and it would also come with a significant
>>> implementation and maintenance cost. If it's simply about recording all
>>> the
>>> information in the request, then it would likely be enough to pass a
>>> suitably formatted String rather than an AuditEvent, which basically
>>> brings
>>> us back to point 1, but with some justification for not using logging.
>>>
>>> Kind regards,
>>>
>>> Tom
>>>
>>> On Thu, Oct 1, 2020 at 11:30 AM Dániel Urbán <urb.dani...@gmail.com>
>>> wrote:
>>>
>>> > Hi Viktor,
>>> >
>>> > I think the current state of the proposal is flexible enough to support
>>> > use-cases where the response data is of interest to the auditor.
>>> > This part ensures that: "... doing the auditing before sending the
>>> response
>>> > back ...". Additionally, event classes could be extended with
>>> additional
>>> > data if needed.
>>> >
>>> > Overall, the KIP looks good, thanks!
>>> >
>>> > Daniel
>>> >
>>> > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta (időpont: 2020.
>>> > szept. 30., Sze, 17:24):
>>> >
>>> > > Hi Daniel,
>>> > >
>>> > > I think in this sense we can use the precedence set with the
>>> > > KAfkaAdminClient. It has *Result and *Options classes which in this
>>> > > interpretation are similar in versioning and usage as they transform
>>> and
>>> > > convey the responses of the protocol in a minimalistic API.
>>> > > I've modified the KIP a bit and created some examples for these event
>>> > > classes. For now as the implementation I think we can treat this
>>> > similarly
>>> > > to KIP-4 (AdminClient) which didn't push implementation for
>>> everything
>>> > but
>>> > > rather pushed implementing everything to subsequent KIPs as the
>>> > > requirements become important. In this first KIP we can create the
>>> more
>>> > > important ones (listed in the "Default Implementation") section if
>>> that
>>> > is
>>> > > fine.
>>> > >
>>> > > Regarding the response passing: to be honest I feel like that it's
>>> not
>>> > that
>>> > > strictly related to auditing but I think it's a good idea and could
>>> fit
>>> > > into this API. I think that we should design this current API with
>>> this
>>> > in
>>> > > mind. Did you have any specific ideas about the implementation?
>>> > >
>>> > > Viktor
>>> > >
>>> > > On Tue, Sep 22, 2020 at 9:05 AM Dániel Urbán <urb.dani...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > An example I had in mind was the ProduceResponse - the auditor
>>> might
>>> > need
>>> > > > access to the new end offset of the partitions.
>>> > > > The event-based approach sounds good - new events and fields can be
>>> > added
>>> > > > on-demand. Do we need the same versioning strategy we use with the
>>> > > > requests/responses?
>>> > > >
>>> > > > Daniel
>>> > > >
>>> > > > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta (időpont:
>>> 2020.
>>> > > > szept. 21., H, 14:08):
>>> > > >
>>> > > > > Hi Daniel,
>>> > > > >
>>> > > > > > If the auditor needs access to the details of the action, one
>>> could
>>> > > > argue
>>> > > > > that even the response should be passed down to the auditor.
>>> > > > > At this point I don't think we need to include responses into the
>>> > > > interface
>>> > > > > but if you have a use-case we can consider doing that.
>>> > > > >
>>> > > > > > Is it feasible to convert the Java requests and responses to
>>> public
>>> > > > API?
>>> > > > > Well I think that in this case we would need to actually
>>> transform a
>>> > > lot
>>> > > > of
>>> > > > > classes and that might be a bit too invasive. Although since the
>>> > > protocol
>>> > > > > itself *is* a public API it might make sense to have some kind of
>>> > Java
>>> > > > > representation as a public API as well.
>>> > > > >
>>> > > > > > If not, do we have another option to access this info in the
>>> > auditor?
>>> > > > > I think one option would be to do what the original KIP-567 was
>>> > > > > implemented. Basically we could have an AuditEvent interface that
>>> > would
>>> > > > > contain request specific data. Its obvious drawback is that it
>>> has to
>>> > > be
>>> > > > > implemented for most of the 40 something protocols but on the
>>> upside
>>> > > > these
>>> > > > > classes shouldn't be complicated. I can try to do a PoC with
>>> this to
>>> > > see
>>> > > > > how it looks like and whether it solves the problem. To be
>>> honest I
>>> > > think
>>> > > > > it would be better than publishing the request classes as an API
>>> > > because
>>> > > > > here we're restricting access to only what is necessary.
>>> > > > >
>>> > > > > Thanks,
>>> > > > > Viktor
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Fri, Sep 18, 2020 at 8:37 AM Dániel Urbán <
>>> urb.dani...@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Hi,
>>> > > > > >
>>> > > > > > Thanks for the KIP.
>>> > > > > >
>>> > > > > > If the auditor needs access to the details of the action, one
>>> could
>>> > > > argue
>>> > > > > > that even the response should be passed down to the auditor.
>>> > > > > > Is it feasible to convert the Java requests and responses to
>>> public
>>> > > > API?
>>> > > > > > If not, do we have another option to access this info in the
>>> > auditor?
>>> > > > > > I know that the auditor could just send proper requests
>>> through the
>>> > > API
>>> > > > > to
>>> > > > > > the brokers, but that seems like an awful lot of overhead, and
>>> > could
>>> > > > > > introduce timing issues as well.
>>> > > > > >
>>> > > > > > Daniel
>>> > > > > >
>>> > > > > >
>>> > > > > > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta
>>> (időpont:
>>> > > 2020.
>>> > > > > > szept. 16., Sze, 17:17):
>>> > > > > >
>>> > > > > > > One more after-thought on your second point
>>> (AbstractRequest):
>>> > the
>>> > > > > > reason I
>>> > > > > > > introduced it in the first place was that this way
>>> implementers
>>> > can
>>> > > > > > access
>>> > > > > > > request data. A use case can be if they want to audit a
>>> change in
>>> > > > > > > configuration or client quotas but not just acknowledge the
>>> fact
>>> > > that
>>> > > > > > such
>>> > > > > > > an event happened but also capture the change itself by
>>> peeking
>>> > > into
>>> > > > > the
>>> > > > > > > request. Sometimes it can be useful especially when people
>>> want
>>> > to
>>> > > > > trace
>>> > > > > > > back the order of events and what happened when and not just
>>> > > > > acknowledge
>>> > > > > > > that there was an event of a certain kind. I also recognize
>>> that
>>> > > this
>>> > > > > > might
>>> > > > > > > be a very loose interpretation of auditing as it's not
>>> strictly
>>> > > > related
>>> > > > > > to
>>> > > > > > > authorization but rather a way of tracing the admin actions
>>> > within
>>> > > > the
>>> > > > > > > cluster. It even could be a different API therefore but
>>> because
>>> > of
>>> > > > the
>>> > > > > > > variety of the Kafka APIs it's very hard to give a method
>>> that
>>> > fits
>>> > > > > all,
>>> > > > > > so
>>> > > > > > > it's easier to pass down the AbstractRequest and the
>>> > implementation
>>> > > > can
>>> > > > > > do
>>> > > > > > > the extraction of valuable info. So that's why I added this
>>> in
>>> > the
>>> > > > > first
>>> > > > > > > place and I'm interested in your thoughts.
>>> > > > > > >
>>> > > > > > > On Wed, Sep 16, 2020 at 4:41 PM Viktor Somogyi-Vass <
>>> > > > > > > viktorsomo...@gmail.com>
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Hi Mickael,
>>> > > > > > > >
>>> > > > > > > > Thanks for reviewing the KIP.
>>> > > > > > > >
>>> > > > > > > > 1.) I just wanted to follow the conventions used with the
>>> > > > Authorizer
>>> > > > > as
>>> > > > > > > it
>>> > > > > > > > is built in a similar fashion, although it's true that in
>>> > > > KafkaServer
>>> > > > > > we
>>> > > > > > > > call the configure() method and the start() in the next
>>> line.
>>> > > This
>>> > > > > > would
>>> > > > > > > be
>>> > > > > > > > the same in Auditor and even simpler as there aren't any
>>> > > parameters
>>> > > > > to
>>> > > > > > > > start(), so I can remove it. If it turns out there is a
>>> need
>>> > for
>>> > > > it,
>>> > > > > we
>>> > > > > > > can
>>> > > > > > > > add it later.
>>> > > > > > > >
>>> > > > > > > > 2.) Yes, this is a very good point, I will remove it,
>>> however
>>> > in
>>> > > > this
>>> > > > > > > case
>>> > > > > > > > I don't think we need to add the ApiKey as it is already
>>> > > available
>>> > > > in
>>> > > > > > > > AuthorizableRequestContext.requestType(). One less
>>> parameter
>>> > :).
>>> > > > > > > >
>>> > > > > > > > 3.) I'll add it. It will simply log important changes in
>>> the
>>> > > > cluster
>>> > > > > > like
>>> > > > > > > > topic events (create, update, delete, partition or
>>> replication
>>> > > > factor
>>> > > > > > > > change), ACL events, config changes, reassignment,
>>> altering log
>>> > > > dirs,
>>> > > > > > > > offset delete, group delete with the authorization info
>>> like
>>> > who
>>> > > > > > > initiated
>>> > > > > > > > the call, was it authorized, were there any errors. Let me
>>> know
>>> > > if
>>> > > > > you
>>> > > > > > > > think there are other APIs I should include.
>>> > > > > > > >
>>> > > > > > > > 4.) The builder is there mostly for easier usability but
>>> > actually
>>> > > > > > > thinking
>>> > > > > > > > of it it doesn't help much so I removed it. The AuditInfo
>>> is
>>> > > also a
>>> > > > > > > helper
>>> > > > > > > > class so I don't see any value in transforming it into an
>>> > > interface
>>> > > > > but
>>> > > > > > > if
>>> > > > > > > > I simplify it (by removing the builder) it will be cleaner.
>>> > Would
>>> > > > > that
>>> > > > > > > work?
>>> > > > > > > >
>>> > > > > > > > I'll update the KIP to reflect my answers.
>>> > > > > > > >
>>> > > > > > > > Viktor
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > On Mon, Sep 14, 2020 at 6:02 PM Mickael Maison <
>>> > > > > > mickael.mai...@gmail.com
>>> > > > > > > >
>>> > > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > >> Hi Viktor,
>>> > > > > > > >>
>>> > > > > > > >> Thanks for restarting the discussion on this KIP. Being
>>> able
>>> > to
>>> > > > > easily
>>> > > > > > > >> audit usage of a Kafka cluster is a very valuable feature.
>>> > > > > > > >>
>>> > > > > > > >> Regarding the API, I have a few of questions:
>>> > > > > > > >> 1) You introduced a start() method. I don't think any
>>> other
>>> > > > > interfaces
>>> > > > > > > >> have such a method. Users can do any setup they want in
>>> > > > configure()
>>> > > > > > > >>
>>> > > > > > > >> 2) The first argument of audit is an AbstractRequest.
>>> > > > Unfortunately
>>> > > > > > > >> this type is not part of the public API. But actually I'm
>>> not
>>> > > sure
>>> > > > > > > >> having the full request is really needed here. Maybe just
>>> > > passing
>>> > > > > the
>>> > > > > > > >> Apikey would be enough as we already have all the
>>> resources
>>> > from
>>> > > > the
>>> > > > > > > >> auditInfos field.
>>> > > > > > > >>
>>> > > > > > > >> 3) The KIP mentions a "LoggingAuditor" default
>>> implementation.
>>> > > > What
>>> > > > > is
>>> > > > > > > >> it doing? Can you add more details about it?
>>> > > > > > > >>
>>> > > > > > > >> 4) Can fields of AuditInfo be null? I can see there's a
>>> > > > constructor
>>> > > > > > > >> without an Errors and that sets the error field to None.
>>> > > However,
>>> > > > > with
>>> > > > > > > >> the builder pattern, if error is not set it's null.
>>> > > > > > > >>
>>> > > > > > > >> 5) Should AuditInfo be an interface?
>>> > > > > > > >>
>>> > > > > > > >> On Mon, Sep 14, 2020 at 3:26 PM Viktor Somogyi-Vass
>>> > > > > > > >> <viktorsomo...@gmail.com> wrote:
>>> > > > > > > >> >
>>> > > > > > > >> > Hi everyone,
>>> > > > > > > >> >
>>> > > > > > > >> > Changed the interface a little bit to accommodate
>>> methods
>>> > > better
>>> > > > > > where
>>> > > > > > > >> > authorization happens for multiple operations so the
>>> > > implementer
>>> > > > > of
>>> > > > > > > the
>>> > > > > > > >> > audit interface will receive all authorizations
>>> together.
>>> > > > > > > >> > I'll wait a few more days to allow people to react or
>>> give
>>> > > > > feedback
>>> > > > > > > but
>>> > > > > > > >> if
>>> > > > > > > >> > there are no objections until then, I'll start a vote.
>>> > > > > > > >> >
>>> > > > > > > >> > Viktor
>>> > > > > > > >> >
>>> > > > > > > >> > On Tue, Sep 8, 2020 at 9:49 AM Viktor Somogyi-Vass <
>>> > > > > > > >> viktorsomo...@gmail.com>
>>> > > > > > > >> > wrote:
>>> > > > > > > >> >
>>> > > > > > > >> > > Hi Everyone,
>>> > > > > > > >> > >
>>> > > > > > > >> > > I'd like to restart the discussion on this. Since the
>>> KIP
>>> > > has
>>> > > > > been
>>> > > > > > > >> > > revamped I thought I'd start a new discussion thread.
>>> > > > > > > >> > >
>>> > > > > > > >> > > Link:
>>> > > > > > > >> > >
>>> > > > > > > >>
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-567%3A+Kafka+Cluster+Audit
>>> > > > > > > >> > >
>>> > > > > > > >> > > Short summary:
>>> > > > > > > >> > > - Would like to introduce a new interface similar to
>>> the
>>> > > > > > Authorizer
>>> > > > > > > >> called
>>> > > > > > > >> > > Auditor as follows:
>>> > > > > > > >> > >     public interface Auditor {
>>> > > > > > > >> > >         audit(Request r, AuthorizableRequestContext c,
>>> > > > > > AclOperation
>>> > > > > > > >> > > o, Map<ResourcePattern, Boolean> isAllowed,
>>> > > > Map<ResourcePattern,
>>> > > > > > > >> Errors>
>>> > > > > > > >> > > errors);
>>> > > > > > > >> > >     }
>>> > > > > > > >> > > - Basically it would pass down the request and the
>>> > > > authorization
>>> > > > > > > >> > > information to the auditor implementation where
>>> various
>>> > kind
>>> > > > of
>>> > > > > > > >> reporting
>>> > > > > > > >> > > can be done based on the request.
>>> > > > > > > >> > > - A new config would be added called "auditor" which
>>> is
>>> > > > similar
>>> > > > > to
>>> > > > > > > the
>>> > > > > > > >> > > "authorizer" config, but users can pass a list of
>>> auditor
>>> > > > class
>>> > > > > > > names.
>>> > > > > > > >> > > - The implementation is expected to be low latency
>>> > similarly
>>> > > > to
>>> > > > > > the
>>> > > > > > > >> > > Authorizer.
>>> > > > > > > >> > > - A default implementation will be added that logs
>>> into a
>>> > > > file.
>>> > > > > > > >> > >
>>> > > > > > > >> > > I appreciate any feedback on this.
>>> > > > > > > >> > >
>>> > > > > > > >> > > Best,
>>> > > > > > > >> > > Viktor
>>> > > > > > > >> > >
>>> > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] KIP-567: Kafka Cluster Audit (new discussion)

Reply via email to