Re: [DISCUSS] KIP-567: Kafka Cluster Audit (new discussion)

Viktor Somogyi-Vass Thu, 29 Oct 2020 09:15:49 -0700

Hi Tom.

Sorry for the delay.
Answering your points:


> Why is it necessary to introduce this interface to produce the audit trail
> when there is logging that can already record a lot of the same
> information, albeit in less structured form? If logging isn't adequate it
> would be good to explain why not in the Motivation or Rejected
Alternatives
> section. It's worth pointing out that even the "less structured" part
would
> be helped by KIP-673, which proposes to change the RequestChannel's
logging
> to include a JSON representation of the request.

We will need authorization details as would an auditor normally have them
but a request logger doesn't as you correctly pointed out later in your
reply. They would also appear at different lifecycle points I imagine, like
the request logger is probably when the request enters Kafka and the
auditor catches them before sending the response, so it can obtain all
information (authorization, execution).
Furthermore this auditing API would specifically target other JVM based
components that depend on Kafka (like Ranger or Atlas) and from both side's
perspective it's much better to expose Java level classes rather than a
lower level (JSON) implementation. If a Java level object is exposed then
we need to create them once during request processing which is fairly
low-fat since we're parsing the request most of the time anyways as opposed
to JSON which would need to be serialized first and then deserialized for
the consumer of the API.

> I'm guessing what you gain from the proposed interface is the fact that
> it's called after the authorizer (perhaps after the request has been
> handled: I'm unclear about the purpose of AuditInfo.error), so you could
> generate a single record in the audit trail. That could still be achieved
> using logging, either by correlating existing log messages or by proposing
> some new logging just for this auditing purpose (perhaps with a logger per
> API key so people could avoid the performance hit on the produce and fetch
> paths if they weren't interested in auditing those things). Again, if this
> doesn't work it would be great for the KIP to explain why.

AuditInfo.error serves for capturing the possible errors that could happen
during the authorization and execution of the request. For instance a
partition creation request could be authorized and then rejected
with INVALID_TOPIC_EXCEPTION because the topic is queued for deletion. In
this case the AuditInfo.error would contain this API error thus emitting
information about the failure of the request. With normal auditing that
looks at only the authorization information we wouldn't detect it.
Regarding the produce and fetch performance: for these kinds of requests I
don't think we should enable parsing the batches themselves, just only pass
some meta information like which topics and partitions are affected. These
are parsed anyways for log reading and writing. Also similarly to the
authorizer we need to require implementations to run the auditing logic on
a different thread to minimize the performance impact.

> To me there were parallels with previous discussions about broker-side
> interceptors (
> https://www.mail-archive.com/dev@kafka.apache.org/msg103310.html if you've
> not seen it before), those seemed to founder on the unwillingness to make
> the request internal classes into a supported API. You've tried to address
> this by proposing a parallel set of classes implementing AuditEvent for
> exposing selective details about the request. It's not really clear that
> you really _need_ to access all that information about each request,
rather
> than simply recording it all, and it would also come with a significant
> implementation and maintenance cost. If it's simply about recording all
the
> information in the request, then it would likely be enough to pass a
> suitably formatted String rather than an AuditEvent, which basically
brings
> us back to point 1, but with some justification for not using logging.

Thanks for this email thread, I haven't seen it but now I see it's a much
bigger tree that I'm chopping :). But the point is that everyone basically
faces a similar issue, that we need server side interceptors and
observables. Indeed auditing can be part of such an interceptor chain and
I've been thinking of it like this too sometimes but as it has been
correctly assessed in the thread "we're doing the one-offs". I also admit
that maintaining all the implementation of AuditEvent could be cumbersome
and maybe this isn't a way. However I think we should expose more
structured forms. If we maintain suitably formatted Strings and if the
protocol changes for some requests it could be much harder to trace the
needed changes back to these Strings.
One idea that I had while reading the email is we could generate these
classes similarly to the *Data classes (like CreateTopicsData). There would
be a flag called "useForAuditing=true" in the JSON definition of the
protocol that would cause the given field to be generated into a class that
would be the implementation of AuditEvent and would be a public API. It
would be instantiated when a request is deserialized. In my opinion it has
the advantage that it's tightly coupled with the protocol from the
maintenance point of view and still provides an efficient and structured
way of accessing certain information of the request.

Best,
Viktor

On Thu, Oct 1, 2020 at 4:16 PM Tom Bentley <tbent...@redhat.com> wrote:

> Hi Viktor,
>
> Like Mickael, I can see that there's value in having an audit trail. For me
> the KIP raises a number of questions in its current form:
>
> Why is it necessary to introduce this interface to produce the audit trail
> when there is logging that can already record a lot of the same
> information, albeit in less structured form? If logging isn't adequate it
> would be good to explain why not in the Motivation or Rejected Alternatives
> section. It's worth pointing out that even the "less structured" part would
> be helped by KIP-673, which proposes to change the RequestChannel's logging
> to include a JSON representation of the request.
>
> I'm guessing what you gain from the proposed interface is the fact that
> it's called after the authorizer (perhaps after the request has been
> handled: I'm unclear about the purpose of AuditInfo.error), so you could
> generate a single record in the audit trail. That could still be achieved
> using logging, either by correlating existing log messages or by proposing
> some new logging just for this auditing purpose (perhaps with a logger per
> API key so people could avoid the performance hit on the produce and fetch
> paths if they weren't interested in auditing those things). Again, if this
> doesn't work it would be great for the KIP to explain why.
>
> To me there were parallels with previous discussions about broker-side
> interceptors (
> https://www.mail-archive.com/dev@kafka.apache.org/msg103310.html if you've
> not seen it before), those seemed to founder on the unwillingness to make
> the request internal classes into a supported API. You've tried to address
> this by proposing a parallel set of classes implementing AuditEvent for
> exposing selective details about the request. It's not really clear that
> you really _need_ to access all that information about each request, rather
> than simply recording it all, and it would also come with a significant
> implementation and maintenance cost. If it's simply about recording all the
> information in the request, then it would likely be enough to pass a
> suitably formatted String rather than an AuditEvent, which basically brings
> us back to point 1, but with some justification for not using logging.
>
> Kind regards,
>
> Tom
>
> On Thu, Oct 1, 2020 at 11:30 AM Dániel Urbán <urb.dani...@gmail.com>
> wrote:
>
> > Hi Viktor,
> >
> > I think the current state of the proposal is flexible enough to support
> > use-cases where the response data is of interest to the auditor.
> > This part ensures that: "... doing the auditing before sending the
> response
> > back ...". Additionally, event classes could be extended with additional
> > data if needed.
> >
> > Overall, the KIP looks good, thanks!
> >
> > Daniel
> >
> > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta (időpont: 2020.
> > szept. 30., Sze, 17:24):
> >
> > > Hi Daniel,
> > >
> > > I think in this sense we can use the precedence set with the
> > > KAfkaAdminClient. It has *Result and *Options classes which in this
> > > interpretation are similar in versioning and usage as they transform
> and
> > > convey the responses of the protocol in a minimalistic API.
> > > I've modified the KIP a bit and created some examples for these event
> > > classes. For now as the implementation I think we can treat this
> > similarly
> > > to KIP-4 (AdminClient) which didn't push implementation for everything
> > but
> > > rather pushed implementing everything to subsequent KIPs as the
> > > requirements become important. In this first KIP we can create the more
> > > important ones (listed in the "Default Implementation") section if that
> > is
> > > fine.
> > >
> > > Regarding the response passing: to be honest I feel like that it's not
> > that
> > > strictly related to auditing but I think it's a good idea and could fit
> > > into this API. I think that we should design this current API with this
> > in
> > > mind. Did you have any specific ideas about the implementation?
> > >
> > > Viktor
> > >
> > > On Tue, Sep 22, 2020 at 9:05 AM Dániel Urbán <urb.dani...@gmail.com>
> > > wrote:
> > >
> > > > An example I had in mind was the ProduceResponse - the auditor might
> > need
> > > > access to the new end offset of the partitions.
> > > > The event-based approach sounds good - new events and fields can be
> > added
> > > > on-demand. Do we need the same versioning strategy we use with the
> > > > requests/responses?
> > > >
> > > > Daniel
> > > >
> > > > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta (időpont:
> 2020.
> > > > szept. 21., H, 14:08):
> > > >
> > > > > Hi Daniel,
> > > > >
> > > > > > If the auditor needs access to the details of the action, one
> could
> > > > argue
> > > > > that even the response should be passed down to the auditor.
> > > > > At this point I don't think we need to include responses into the
> > > > interface
> > > > > but if you have a use-case we can consider doing that.
> > > > >
> > > > > > Is it feasible to convert the Java requests and responses to
> public
> > > > API?
> > > > > Well I think that in this case we would need to actually transform
> a
> > > lot
> > > > of
> > > > > classes and that might be a bit too invasive. Although since the
> > > protocol
> > > > > itself *is* a public API it might make sense to have some kind of
> > Java
> > > > > representation as a public API as well.
> > > > >
> > > > > > If not, do we have another option to access this info in the
> > auditor?
> > > > > I think one option would be to do what the original KIP-567 was
> > > > > implemented. Basically we could have an AuditEvent interface that
> > would
> > > > > contain request specific data. Its obvious drawback is that it has
> to
> > > be
> > > > > implemented for most of the 40 something protocols but on the
> upside
> > > > these
> > > > > classes shouldn't be complicated. I can try to do a PoC with this
> to
> > > see
> > > > > how it looks like and whether it solves the problem. To be honest I
> > > think
> > > > > it would be better than publishing the request classes as an API
> > > because
> > > > > here we're restricting access to only what is necessary.
> > > > >
> > > > > Thanks,
> > > > > Viktor
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Sep 18, 2020 at 8:37 AM Dániel Urbán <
> urb.dani...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for the KIP.
> > > > > >
> > > > > > If the auditor needs access to the details of the action, one
> could
> > > > argue
> > > > > > that even the response should be passed down to the auditor.
> > > > > > Is it feasible to convert the Java requests and responses to
> public
> > > > API?
> > > > > > If not, do we have another option to access this info in the
> > auditor?
> > > > > > I know that the auditor could just send proper requests through
> the
> > > API
> > > > > to
> > > > > > the brokers, but that seems like an awful lot of overhead, and
> > could
> > > > > > introduce timing issues as well.
> > > > > >
> > > > > > Daniel
> > > > > >
> > > > > >
> > > > > > Viktor Somogyi-Vass <viktorsomo...@gmail.com> ezt írta (időpont:
> > > 2020.
> > > > > > szept. 16., Sze, 17:17):
> > > > > >
> > > > > > > One more after-thought on your second point (AbstractRequest):
> > the
> > > > > > reason I
> > > > > > > introduced it in the first place was that this way implementers
> > can
> > > > > > access
> > > > > > > request data. A use case can be if they want to audit a change
> in
> > > > > > > configuration or client quotas but not just acknowledge the
> fact
> > > that
> > > > > > such
> > > > > > > an event happened but also capture the change itself by peeking
> > > into
> > > > > the
> > > > > > > request. Sometimes it can be useful especially when people want
> > to
> > > > > trace
> > > > > > > back the order of events and what happened when and not just
> > > > > acknowledge
> > > > > > > that there was an event of a certain kind. I also recognize
> that
> > > this
> > > > > > might
> > > > > > > be a very loose interpretation of auditing as it's not strictly
> > > > related
> > > > > > to
> > > > > > > authorization but rather a way of tracing the admin actions
> > within
> > > > the
> > > > > > > cluster. It even could be a different API therefore but because
> > of
> > > > the
> > > > > > > variety of the Kafka APIs it's very hard to give a method that
> > fits
> > > > > all,
> > > > > > so
> > > > > > > it's easier to pass down the AbstractRequest and the
> > implementation
> > > > can
> > > > > > do
> > > > > > > the extraction of valuable info. So that's why I added this in
> > the
> > > > > first
> > > > > > > place and I'm interested in your thoughts.
> > > > > > >
> > > > > > > On Wed, Sep 16, 2020 at 4:41 PM Viktor Somogyi-Vass <
> > > > > > > viktorsomo...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Mickael,
> > > > > > > >
> > > > > > > > Thanks for reviewing the KIP.
> > > > > > > >
> > > > > > > > 1.) I just wanted to follow the conventions used with the
> > > > Authorizer
> > > > > as
> > > > > > > it
> > > > > > > > is built in a similar fashion, although it's true that in
> > > > KafkaServer
> > > > > > we
> > > > > > > > call the configure() method and the start() in the next line.
> > > This
> > > > > > would
> > > > > > > be
> > > > > > > > the same in Auditor and even simpler as there aren't any
> > > parameters
> > > > > to
> > > > > > > > start(), so I can remove it. If it turns out there is a need
> > for
> > > > it,
> > > > > we
> > > > > > > can
> > > > > > > > add it later.
> > > > > > > >
> > > > > > > > 2.) Yes, this is a very good point, I will remove it, however
> > in
> > > > this
> > > > > > > case
> > > > > > > > I don't think we need to add the ApiKey as it is already
> > > available
> > > > in
> > > > > > > > AuthorizableRequestContext.requestType(). One less parameter
> > :).
> > > > > > > >
> > > > > > > > 3.) I'll add it. It will simply log important changes in the
> > > > cluster
> > > > > > like
> > > > > > > > topic events (create, update, delete, partition or
> replication
> > > > factor
> > > > > > > > change), ACL events, config changes, reassignment, altering
> log
> > > > dirs,
> > > > > > > > offset delete, group delete with the authorization info like
> > who
> > > > > > > initiated
> > > > > > > > the call, was it authorized, were there any errors. Let me
> know
> > > if
> > > > > you
> > > > > > > > think there are other APIs I should include.
> > > > > > > >
> > > > > > > > 4.) The builder is there mostly for easier usability but
> > actually
> > > > > > > thinking
> > > > > > > > of it it doesn't help much so I removed it. The AuditInfo is
> > > also a
> > > > > > > helper
> > > > > > > > class so I don't see any value in transforming it into an
> > > interface
> > > > > but
> > > > > > > if
> > > > > > > > I simplify it (by removing the builder) it will be cleaner.
> > Would
> > > > > that
> > > > > > > work?
> > > > > > > >
> > > > > > > > I'll update the KIP to reflect my answers.
> > > > > > > >
> > > > > > > > Viktor
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Sep 14, 2020 at 6:02 PM Mickael Maison <
> > > > > > mickael.mai...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi Viktor,
> > > > > > > >>
> > > > > > > >> Thanks for restarting the discussion on this KIP. Being able
> > to
> > > > > easily
> > > > > > > >> audit usage of a Kafka cluster is a very valuable feature.
> > > > > > > >>
> > > > > > > >> Regarding the API, I have a few of questions:
> > > > > > > >> 1) You introduced a start() method. I don't think any other
> > > > > interfaces
> > > > > > > >> have such a method. Users can do any setup they want in
> > > > configure()
> > > > > > > >>
> > > > > > > >> 2) The first argument of audit is an AbstractRequest.
> > > > Unfortunately
> > > > > > > >> this type is not part of the public API. But actually I'm
> not
> > > sure
> > > > > > > >> having the full request is really needed here. Maybe just
> > > passing
> > > > > the
> > > > > > > >> Apikey would be enough as we already have all the resources
> > from
> > > > the
> > > > > > > >> auditInfos field.
> > > > > > > >>
> > > > > > > >> 3) The KIP mentions a "LoggingAuditor" default
> implementation.
> > > > What
> > > > > is
> > > > > > > >> it doing? Can you add more details about it?
> > > > > > > >>
> > > > > > > >> 4) Can fields of AuditInfo be null? I can see there's a
> > > > constructor
> > > > > > > >> without an Errors and that sets the error field to None.
> > > However,
> > > > > with
> > > > > > > >> the builder pattern, if error is not set it's null.
> > > > > > > >>
> > > > > > > >> 5) Should AuditInfo be an interface?
> > > > > > > >>
> > > > > > > >> On Mon, Sep 14, 2020 at 3:26 PM Viktor Somogyi-Vass
> > > > > > > >> <viktorsomo...@gmail.com> wrote:
> > > > > > > >> >
> > > > > > > >> > Hi everyone,
> > > > > > > >> >
> > > > > > > >> > Changed the interface a little bit to accommodate methods
> > > better
> > > > > > where
> > > > > > > >> > authorization happens for multiple operations so the
> > > implementer
> > > > > of
> > > > > > > the
> > > > > > > >> > audit interface will receive all authorizations together.
> > > > > > > >> > I'll wait a few more days to allow people to react or give
> > > > > feedback
> > > > > > > but
> > > > > > > >> if
> > > > > > > >> > there are no objections until then, I'll start a vote.
> > > > > > > >> >
> > > > > > > >> > Viktor
> > > > > > > >> >
> > > > > > > >> > On Tue, Sep 8, 2020 at 9:49 AM Viktor Somogyi-Vass <
> > > > > > > >> viktorsomo...@gmail.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi Everyone,
> > > > > > > >> > >
> > > > > > > >> > > I'd like to restart the discussion on this. Since the
> KIP
> > > has
> > > > > been
> > > > > > > >> > > revamped I thought I'd start a new discussion thread.
> > > > > > > >> > >
> > > > > > > >> > > Link:
> > > > > > > >> > >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-567%3A+Kafka+Cluster+Audit
> > > > > > > >> > >
> > > > > > > >> > > Short summary:
> > > > > > > >> > > - Would like to introduce a new interface similar to the
> > > > > > Authorizer
> > > > > > > >> called
> > > > > > > >> > > Auditor as follows:
> > > > > > > >> > >     public interface Auditor {
> > > > > > > >> > >         audit(Request r, AuthorizableRequestContext c,
> > > > > > AclOperation
> > > > > > > >> > > o, Map<ResourcePattern, Boolean> isAllowed,
> > > > Map<ResourcePattern,
> > > > > > > >> Errors>
> > > > > > > >> > > errors);
> > > > > > > >> > >     }
> > > > > > > >> > > - Basically it would pass down the request and the
> > > > authorization
> > > > > > > >> > > information to the auditor implementation where various
> > kind
> > > > of
> > > > > > > >> reporting
> > > > > > > >> > > can be done based on the request.
> > > > > > > >> > > - A new config would be added called "auditor" which is
> > > > similar
> > > > > to
> > > > > > > the
> > > > > > > >> > > "authorizer" config, but users can pass a list of
> auditor
> > > > class
> > > > > > > names.
> > > > > > > >> > > - The implementation is expected to be low latency
> > similarly
> > > > to
> > > > > > the
> > > > > > > >> > > Authorizer.
> > > > > > > >> > > - A default implementation will be added that logs into
> a
> > > > file.
> > > > > > > >> > >
> > > > > > > >> > > I appreciate any feedback on this.
> > > > > > > >> > >
> > > > > > > >> > > Best,
> > > > > > > >> > > Viktor
> > > > > > > >> > >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-567: Kafka Cluster Audit (new discussion)

Reply via email to