Re: [DISCUSS] KIP-940: Broker extension point for validating record contents at produce time

Jorge Esteban Quilcate Otoya Fri, 30 Jun 2023 07:48:39 -0700

Thank you both for the replies! A couple more comments:

On Tue, 27 Jun 2023 at 14:57, Edoardo Comar <edoardli...@gmail.com> wrote:


> Hi Jorge
> thanks for the feedback. Comments inline below
>
> > 1. Similar to Kirk's first point, I'm also concerned on how would the
> > plugin developers / operators be able to apply multiple policies and how
> > configurations would be passed to each policy.
>
> We’ve only attempted to tackle the “one plugin per broker” model with
> this KIP, as that’s the use-case we most clearly understand. Although,
> as noted in the rejected alternatives section, it would be possible to
> use a facade-like pattern to delegate from one plugin implementation
> to others. The reason we’ve avoided tackling multiple plugins is that
> it introduces further complexity (which takes precedence? Is
> configuration needed to say which plugin applies to which topic?
> Etc.), and we are concerned that without a clear use-case we might
> make decisions we later come to regret. Hopefully by offering minimal
> configuration options, we don’t hinder a future “support multiple
> record validation policies” KIP.
>

Got it. Thanks!


>
> > Some approaches from other plugins we can get some inspiration from:
> >
> > - AlterConfig, CreateTopic policies are designed to be 1 policy
> > implementing the different APIs. Up to the plugin developer to pull
> > policies together and configure it on the broker side. I guess for Record
> > Validation this may be cumbersome considering some Schema Registry
> > providers may want to offer implementations for their own backend.
> >
> > - Connect Transforms: here there's a named set of plugins to apply per
> > connector, and each transform has its own configuration defined by
> prefix.
> > Personally, I'd consider this one an interesting approach if we decide to
> > allow multiple record validations to be configured.
> >
> > - Tiered Storage (probably Connectors as well) have class-loader aware
> > implementations with class path specific to the plugin. Not sure if this
> is
> > something to discuss on the KIP or later on the PR, but we could mention
> > something on how this plugin would deal with dependency conflicts (e.g.
> > different jackson version between broker, plugin(s)).
>
>
> Thanks for highlighting all of these places where we can draw
> inspiration. We’ve updated the KIP with an additional classloader
> property to match the tiered storage implementation. It seems likely
> that record validation policy implementations will live in the
> codebase of their corresponding schema registry (as is the case,
> today, for the client serdes used to integrate with a schema registry)
> - so it makes sense to insulate their implementation from specific
> .jar versions that may (or may not) be present in a particular version
> of the broker.
>
> > Also, by potentially supporting multiple plugins for record validation,
> it
> > would be important to consider if it's an all or nothing relation, or
> > posible to choose _some_ policies apply per topic.
> > I see there's some preference for setting the validation policy name on
> the
> > topic, though this could be cumbersome to operate: topic creation users
> may
> > not be aware of the record validation (similar to CreateTopic/AlterConfig
> > policies) and would impose additional coordination.
> > Maybe a flag to whether apply policies or not would be a better approach?
>
> Could you elaborate more on your comments about “maybe a flag to
> whether to apply policies or not would be a better approach?”. We
> thought that setting the ‘record.validation.policy’ property on a
> topic to a value supported by the plugin was such a flag - but it
> sounds like you might have a different approach in mind?
>
>
The current proposal is to have ‘record.validation.policy’ per topic
(default null). A flag would be something like
‘record.validation.policy.enable’ (default=false) may be simpler to
configure from the user perspective.

Also, at the moment, is a bit unclear to me what value the topic config
‘record.validation.policy’ should contain: is the policy class name? How is
the policy expected to use the name received?


> > 2. Have you consider adding the record metadata to the API? It may be
> > useful for logging purposes (e.g. if record validation fails, to log
> > topic-partition), or some policies are interested on record metadata
> (e.g.
> > compression, timestamp type, etc.)
>
> The topic/partition is available to the plugin via the TopicMetadata
> interace. Additional record properties could be added to the
> ‘RecordProxy’ interface, however the topic of how rich to make the
> interface was a sticking point for KIP-729. The intent behind the
> ‘TopicMetadata’ and ‘RecordProxy’ classes is that they can be extended
> in the future without breaking existing plugin implementations - so
> we’re not precluding further properties from being added if we’ve been
> too austere.
>

I see, agree.


>
> > 3. A minor comment for consistency regarding the APIs. As far as I have
> > seen, we tend to use the name of the object returned directly instead of
> > getters notation, see `AlterConfigPolicy.RecordMetadata` [1]. We may
> rename
> > some of the proposed APIs accordingly:
> >
> > - `RecordProxy#headers()|key()|value()`
> > - `TopicMetadata#topicPartition()`
>
> Good point, and thanks for taking the time to find those examples.
> We’ve fixed the KIP to be consistent with this use of returned object
> name, and dropped the use of getter notation.
>
> > 4. For the `RecordIntrospectionHints`, I'm struggling to see how this may
> > be used by the policy developers. Would you mind adding some examples on
> > how the policy in general may be used?
> > Specifically, `long needKeyBytes|needKeyValue` are difficult to interpret
> > to me.
> > nit: maybe replace `need*` with `requiresAccess*` or simply `access*`, or
> > similar.
>
> The “hints” are intended to allow the broker to optimise what parts
> and how much of a record is deserialized before calling into the
> plugin. For example, if I wanted to write a plugin that validated that
> all records continaed a valid schema ID, then it might only be
> necessary to deserialize certain parts of the record - say the
> headers, to perform this validation. With hindsight “need” isn’t a
> great verb for this, so we’ve updated the KIP with “access” instead -
> as per your suggestion.
>

Thanks! I think adding a simple example of a Policy implementation and how
plugin developer may use this hints (and metadata as well) may bring some
clarity to the proposal.


>
> HTH, regards
> Edo & Adrian
>

Re: [DISCUSS] KIP-940: Broker extension point for validating record contents at produce time

Reply via email to