Re: [DISCUSS] KIP-940: Broker extension point for validating record contents at produce time

Edoardo Comar Tue, 27 Jun 2023 04:58:20 -0700

Hi Jorge
thanks for the feedback. Comments inline below

> 1. Similar to Kirk's first point, I'm also concerned on how would the
> plugin developers / operators be able to apply multiple policies and how
> configurations would be passed to each policy.


We’ve only attempted to tackle the “one plugin per broker” model with
this KIP, as that’s the use-case we most clearly understand. Although,
as noted in the rejected alternatives section, it would be possible to
use a facade-like pattern to delegate from one plugin implementation
to others. The reason we’ve avoided tackling multiple plugins is that
it introduces further complexity (which takes precedence? Is
configuration needed to say which plugin applies to which topic?
Etc.), and we are concerned that without a clear use-case we might
make decisions we later come to regret. Hopefully by offering minimal
configuration options, we don’t hinder a future “support multiple
record validation policies” KIP.

> Some approaches from other plugins we can get some inspiration from:
>
> - AlterConfig, CreateTopic policies are designed to be 1 policy
> implementing the different APIs. Up to the plugin developer to pull
> policies together and configure it on the broker side. I guess for Record
> Validation this may be cumbersome considering some Schema Registry
> providers may want to offer implementations for their own backend.
>
> - Connect Transforms: here there's a named set of plugins to apply per
> connector, and each transform has its own configuration defined by prefix.
> Personally, I'd consider this one an interesting approach if we decide to
> allow multiple record validations to be configured.
>
> - Tiered Storage (probably Connectors as well) have class-loader aware
> implementations with class path specific to the plugin. Not sure if this is
> something to discuss on the KIP or later on the PR, but we could mention
> something on how this plugin would deal with dependency conflicts (e.g.
> different jackson version between broker, plugin(s)).


Thanks for highlighting all of these places where we can draw
inspiration. We’ve updated the KIP with an additional classloader
property to match the tiered storage implementation. It seems likely
that record validation policy implementations will live in the
codebase of their corresponding schema registry (as is the case,
today, for the client serdes used to integrate with a schema registry)
- so it makes sense to insulate their implementation from specific
.jar versions that may (or may not) be present in a particular version
of the broker.

> Also, by potentially supporting multiple plugins for record validation, it
> would be important to consider if it's an all or nothing relation, or
> posible to choose _some_ policies apply per topic.
> I see there's some preference for setting the validation policy name on the
> topic, though this could be cumbersome to operate: topic creation users may
> not be aware of the record validation (similar to CreateTopic/AlterConfig
> policies) and would impose additional coordination.
> Maybe a flag to whether apply policies or not would be a better approach?

Could you elaborate more on your comments about “maybe a flag to
whether to apply policies or not would be a better approach?”. We
thought that setting the ‘record.validation.policy’ property on a
topic to a value supported by the plugin was such a flag - but it
sounds like you might have a different approach in mind?

> 2. Have you consider adding the record metadata to the API? It may be
> useful for logging purposes (e.g. if record validation fails, to log
> topic-partition), or some policies are interested on record metadata (e.g.
> compression, timestamp type, etc.)

The topic/partition is available to the plugin via the TopicMetadata
interace. Additional record properties could be added to the
‘RecordProxy’ interface, however the topic of how rich to make the
interface was a sticking point for KIP-729. The intent behind the
‘TopicMetadata’ and ‘RecordProxy’ classes is that they can be extended
in the future without breaking existing plugin implementations - so
we’re not precluding further properties from being added if we’ve been
too austere.

> 3. A minor comment for consistency regarding the APIs. As far as I have
> seen, we tend to use the name of the object returned directly instead of
> getters notation, see `AlterConfigPolicy.RecordMetadata` [1]. We may rename
> some of the proposed APIs accordingly:
>
> - `RecordProxy#headers()|key()|value()`
> - `TopicMetadata#topicPartition()`

Good point, and thanks for taking the time to find those examples.
We’ve fixed the KIP to be consistent with this use of returned object
name, and dropped the use of getter notation.

> 4. For the `RecordIntrospectionHints`, I'm struggling to see how this may
> be used by the policy developers. Would you mind adding some examples on
> how the policy in general may be used?
> Specifically, `long needKeyBytes|needKeyValue` are difficult to interpret
> to me.
> nit: maybe replace `need*` with `requiresAccess*` or simply `access*`, or
> similar.

The “hints” are intended to allow the broker to optimise what parts
and how much of a record is deserialized before calling into the
plugin. For example, if I wanted to write a plugin that validated that
all records continaed a valid schema ID, then it might only be
necessary to deserialize certain parts of the record - say the
headers, to perform this validation. With hindsight “need” isn’t a
great verb for this, so we’ve updated the KIP with “access” instead -
as per your suggestion.

HTH, regards
Edo & Adrian

Re: [DISCUSS] KIP-940: Broker extension point for validating record contents at produce time

Reply via email to