Hi Jorge thanks for the feedback. Comments inline below > 1. Similar to Kirk's first point, I'm also concerned on how would the > plugin developers / operators be able to apply multiple policies and how > configurations would be passed to each policy.
We’ve only attempted to tackle the “one plugin per broker” model with this KIP, as that’s the use-case we most clearly understand. Although, as noted in the rejected alternatives section, it would be possible to use a facade-like pattern to delegate from one plugin implementation to others. The reason we’ve avoided tackling multiple plugins is that it introduces further complexity (which takes precedence? Is configuration needed to say which plugin applies to which topic? Etc.), and we are concerned that without a clear use-case we might make decisions we later come to regret. Hopefully by offering minimal configuration options, we don’t hinder a future “support multiple record validation policies” KIP. > Some approaches from other plugins we can get some inspiration from: > > - AlterConfig, CreateTopic policies are designed to be 1 policy > implementing the different APIs. Up to the plugin developer to pull > policies together and configure it on the broker side. I guess for Record > Validation this may be cumbersome considering some Schema Registry > providers may want to offer implementations for their own backend. > > - Connect Transforms: here there's a named set of plugins to apply per > connector, and each transform has its own configuration defined by prefix. > Personally, I'd consider this one an interesting approach if we decide to > allow multiple record validations to be configured. > > - Tiered Storage (probably Connectors as well) have class-loader aware > implementations with class path specific to the plugin. Not sure if this is > something to discuss on the KIP or later on the PR, but we could mention > something on how this plugin would deal with dependency conflicts (e.g. > different jackson version between broker, plugin(s)). Thanks for highlighting all of these places where we can draw inspiration. We’ve updated the KIP with an additional classloader property to match the tiered storage implementation. It seems likely that record validation policy implementations will live in the codebase of their corresponding schema registry (as is the case, today, for the client serdes used to integrate with a schema registry) - so it makes sense to insulate their implementation from specific .jar versions that may (or may not) be present in a particular version of the broker. > Also, by potentially supporting multiple plugins for record validation, it > would be important to consider if it's an all or nothing relation, or > posible to choose _some_ policies apply per topic. > I see there's some preference for setting the validation policy name on the > topic, though this could be cumbersome to operate: topic creation users may > not be aware of the record validation (similar to CreateTopic/AlterConfig > policies) and would impose additional coordination. > Maybe a flag to whether apply policies or not would be a better approach? Could you elaborate more on your comments about “maybe a flag to whether to apply policies or not would be a better approach?”. We thought that setting the ‘record.validation.policy’ property on a topic to a value supported by the plugin was such a flag - but it sounds like you might have a different approach in mind? > 2. Have you consider adding the record metadata to the API? It may be > useful for logging purposes (e.g. if record validation fails, to log > topic-partition), or some policies are interested on record metadata (e.g. > compression, timestamp type, etc.) The topic/partition is available to the plugin via the TopicMetadata interace. Additional record properties could be added to the ‘RecordProxy’ interface, however the topic of how rich to make the interface was a sticking point for KIP-729. The intent behind the ‘TopicMetadata’ and ‘RecordProxy’ classes is that they can be extended in the future without breaking existing plugin implementations - so we’re not precluding further properties from being added if we’ve been too austere. > 3. A minor comment for consistency regarding the APIs. As far as I have > seen, we tend to use the name of the object returned directly instead of > getters notation, see `AlterConfigPolicy.RecordMetadata` [1]. We may rename > some of the proposed APIs accordingly: > > - `RecordProxy#headers()|key()|value()` > - `TopicMetadata#topicPartition()` Good point, and thanks for taking the time to find those examples. We’ve fixed the KIP to be consistent with this use of returned object name, and dropped the use of getter notation. > 4. For the `RecordIntrospectionHints`, I'm struggling to see how this may > be used by the policy developers. Would you mind adding some examples on > how the policy in general may be used? > Specifically, `long needKeyBytes|needKeyValue` are difficult to interpret > to me. > nit: maybe replace `need*` with `requiresAccess*` or simply `access*`, or > similar. The “hints” are intended to allow the broker to optimise what parts and how much of a record is deserialized before calling into the plugin. For example, if I wanted to write a plugin that validated that all records continaed a valid schema ID, then it might only be necessary to deserialize certain parts of the record - say the headers, to perform this validation. With hindsight “need” isn’t a great verb for this, so we’ve updated the KIP with “access” instead - as per your suggestion. HTH, regards Edo & Adrian