Thank you both for the replies! A couple more comments: On Tue, 27 Jun 2023 at 14:57, Edoardo Comar <edoardli...@gmail.com> wrote:
> Hi Jorge > thanks for the feedback. Comments inline below > > > 1. Similar to Kirk's first point, I'm also concerned on how would the > > plugin developers / operators be able to apply multiple policies and how > > configurations would be passed to each policy. > > We’ve only attempted to tackle the “one plugin per broker” model with > this KIP, as that’s the use-case we most clearly understand. Although, > as noted in the rejected alternatives section, it would be possible to > use a facade-like pattern to delegate from one plugin implementation > to others. The reason we’ve avoided tackling multiple plugins is that > it introduces further complexity (which takes precedence? Is > configuration needed to say which plugin applies to which topic? > Etc.), and we are concerned that without a clear use-case we might > make decisions we later come to regret. Hopefully by offering minimal > configuration options, we don’t hinder a future “support multiple > record validation policies” KIP. > Got it. Thanks! > > > Some approaches from other plugins we can get some inspiration from: > > > > - AlterConfig, CreateTopic policies are designed to be 1 policy > > implementing the different APIs. Up to the plugin developer to pull > > policies together and configure it on the broker side. I guess for Record > > Validation this may be cumbersome considering some Schema Registry > > providers may want to offer implementations for their own backend. > > > > - Connect Transforms: here there's a named set of plugins to apply per > > connector, and each transform has its own configuration defined by > prefix. > > Personally, I'd consider this one an interesting approach if we decide to > > allow multiple record validations to be configured. > > > > - Tiered Storage (probably Connectors as well) have class-loader aware > > implementations with class path specific to the plugin. Not sure if this > is > > something to discuss on the KIP or later on the PR, but we could mention > > something on how this plugin would deal with dependency conflicts (e.g. > > different jackson version between broker, plugin(s)). > > > Thanks for highlighting all of these places where we can draw > inspiration. We’ve updated the KIP with an additional classloader > property to match the tiered storage implementation. It seems likely > that record validation policy implementations will live in the > codebase of their corresponding schema registry (as is the case, > today, for the client serdes used to integrate with a schema registry) > - so it makes sense to insulate their implementation from specific > .jar versions that may (or may not) be present in a particular version > of the broker. > > > Also, by potentially supporting multiple plugins for record validation, > it > > would be important to consider if it's an all or nothing relation, or > > posible to choose _some_ policies apply per topic. > > I see there's some preference for setting the validation policy name on > the > > topic, though this could be cumbersome to operate: topic creation users > may > > not be aware of the record validation (similar to CreateTopic/AlterConfig > > policies) and would impose additional coordination. > > Maybe a flag to whether apply policies or not would be a better approach? > > Could you elaborate more on your comments about “maybe a flag to > whether to apply policies or not would be a better approach?”. We > thought that setting the ‘record.validation.policy’ property on a > topic to a value supported by the plugin was such a flag - but it > sounds like you might have a different approach in mind? > > The current proposal is to have ‘record.validation.policy’ per topic (default null). A flag would be something like ‘record.validation.policy.enable’ (default=false) may be simpler to configure from the user perspective. Also, at the moment, is a bit unclear to me what value the topic config ‘record.validation.policy’ should contain: is the policy class name? How is the policy expected to use the name received? > > 2. Have you consider adding the record metadata to the API? It may be > > useful for logging purposes (e.g. if record validation fails, to log > > topic-partition), or some policies are interested on record metadata > (e.g. > > compression, timestamp type, etc.) > > The topic/partition is available to the plugin via the TopicMetadata > interace. Additional record properties could be added to the > ‘RecordProxy’ interface, however the topic of how rich to make the > interface was a sticking point for KIP-729. The intent behind the > ‘TopicMetadata’ and ‘RecordProxy’ classes is that they can be extended > in the future without breaking existing plugin implementations - so > we’re not precluding further properties from being added if we’ve been > too austere. > I see, agree. > > > 3. A minor comment for consistency regarding the APIs. As far as I have > > seen, we tend to use the name of the object returned directly instead of > > getters notation, see `AlterConfigPolicy.RecordMetadata` [1]. We may > rename > > some of the proposed APIs accordingly: > > > > - `RecordProxy#headers()|key()|value()` > > - `TopicMetadata#topicPartition()` > > Good point, and thanks for taking the time to find those examples. > We’ve fixed the KIP to be consistent with this use of returned object > name, and dropped the use of getter notation. > > > 4. For the `RecordIntrospectionHints`, I'm struggling to see how this may > > be used by the policy developers. Would you mind adding some examples on > > how the policy in general may be used? > > Specifically, `long needKeyBytes|needKeyValue` are difficult to interpret > > to me. > > nit: maybe replace `need*` with `requiresAccess*` or simply `access*`, or > > similar. > > The “hints” are intended to allow the broker to optimise what parts > and how much of a record is deserialized before calling into the > plugin. For example, if I wanted to write a plugin that validated that > all records continaed a valid schema ID, then it might only be > necessary to deserialize certain parts of the record - say the > headers, to perform this validation. With hindsight “need” isn’t a > great verb for this, so we’ve updated the KIP with “access” instead - > as per your suggestion. > Thanks! I think adding a simple example of a Policy implementation and how plugin developer may use this hints (and metadata as well) may bring some clarity to the proposal. > > HTH, regards > Edo & Adrian >