gharris1727 commented on PR #16161: URL: https://github.com/apache/kafka/pull/16161#issuecomment-2147992314
> Can you expound on the motivation for this change? IIRC we began to allow null element schemas in order to support schema inference for empty collections parsed by the Values class, which is valid IMO. Framed in that way, I also think it would make sense to try to infer a schema for empty lists. But I think that decision is undesirable in the broader context of the Connect ecosystem. I tried to determine how a `null` schema is interpreted in the Connect type-system. When used in a ConnectRecord, it's a top-type, as it allows the Object value to be any type. The way Values uses it is also as a top-type, because a heterogeneous array infers a null schema. The current behavior for validateValue is as a bottom-type, as no value is valid for a null schema (although this seems to be an accident). Much of our infrastructure is not prepared for element schemas to be null, such as the SchemaBuilder#array() and SchemaBuilder#map() methods, which still have null guards: https://github.com/apache/kafka/blob/55d38efcc5505a5a1bddb08ba05f4d923f8050f9/connect/api/src/main/java/org/apache/kafka/connect/data/SchemaBuilder.java#L361-L383 The field builder has a null guard: https://github.com/apache/kafka/blob/55d38efcc5505a5a1bddb08ba05f4d923f8050f9/connect/api/src/main/java/org/apache/kafka/connect/data/SchemaBuilder.java#L326-L327 and obviously the validateValue method was written without null schemas in-mind. If we changed all of the above functions to remove the null guards, it would permit schemaless data within schema'd data. This would allow data sources to break the single schema vs schemaless check that most SMTs use to determine if they're operating on schema'd vs schemaless data. Rather than receiving DataExceptions when assembling the structs in the data sources, it would likely surface as NPEs in data sinks. So I don't think we can change validateValue to be top-type without risking more bad data and NPEs overall. I also don't think it's viable to make ConnectRecord bottom-type, as schemaless data is intentionally supported. I think the best path forward is to attempt to align with the SchemaBuilder semantics before the SchemaBuilder(Type) constructor was made public, and disallow null element schemas entirely. Then, the assumption that the root Schema in ConnectRecord should be the only nullable schema will be correct. Of course, we still have to change the Values class to be more conservative with the types it outputs, which I opened this ticket for: https://issues.apache.org/jira/browse/KAFKA-16870 . One of the good things for us is that the null-schema'd collections from the Values class and SimpleHeaderConverter don't commonly make it into the key or value, or as fields in a struct, because AK doesn't have HeaderTo$Key or HeaderTo$Value. This ticket actually isn't motivated by the Values parsing for that reason. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org