Hi Anton,

The implicit defaults have been documented for a long time. Take a look at this documentation from 2019 in clients/src/main/resources/common/message/README.md:
> Deserializing Messages
> ----------------------
> Message objects may be deserialized using the Message#read method. This method overwrites all the data currently in the message object with new data.
>
> Any fields in the message object that are not present in the version that you are deserializing will be reset to default values. Unless a custom default has been set:
>
> * Integer fields default to 0.
>
> * Floats default to 0.
>
> * Booleans default to false.
>
> * Strings default to the empty string.
>
> * Bytes fields default to the empty byte array.
>
> * Uuid fields default to zero uuid.
>
> * Records fields default to null.
>
> * Array fields default to empty.
>
> You can specify "null" as a default value for a string field by specifying the literal string "null". Note that you can only specify null as a default if all versions of the field are nullable.

Any implementation of sending and receiving Kafka messages needs to know about these defaults. It is, as you said, "a quirk of the implementation," but it's a quirk that you have to know to speak the Kafka protocol.

The defaults apply to both tagged and untagged fields. The main difference is that untagged fields are always present, except in older versions that predate them, whereas tagged fields may be absent even in the most recent version.

Perhaps there is something we could do to improve these docs? Or somehow make them easier to find?

best,
Colin

On Wed, Oct 23, 2024, at 00:19, Anton Agestam wrote:
> Hi Colin,
>
> I understand your perspective that this is a bug in the client, but I disagree that this is something that's established, or inferrable from existing specification and implementation.
>
> The behavior of exposing the implicit defaults on the wire protocol really is new. The only scenario in which they have come into play prior to this change is when parsing an entity version that is not the latest version that the parsing code base supports -- and -- when that code base models all versions of an API with the same entity. In this scenario the implicit defaults must be there to fill in newly added fields with _some_ value, since those values cannot be read from the message. However, for an implementation that doesn't reuse the same class for every version, there has never been any need to use the implicit defaults: when each version has a dedicated class, it doesn't need to define any fields for values that are added in future versions. We have a comprehensive compatibility test suite in kio that proves this is the case; we can parse and serialize the full protocol up to version 3.8.0 (every version of every message). This would not have been possible if the implicit defaults were semantically required, because we do not use them.
>
> The reason that the above hasn't become visible until now, as I mentioned in the bug report, is that the newly introduced protocol message is the first one (of all message versions in the whole protocol) to make use of this construct: a nested entity field where not all nested fields have an explicit default value.
>
> Further, the new behavior is not in accordance with what is specified. The implicit defaults are really only documented to be used in the above-mentioned scenario, with the following prerequisite <https://github.com/apache/kafka/tree/trunk/clients/src/main/resources/common/message#deserializing-messages>: "Any fields in the message object that are not present in the version that you are deserializing will be reset to default values". That prerequisite is not fulfilled here, so this cannot be said to be in accordance with what is specified. The behavior occurs even though I am parsing a message of the _same_, latest, version as the schema of the model.
>
> In summary, this is new behavior, and it is unspecified behavior.
>
> On top of that, I think this is also bad API design, as it forces cumbersome and strange semantics into client implementations. In the context of parsing a model that has more fields than the data it is parsing, it makes sense to have implicit defaults. However, when the model has the same number of fields as there are values in the data, this does not make sense. Why should we force client implementations to introduce non-defaults into their data structures? The empty string for a hostname and zero for a port are clearly not good defaults; both are really invalid values.
>
> The choice of implicitly introducing this behavior at the protocol level in this way will most likely mean that this becomes a quirk of the protocol forever. I think this situation is avoidable, and I think this is a decision that should not be made silently, without a proper specification and design process to make sure it is really the right decision.
>
> BR,
> Anton
>
> On Mon, Oct 21, 2024 at 23:12, Colin McCabe <cmcc...@apache.org> wrote:
>
>> Hi all,
>>
>> I have posted a new release candidate, RC3. See the RC3 thread.
>>
>> best,
>> Colin
>>
>> On Mon, Oct 21, 2024, at 11:31, Colin McCabe wrote:
>> > Hi Anton,
>> >
>> > I replied on the JIRA. I do not think this is a bug; you just failed to account for implicit defaults in your protocol code. That is, 0 is the default of numeric fields if no other default is specified, etc.
>> >
>> > best,
>> > Colin
>> >
>> > On Mon, Oct 21, 2024, at 08:07, Anton Agestam wrote:
>> >> Hi everyone,
>> >>
>> >> I have found a protocol serialization bug that surfaces only with one of the entities introduced for KIP-853 (UpdateRaftVoterResponse).
>> >>
>> >> Due to the irreversible implications this might have once merged, I'd argue that this needs to be considered a release blocker.
>> >>
>> >> https://issues.apache.org/jira/browse/KAFKA-17845
>> >>
>> >> BR,
>> >> Anton
>> >>
>> >> On Thu, Oct 10, 2024 at 23:16, Colin McCabe <cmcc...@apache.org> wrote:
>> >>
>> >>> This is the second candidate for the release of Apache Kafka 3.9.0. I have titled it rc2 since I had an rc1 which got very far, even to the point of pushing tags and docker images, before I spotted an issue. So rather than mutate the tags, I decided to skip over rc1.
>> >>>
>> >>> - This is a major release, the final one in the 3.x line. (There may of course be other minor releases in this line, such as 3.9.1.)
>> >>> - Tiered storage will be considered production-ready in this release.
>> >>> - This will be the final major release to feature the deprecated ZooKeeper mode.
>> >>>
>> >>> This release includes the following KIPs:
>> >>> - KIP-853: Support dynamically changing KRaft controller membership
>> >>> - KIP-1057: Add remote log metadata flag to the dump log tool
>> >>> - KIP-1049: Add config log.summary.interval.ms to Kafka Streams
>> >>> - KIP-1040: Improve handling of nullable values in InsertField, ExtractField, and other transformations
>> >>> - KIP-1031: Control offset translation in MirrorSourceConnector
>> >>> - KIP-1033: Add Kafka Streams exception handler for exceptions occurring during processing
>> >>> - KIP-1017: Health check endpoint for Kafka Connect
>> >>> - KIP-1025: Optionally URL-encode clientID and clientSecret in authorization header
>> >>> - KIP-1005: Expose EarliestLocalOffset and TieredOffset
>> >>> - KIP-950: Tiered Storage Disablement
>> >>> - KIP-956: Tiered Storage Quotas
>> >>>
>> >>> Release notes for the 3.9.0 release:
>> >>> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/RELEASE_NOTES.html
>> >>>
>> >>> *** Please download, test and vote by October 16, 2024.
>> >>>
>> >>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> >>> https://kafka.apache.org/KEYS
>> >>>
>> >>> * Release artifacts to be voted upon (source and binary):
>> >>> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/
>> >>>
>> >>> * Docker release artifacts to be voted upon:
>> >>> apache/kafka:3.9.0-rc2
>> >>> apache/kafka-native:3.9.0-rc2
>> >>>
>> >>> * Maven artifacts to be voted upon:
>> >>> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> >>>
>> >>> * Javadoc:
>> >>> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/javadoc/
>> >>>
>> >>> * Documentation:
>> >>> https://kafka.apache.org/39/documentation.html
>> >>>
>> >>> * Protocol:
>> >>> https://kafka.apache.org/39/protocol.html
>> >>>
>> >>> * Tag to be voted upon (off 3.9 branch) is the 3.9.0-rc2 tag:
>> >>> https://github.com/apache/kafka/releases/tag/3.9.0-rc2
>> >>>
>> >>> * Successful Docker Image GitHub Actions Pipeline for 3.9 branch:
>> >>> Docker Build Test Pipeline (JVM):
>> >>> https://github.com/apache/kafka/actions/runs/11281563007
>> >>> Docker Build Test Pipeline (Native):
>> >>> https://github.com/apache/kafka/actions/runs/11281608809
>> >>>
>> >>> Thanks to everyone who helped with this release candidate, either by contributing code, testing, or documentation.
>> >>>
>> >>> Regards,
>> >>> Colin
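For readers implementing the protocol, the default rule debated in this thread can be sketched as a small schema-driven decoder. This is a minimal Python sketch under stated assumptions, not Kafka's generated code and not kio's: the `Field`, `default_for`, and `resolve` names, the kind labels, and the two-version schema are all hypothetical. The rule that an absent struct field defaults to a struct whose members take their own defaults is also an assumption, inferred from the hostname/port discussion above. The sketch shows both cases side by side: an untagged field that is missing because the version predates it, and a tagged field that the sender omitted, where the implicit defaults surface as an empty host and a zero port.

```python
import uuid
from dataclasses import dataclass
from typing import Any

_UNSET = object()  # marks "no explicit default in the schema" (None would mean an explicit null)

# Implicit defaults per field kind, transcribed from the README excerpt quoted above.
# The kind names are hypothetical labels for this sketch, not Kafka's schema type names.
IMPLICIT_DEFAULTS = {
    "int": 0,
    "float": 0.0,
    "bool": False,
    "string": "",
    "bytes": b"",
    "uuid": uuid.UUID(int=0),
    "records": None,
}


@dataclass(frozen=True)
class Field:
    name: str
    kind: str                  # a key of IMPLICIT_DEFAULTS, "array", or "struct"
    versions: range            # message versions in which the field exists
    tagged: bool = False
    default: Any = _UNSET      # explicit default from the schema, if any
    children: tuple = ()       # nested fields when kind == "struct"


def default_for(field: Field) -> Any:
    """Explicit default if the schema sets one, otherwise the implicit default."""
    if field.default is not _UNSET:
        return field.default
    if field.kind == "array":
        return []
    if field.kind == "struct":
        # Assumption: an absent struct becomes a struct whose members use their own defaults.
        return {child.name: default_for(child) for child in field.children}
    return IMPLICIT_DEFAULTS[field.kind]


def resolve(schema: tuple, version: int, decoded: dict) -> dict:
    """Build a full model from the fields actually read off the wire for `version`."""
    result = {}
    for field in schema:
        if field.name in decoded:
            result[field.name] = decoded[field.name]
        elif not field.tagged and version in field.versions:
            # Untagged fields are always on the wire in versions that define them.
            raise ValueError(f"untagged field {field.name!r} missing from v{version}")
        else:
            # Either the field does not exist in this (older) version, or it is a
            # tagged field the sender omitted; in both cases the default applies.
            result[field.name] = default_for(field)
    return result


# Hypothetical two-version schema; only the host/port idea comes from the thread.
SCHEMA = (
    Field("error_code", "int", range(0, 2)),
    Field("new_counter", "int", range(1, 2)),  # added in v1
    Field("current_leader", "struct", range(0, 2), tagged=True, children=(
        Field("leader_id", "int", range(0, 2), default=-1),
        Field("host", "string", range(0, 2)),
        Field("port", "int", range(0, 2)),
    )),
)

msg = resolve(SCHEMA, 0, {"error_code": 0})
print(msg["new_counter"])     # 0 (field does not exist in v0)
print(msg["current_leader"])  # {'leader_id': -1, 'host': '', 'port': 0}
```

Under these assumptions, a per-version model like the one Anton describes for kio never needs the `new_counter` default, because a v0 class would not declare that field at all. It does still need the `current_leader` default whenever the tagged field is omitted, even in the latest version, and that second case is the one the thread is arguing about.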