On Sun, Oct 27, 2024, at 01:44, Anton Agestam wrote:
> Colin
>
> I have presented four reasons, I'll list them again below. Please let me
> know which ones there isn't already enough information on the thread
> already.
>
> - The behavior is new.

Hi Anton,

This behavior isn't new. I gave an example of tagged fields that have an 
implicit default in 3.8 and earlier.

> - The behavior is undocumented.

It seems like we both agree that the implicit defaults are documented. I showed 
you where in the README.md they are discussed. That section is from 2019. 
Perhaps the disagreement is that you assumed that they didn't apply to tagged 
fields, whereas I assumed that it was obvious that they did.

It looks like Chia-Ping Tsai has opened a JIRA to clarify that implicit 
defaults do indeed apply to tagged fields. I think this will help avoid 
confusion in the future.

> - The behavior is bad API design.

Why is it bad API design?

> - The behavior does not really save bytes *in practice*.

The example you gave shows that the current behavior sends less over the wire 
than your proposed change. Those are not theoretical bytes, they are actual 
bytes.

Saving space on the wire for fields that were not often used was one of the 
explicit goals of the tagged fields KIP, which was KIP-482. As it says in the 
"motivation" section:

 > While [the current] versioning scheme allows us to change the message 
 > schemas over 
 > time, there are many scenarios that it doesn't support well.  One scenario 
 > that isn't well-supported is when we have data that should be sent in some 
 > contexts, but not others.  For example, when a MetadataRequest is made 
 > with IncludeClusterAuthorizedOperations set to true, we need to include 
 > the authorized operations in the response.  However, even when 
 > IncludeClusterAuthorizedOperations is set to false, we still must waste 
 > bandwidth sending a set of blank authorized operations fields in the 
 > response.  The problem is that the field that is semantically optional in 
 > the message, but that is can't be expressed in the type system for the 
 > Kafka RPC protocol.

You can read it here: https://cwiki.apache.org/confluence/x/OhMyBw

Obviously sending defaults over the wire, in cases where this is not needed, 
goes against that.

>
> I don't see why *fixing* the release candidate to not break documented
> behavior should require a KIP, I would actually expect the opposite -- the
> new behavior that is being introduced should really have required one.
>
>> These two behaviors, taken together, save space on the wire
>
> Then you are implicitly arguing that the combination of host="" port=0
> are common enough that this will practically save bytes on the wire, I find
> that hard to believe.
>
> For any future schema that we want to save bytes, there is just as much
> opportunity to save bytes on the wire with my proposal, they just have to
> explicitly define default nested values in order to do so.

As I said, there is nothing special about 3.9. This behavior has always existed.

If you really want to force everyone to explicitly declare a default for each 
field, then just introduce a KIP to do that. I wouldn't vote for it (I still 
don't see why this is better), but this would at least follow our usual process.

One of the problems with forcing an explicit default everywhere is that we 
don't really have a syntax for specifying that the default should be the empty 
collection. For collections, the only choice you get is explicitly declaring 
that the default is null.

best,
Colin


>
> BR,
> Anton
>
>
> Den sön 27 okt. 2024 kl 02:56 skrev Colin McCabe <cmcc...@apache.org>:
>
>> Hi Anton,
>>
>> The behavior where we leave out tagged fields when the default value is
>> present is intentional. As is the behavior where default values are treated
>> as 0, the empty array, etc. when defaults are not explicitly specified.
>> These two behaviors, taken together, save space on the wire, and are
>> simpler for us to implement than what you have proposed. You haven't
>> presented a single reason why this should change.
>>
>> There simply isn't any reason to change the current tagged fields
>> behavior. And even if we wanted to, it would require a KIP, whereas 3.9 is
>> past KIP freeze (along with feature freeze, code freeze, and every other
>> freeze.)
>>
>> best,
>> Colin
>>
>>
>> On Sat, Oct 26, 2024, at 09:45, Anton Agestam wrote:
>> > Hi Colin,
>> >
>> >> The current behavior is older than 3.9
>> >
>> > No it is not. The *Java implementation* is older than 3.9. But the
>> behavior
>> > is a sum of Java implementation + JSON schema. It's a new construct of
>> JSON
>> > schema that exposes the new behavior. It is fully possible to adjust
>> > behavior only for the new kind of construct and remain fully equivalent
>> in
>> > behavior with 3.8 for everything else.
>> >
>> >> a lot of things depend on it
>> >
>> > Do you have an example of this? The field you pointed at is an array
>> field,
>> > so it doesn't exercise the behavior we are talking about here.
>> >
>> >> I also disagree that the implicit default behavior is bad.
>> >
>> > This is a mischaracterization of my previous messages. I am NOT proposing
>> > to change any of the implicit default behavior that Kafka 3.8 exercises.
>> > You are not disagreeing with me because I have not stated that the
>> implicit
>> > default behavior in general is what is bad.
>> >
>> >> we might have to agree to disagree about
>> >
>> > That's always an acceptable outcome, however, I think it would be very
>> sad
>> > to see this decision made without a clear understanding of the problem.
>> >
>> > This proof of concept pull request demonstrates that it is possible to do
>> > what I am suggesting: https://github.com/apache/kafka/pull/17608
>> >
>> > BR,
>> > Anton
>> >
>> > Den fre 25 okt. 2024 kl 23:32 skrev Colin McCabe <cmcc...@apache.org>:
>> >
>> >> Hi Anton,
>> >>
>> >> I appreciate your desire for clarity in the specification. If you think
>> >> there's something we could do to clarify the docs, let me know (or
>> better
>> >> yet, file a PR!)
>> >>
>> >> But we cannot change the behavior of defaults for tagged fields. The
>> >> current behavior is older than 3.9 and a lot of things depend on it. For
>> >> example, if you go back to 3.8, you'll find a tagged field in
>> >> ApiVersionsResponse.json named SupportedFeatures. It doesn't specify an
>> >> explicit default. Therefore it gets the implicit default of the empty
>> array
>> >> if nothing is sent for this field. (I don't think this field was new in
>> >> 3.8, either, but I'm just giving a random example).
>> >>
>> >> I also disagree that the implicit default behavior is bad. Forcing
>> >> everyone to specify a default for every field would have been verbose,
>> and
>> >> in most cases not necessary. While there are some things I wish I had
>> done
>> >> differently about the protocol generator, this isn't one of them. Maybe
>> >> this is a matter of opinion, though, that we might have to agree to
>> >> disagree about...
>> >>
>> >> best,
>> >> Colin
>> >>
>> >>
>> >> On Fri, Oct 25, 2024, at 01:37, Anton Agestam wrote:
>> >> > Hi again Colin,
>> >> >
>> >> >> The implicit defaults have been documented for a long time
>> >> >
>> >> > Yes, if you read my message you will see I am referencing and quoting
>> >> that
>> >> > exact documentation. The new behavior is clearly excluded by the
>> explicit
>> >> > pre requisite in that exact documentation.
>> >> >
>> >> > Again in summary (all three of these points have more details in my
>> >> > previous message, so I'll refrain from repeating too much):
>> >> >
>> >> > - This is new behavior. There was no nested tagged entity field with
>> >> > non-explicit defaults in any API schema version up until 3.9.
>> >> > - The new behavior is _not_ specified in documentation. The existing
>> >> > documentation is specific under which circumstances it applies.
>> >> > - The new behavior is bad.
>> >> >
>> >> >> It is, as you said, "a quirk of the implementation" but it's a quirk
>> >> that
>> >> > you have to know to speak the Kafka protocol
>> >> >
>> >> > This is somewhat paraphrasing. It is a *quirk of the implementation
>> >> > introduced in 3.9*, without any formalization of the change, and with
>> >> > consequences that will be extremely cumbersome to undo. Now is the
>> only
>> >> > time to fix this without introducing backwards incompatible semantics.
>> >> >
>> >> > In terms of specification, this is the addition that I am proposing:
>> >> > "nested tagged entity fields are only to be considered as having an
>> >> > implicit default value if all of fields in the nested entity also have
>> >> > explicit defaults". This would make no difference for any message
>> prior
>> >> to
>> >> > 3.9. On 3.9, it will only have an effect on UpdateRaftVoterResponse,
>> >> where,
>> >> > if you want to send a CurrentLeader object that matches the implicit
>> >> > defaults, this will have to be serialized fully on the wire. Do you
>> see
>> >> any
>> >> > concrete consequences of that change?
>> >> >
>> >> > As an alternative that would not require improving the protocol
>> >> > specification, the currentLeader field could be made optional and have
>> >> null
>> >> > as an explicit default. I think it's a worse solution, but it defers
>> >> having
>> >> > to improve the specification.
>> >> >
>> >> >> Perhaps there is something we could do to improve these docs? Or
>> somehow
>> >> > make them easier to find?
>> >> >
>> >> > Yes, I'm happy to discuss this, but for the sake of keeping this
>> >> discussion
>> >> > on topic, can we do that separately? There are some related Jira
>> tickets
>> >> > for this.
>> >> >
>> >> > Cheers,
>> >> > Anton
>> >> >
>> >> >
>> >> > Den tors 24 okt. 2024 kl 18:31 skrev Colin McCabe <cmcc...@apache.org
>> >:
>> >> >
>> >> >> Hi Anton,
>> >> >>
>> >> >> The implicit defaults have been documented for a long time. Take a
>> look
>> >> at
>> >> >> this documentation from 2019 in
>> >> >> clients/src/main/resources/common/message/README.md :
>> >> >>
>> >> >> > Deserializing Messages
>> >> >> > ----------------------
>> >> >> > Message objects may be deserialized using the Message#read method.
>> >> This
>> >> >> method
>> >> >> > overwrites all the data currently in the message object with new
>> data.
>> >> >> >
>> >> >> > Any fields in the message object that are not present in the
>> version
>> >> >> that you
>> >> >> > are deserializing will be reset to default values.  Unless a custom
>> >> >> default has
>> >> >> > been set:
>> >> >> >
>> >> >> > * Integer fields default to 0.
>> >> >> >
>> >> >> > * Floats default to 0.
>> >> >> >
>> >> >> > * Booleans default to false.
>> >> >> >
>> >> >> > * Strings default to the empty string.
>> >> >> >
>> >> >> > * Bytes fields default to the empty byte array.
>> >> >> >
>> >> >> > * Uuid fields default to zero uuid.
>> >> >> >
>> >> >> > * Records fields default to null.
>> >> >> >
>> >> >> > * Array fields default to empty.
>> >> >> >
>> >> >> > You can specify "null" as a default value for a string field by
>> >> >> specifying the
>> >> >> > literal string "null".  Note that you can only specify null as a
>> >> default
>> >> >> if all
>> >> >> > versions of the field are nullable.
>> >> >>
>> >> >> Any implementation of sending and receiving Kafka messages needs to
>> know
>> >> >> about these defaults. It is, as you said, "a quirk of the
>> >> implementation"
>> >> >> but it's a quirk that you have to know to speak the Kafka protocol.
>> The
>> >> >> defaults apply to both tagged and untagged fields. The main
>> difference
>> >> is
>> >> >> that untagged fields are always present except in older versions,
>> >> whereas
>> >> >> tagged fields may not be present even in the most recent version.
>> >> >>
>> >> >> Perhaps there is something we could do to improve these docs? Or
>> somehow
>> >> >> make them easier to find?
>> >> >>
>> >> >> best,
>> >> >> Colin
>> >> >>
>> >> >>
>> >> >> On Wed, Oct 23, 2024, at 00:19, Anton Agestam wrote:
>> >> >> > Hi Colin,
>> >> >> >
>> >> >> > I understand your perspective that this is a bug in the client,
>> but I
>> >> >> > disagree that this is something that's established, or inferrable
>> from
>> >> >> > existing specification and implementation.
>> >> >> >
>> >> >> > The behavior of exposing the implicit defaults on the wire protocol
>> >> >> really
>> >> >> > is new. The only scenario that they have come into play prior to
>> this
>> >> >> > change is when parsing an entity version that is not the latest
>> >> version
>> >> >> > that the parsing code base supports -- and -- when that code base
>> >> models
>> >> >> > all versions of an API with the same entity. In this scenario the
>> >> >> implicit
>> >> >> > defaults must be there to fill in newly added fields with _some_
>> >> value in
>> >> >> > lack of being able to read from the message. However, for an
>> >> >> implementation
>> >> >> > that doesn't reuse the same class for every version, there has
>> never
>> >> been
>> >> >> > any need to use the implicit defaults, as when each version has a
>> >> >> dedicated
>> >> >> > class it doesn't need to define any fields for values that are
>> added
>> >> in
>> >> >> > future versions. We have a comprehensive compatibility test suite
>> in
>> >> kio
>> >> >> > that proves this is the case, we can parse and serialize the full
>> >> >> protocol
>> >> >> > up to version 3.8.0 (every version of every message). This would
>> not
>> >> have
>> >> >> > been possible if the implicit defaults were semantically required,
>> >> >> because
>> >> >> > we do not use them.
>> >> >> >
>> >> >> > The reason that the above hasn't become visible until now, as I
>> >> mentioned
>> >> >> > in the bug report, is that the newly introduced protocol message is
>> >> the
>> >> >> > first one (of all message versions in the whole protocol) to make
>> use
>> >> of
>> >> >> > this construct: a nested entity field where not all nested fields
>> >> have an
>> >> >> > explicit default value.
>> >> >> >
>> >> >> > Further, the new behavior is not in accordance with what is
>> specified.
>> >> >> > The
>> >> >> > implicit defaults are really only documented to be used in the
>> above
>> >> >> > mentioned scenario, with the following prerequisite
>> >> >> > <
>> >> >>
>> >>
>> https://github.com/apache/kafka/tree/trunk/clients/src/main/resources/common/message#deserializing-messages
>> >> >> >:
>> >> >> > "Any fields in the message object that are not present in the
>> version
>> >> >> > that
>> >> >> > you are deserializing will be reset to default values". That
>> >> >> > prerequisite
>> >> >> > is not fulfilled here, so this cannot be said to be in accordance
>> with
>> >> >> > what
>> >> >> > is specified. The behavior occurs even though I am parsing a
>> message
>> >> of
>> >> >> > the
>> >> >> > _same_, latest, version as the schema of the model.
>> >> >> >
>> >> >> > In summary, this is new behavior, and it is unspecified behavior.
>> >> >> >
>> >> >> > On top of that, I think this is also bad API design, as it forces
>> >> >> > cumbersome and strange semantics into client implementations. In
>> the
>> >> >> > context of parsing a model that has more fields than the data it is
>> >> >> > parsing, it makes sense to have implicit defaults. However, when
>> the
>> >> >> model
>> >> >> > has the same amount of fields as there are values in the data, this
>> >> does
>> >> >> > not make sense. Why should we force client implementations to
>> induce
>> >> non
>> >> >> > defaults into their data structures? The empty string for a
>> hostname,
>> >> and
>> >> >> > zero for a port are clearly not good defaults, both are really
>> invalid
>> >> >> > values.
>> >> >> >
>> >> >> > The choice of implicitly introducing this behavior at the protocol
>> >> level
>> >> >> in
>> >> >> > this way will most likely mean that this becomes a quirk of the
>> >> protocol
>> >> >> > forever. I think this situation is avoidable, and I think this is a
>> >> >> > decision that should not be made silently without proper
>> specification
>> >> >> and
>> >> >> > design process to make sure it is really the right decision.
>> >> >> >
>> >> >> > BR,
>> >> >> > Anton
>> >> >> >
>> >> >> > Den mån 21 okt. 2024 kl 23:12 skrev Colin McCabe <
>> cmcc...@apache.org
>> >> >:
>> >> >> >
>> >> >> >> Hi all,
>> >> >> >>
>> >> >> >> I have posted a new release candidate, RC3. See the RC3 thread.
>> >> >> >>
>> >> >> >> best,
>> >> >> >> Colin
>> >> >> >>
>> >> >> >> On Mon, Oct 21, 2024, at 11:31, Colin McCabe wrote:
>> >> >> >> > Hi Anton,
>> >> >> >> >
>> >> >> >> > I replied on the JIRA. I do not think this is a bug, you just
>> >> failed
>> >> >> to
>> >> >> >> > account for implicit defaults in your protocol code. That is, 0
>> is
>> >> the
>> >> >> >> > default of numeric fields if no other default is specified, etc.
>> >> >> >> >
>> >> >> >> > best,
>> >> >> >> > Colin
>> >> >> >> >
>> >> >> >> > On Mon, Oct 21, 2024, at 08:07, Anton Agestam wrote:
>> >> >> >> >> Hi everyone,
>> >> >> >> >>
>> >> >> >> >> I have found a protocol serialization bug that surfaces only
>> with
>> >> >> one of
>> >> >> >> >> the entities introduced for KIP-853 (UpdateRaftVoterResponse).
>> >> >> >> >>
>> >> >> >> >> Due to the irreversible implications this might have once
>> merged,
>> >> I'd
>> >> >> >> argue
>> >> >> >> >> that this needs to be considered a release blocker.
>> >> >> >> >>
>> >> >> >> >> https://issues.apache.org/jira/browse/KAFKA-17845
>> >> >> >> >>
>> >> >> >> >> BR,
>> >> >> >> >> Anton
>> >> >> >> >>
>> >> >> >> >> Den tors 10 okt. 2024 kl 23:16 skrev Colin McCabe <
>> >> >> cmcc...@apache.org>:
>> >> >> >> >>
>> >> >> >> >>> This is the second candidate for the release of Apache Kafka
>> >> 3.9.0.
>> >> >> I
>> >> >> >> have
>> >> >> >> >>> titled it rc2 since I had an rc1 which got very far, even to
>> the
>> >> >> point
>> >> >> >> of
>> >> >> >> >>> pushing tags and docker images, before I spotted an issue. So
>> >> rather
>> >> >> >> than
>> >> >> >> >>> mutate the tags, I decided to skip over rc1.
>> >> >> >> >>>
>> >> >> >> >>> - This is a major release, the final one in the 3.x line.
>> (There
>> >> >> may of
>> >> >> >> >>> course be other minor releases in this line, such as 3.9.1.)
>> >> >> >> >>> - Tiered storage will be considered production-ready in this
>> >> >> release.
>> >> >> >> >>> - This will be the final major release to feature the
>> deprecated
>> >> >> >> ZooKeeper
>> >> >> >> >>> mode.
>> >> >> >> >>>
>> >> >> >> >>> This release includes the following KIPs:
>> >> >> >> >>> - KIP-853: Support dynamically changing KRaft controller
>> >> membership
>> >> >> >> >>> - KIP-1057: Add remote log metadata flag to the dump log tool
>> >> >> >> >>> - KIP-1049: Add config log.summary.interval.ms to Kafka
>> Streams
>> >> >> >> >>> - KIP-1040: Improve handling of nullable values in
>> InsertField,
>> >> >> >> >>> ExtractField, and other transformations
>> >> >> >> >>> - KIP-1031: Control offset translation in
>> MirrorSourceConnector
>> >> >> >> >>> - KIP-1033: Add Kafka Streams exception handler for exceptions
>> >> >> >> occurring
>> >> >> >> >>> during processing
>> >> >> >> >>> - KIP-1017: Health check endpoint for Kafka Connect
>> >> >> >> >>> - KIP-1025: Optionally URL-encode clientID and clientSecret in
>> >> >> >> >>> authorization header
>> >> >> >> >>> - KIP-1005: Expose EarliestLocalOffset and TieredOffset
>> >> >> >> >>> - KIP-950: Tiered Storage Disablement
>> >> >> >> >>> - KIP-956: Tiered Storage Quotas
>> >> >> >> >>>
>> >> >> >> >>> Release notes for the 3.9.0 release:
>> >> >> >> >>>
>> >> >> >>
>> >> >>
>> >>
>> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/RELEASE_NOTES.html
>> >> >> >> >>>
>> >> >> >> >>> *** Please download, test and vote by October 16, 2024.
>> >> >> >> >>>
>> >> >> >> >>> Kafka's KEYS file containing PGP keys we use to sign the
>> release:
>> >> >> >> >>> https://kafka.apache.org/KEYS
>> >> >> >> >>>
>> >> >> >> >>> * Release artifacts to be voted upon (source and binary):
>> >> >> >> >>> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/
>> >> >> >> >>>
>> >> >> >> >>> * Docker release artifacts to be voted upon:
>> >> >> >> >>> apache/kafka:3.9.0-rc2
>> >> >> >> >>> apache/kafka-native:3.9.0-rc2
>> >> >> >> >>>
>> >> >> >> >>> * Maven artifacts to be voted upon:
>> >> >> >> >>>
>> >> >>
>> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> >> >> >> >>>
>> >> >> >> >>> * Javadoc:
>> >> >> >> >>>
>> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/javadoc/
>> >> >> >> >>>
>> >> >> >> >>> * Documentation:
>> >> >> >> >>> https://kafka.apache.org/39/documentation.html
>> >> >> >> >>>
>> >> >> >> >>> * Protocol:
>> >> >> >> >>> https://kafka.apache.org/39/protocol.html
>> >> >> >> >>>
>> >> >> >> >>> * Tag to be voted upon (off 3.9 branch) is the 3.9.0-rc2 tag:
>> >> >> >> >>> https://github.com/apache/kafka/releases/tag/3.9.0-rc2
>> >> >> >> >>>
>> >> >> >> >>> * Successful Docker Image Github Actions Pipeline for 3.9
>> branch:
>> >> >> >> >>> Docker Build Test Pipeline (JVM):
>> >> >> >> >>> https://github.com/apache/kafka/actions/runs/11281563007
>> >> >> >> >>> Docker Build Test Pipeline (Native):
>> >> >> >> >>> https://github.com/apache/kafka/actions/runs/11281608809
>> >> >> >> >>>
>> >> >> >> >>> Thanks to everyone who helped with this release candidate,
>> >> either by
>> >> >> >> >>> contributing code, testing, or documentation.
>> >> >> >> >>>
>> >> >> >> >>> Regards,
>> >> >> >> >>> Colin
>> >> >> >> >>>
>> >> >> >>
>> >> >>
>> >>
>>

Reply via email to