While messing around with the release candidate, I found a separate blocker 
bug: KAFKA-17794.

Basically, our behavior is confusing when the storage directory is formatted
for KIP-853 (dynamic controller quorum), but no initial controllers are
supplied. I have a PR which requires the initial controllers to be specified
in this case (since otherwise, it does not work).
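
As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not the PR and not Kafka's code: `FormatOptions`, `FormatError`, and `validate_format_options` are hypothetical names, and the controller descriptor strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


class FormatError(ValueError):
    """Raised when the (hypothetical) format options are inconsistent."""


@dataclass
class FormatOptions:
    # True when the directory is being formatted for a KIP-853 dynamic quorum.
    dynamic_quorum: bool
    # Initial controller descriptors; the string layout here is illustrative only.
    initial_controllers: List[str] = field(default_factory=list)


def validate_format_options(opts: FormatOptions) -> None:
    """Reject a dynamic-quorum format that supplies no initial controllers."""
    if opts.dynamic_quorum and not opts.initial_controllers:
        raise FormatError(
            "dynamic controller quorum requested, but no initial controllers "
            "were supplied"
        )
```

The point of failing at format time is that the mistake surfaces immediately, rather than later as a quorum that never forms.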

best,
Colin

On Mon, Oct 14, 2024, at 16:34, Colin McCabe wrote:
> On Mon, Oct 14, 2024, at 15:04, Jakub Scholz wrote:
>> Hi Colin,
>>
>> So, how exactly does this major misconfiguration, which went undocumented
>> for over a year without anybody complaining, manifest itself? What should I
>> look for in the logs? What problems does it cause? There
>> are plenty of users who went through the migration with this "major
>> misconfiguration". So what should they be looking for? Did they lose
>> messages? Does it have security consequences that require a CVE? How
>> does one recover from the problems it caused?
>>
>> I do not think changing the controller name is as simple as you suggest.
>> And it is especially not simple if you need to roll it out to thousands of
>> users.
>>
>> I also feel that issues like this need to be taken more seriously than just
>> shrugging them off and quickly closing a JIRA someone else opened as invalid.
>> This is not the first thing that was found to be undocumented. It indicates
>> significant quality issues, at least on the documentation side. How many
>> other "major misconfigurations" are still left?
>
> Hi Jakub,
>
> I understand your frustration. Let me explain. Kafka wants to divide up 
> all listeners into either broker listeners or controller listeners. The 
> sets are disjoint: a listener can't be both.
>
> BrokerServer will try to open the ports that belong to the broker; 
> ControllerServer will try to open the ports that belong to the 
> controller. You obviously can't open the same port twice (in standard 
> UNIX, at least) or if you somehow did, the result would be nonsense. 
> This isn't such a big deal on broker or controller nodes, but it 
> becomes a bigger deal when in "combined" mode, where a process 
> functions as both broker and controller.
>
> You are getting the error message "requirement failed: 
> control.plane.listener.name must be a listener name defined in 
> advertised.listeners" because the broker is looking for a listener 
> named CONTROLPLANE-9090, but it already knows that this is NOT an 
> advertised listener for the broker; it is an advertised listener for 
> the controller. You would get the same error if you tried to set 
> inter.broker.listener.name or the broker replication listener to the 
> CONTROLLER listener.
>
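
A minimal Python sketch of the listener split described above, and of why the requirement fails. This is an illustration under stated assumptions, not Kafka's actual validation code: `split_listeners` and `check_control_plane_listener` are hypothetical helpers, and any listener name other than CONTROLPLANE-9090 is made up for the example.

```python
from typing import Iterable, List, Optional, Tuple


def split_listeners(
    listeners: Iterable[str], controller_listener_names: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition listener names into disjoint (broker, controller) lists."""
    controller_names = set(controller_listener_names)
    all_names = list(listeners)
    broker = [name for name in all_names if name not in controller_names]
    controller = [name for name in all_names if name in controller_names]
    return broker, controller


def check_control_plane_listener(
    control_plane_name: Optional[str], broker_listeners: Iterable[str]
) -> None:
    """Fail if the control plane listener is not one of the broker listeners."""
    if control_plane_name is not None and control_plane_name not in set(broker_listeners):
        raise ValueError(
            "requirement failed: control.plane.listener.name must be a "
            "listener name defined in advertised.listeners"
        )
```

With listeners `CONTROLPLANE-9090` and a second, made-up broker listener, naming `CONTROLPLANE-9090` as a controller listener removes it from the broker set, so passing it as the control plane listener raises the error quoted above.
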
> The reason why this wasn't an issue in 3.8.0, and is in 3.9-RC2 is that 
> in 3.8.0 and earlier, advertised.listeners was just for broker 
> listeners. There was never a need to put a controller listener in there 
> because controller listeners were statically configured by 
> controller.quorum.voters. In fact, when in KRaft mode, it would be a 
> fatal configuration error to put a controller listener into 
> advertised.listeners. Unfortunately, we failed to enforce this when the 
> broker was in migration mode.
>
> I guess for the purpose of being bug-compatible with 3.8, we could make 
> an exception here and force the listener specified by 
> control.plane.listener.name to appear in effectiveAdvertisedBrokerListeners. 
> Since control plane listeners are going away anyway, we won't have to 
> support this exception for very long.
>
> By the way, I wasn't trying to be dismissive of your bug report (in 
> fact I spent several hours on it). I felt (in fact I still feel) that 
> it's a configuration error. But your compatibility argument is a 
> reasonable one. So let's be compatible.
>
> best,
> Colin
>
>>
>> Jakub
>>
>> On Mon, Oct 14, 2024 at 11:33 PM Colin McCabe <cmcc...@apache.org> wrote:
>>
>>> Hi Jakub,
>>>
>>> It has always been required to separate control plane listeners and
>>> controller listeners. Failing to do this is a major misconfiguration. It
>>> may not have been caught sometimes, but that is a bug.
>>>
>>> It should be simple to fix the configuration you posted -- simply have a
>>> different name for the controller listener than the control plane listener.
>>>
>>> best,
>>> Colin
>>>
>>> On Mon, Oct 14, 2024, at 11:16, Jakub Scholz wrote:
>>> > A different name for the KRaft controller listener and the control plane
>>> > listener in a ZooKeeper-based cluster was not required before, and it is
>>> > not simple to change now at the "last minute". So given that this has
>>> > been called production-ready for some time, I think this is a breaking
>>> > API change and should be treated as such.
>>> >
>>> > Thanks & Regards
>>> > Jakub
>>> >
>>> > On Mon, Oct 14, 2024 at 7:55 PM Colin McCabe <cmcc...@apache.org> wrote:
>>> >
>>> >> Hi Jakub,
>>> >>
>>> >> After looking through the attached file on the JIRA, I can say that this
>>> >> is a misconfiguration. controller.listener.names is a totally separate
>>> >> concept from control.plane.listener.name. They should never be set to the
>>> >> same value. The controller listener must have a different name and value
>>> >> than the control plane listener (if any).
>>> >>
>>> >> I also tested myself that KRaft migration works with
>>> >> control.plane.listener.name configured. It works on both 3.8 and 3.9-RC2.
>>> >>
>>> >> My initial statement that control.plane.listener.name was not supported
>>> >> during ZK migration was incorrect. As you said, it is supported during
>>> >> migration up to the point that we are in KRaft mode. (Another reason why
>>> >> having the control plane listener = controller listener would not make
>>> >> sense.)
>>> >>
>>> >> Thanks for the bug report and discussion. I've closed this as invalid now
>>> >> that I have tested migration using control.plane.listener.name for myself
>>> >> and verified that it works.
>>> >>
>>> >> best,
>>> >> Colin
>>> >>
>>> >> On Mon, Oct 14, 2024, at 08:31, Jakub Scholz wrote:
>>> >> >> control.plane.listener is not (and never has been) supported in KRaft
>>> >> >> mode.
>>> >> >
>>> >> > You mean control.plane.listener.name is not supported in KRaft, I guess?
>>> >> > Well, this is not KRaft, this is migration, so it uses the settings that
>>> >> > it used before for the ZooKeeper-based cluster, and that includes using a
>>> >> > dedicated control plane listener. I don't think I can "just remove it"
>>> >> > because the other nodes will use it during the rolling update.
>>> >> >
>>> >> > This also worked fine with 3.8 (and 3.7, etc.), so if it is not supported
>>> >> > now, it is a breaking API change, I guess, which should be a blocker.
>>> >> >
>>> >> > Thanks & Regards
>>> >> > Jakub
>>> >> >
>>> >> > On Mon, Oct 14, 2024 at 5:12 PM Colin McCabe <cmcc...@apache.org> wrote:
>>> >> >
>>> >> >> Hi Jakub,
>>> >> >>
>>> >> >> Thanks for testing. control.plane.listener is not (and never has been)
>>> >> >> supported in KRaft mode. You have to remove control.plane.listener
>>> >> >> configurations before migrating. I filed KAFKA-17790 to document this in
>>> >> >> the migration instructions. (This is not a blocker for the release,
>>> >> >> though.)
>>> >> >>
>>> >> >> best,
>>> >> >> Colin
>>> >> >>
>>> >> >> On Mon, Oct 14, 2024, at 02:52, Jakub Scholz wrote:
>>> >> >> > Hi Colin,
>>> >> >> >
>>> >> >> > Thanks for the RC. I did some testing of it and ran into
>>> >> >> > https://issues.apache.org/jira/browse/KAFKA-17788 which seems to be a
>>> >> >> > regression in the migration to KRaft process.
>>> >> >> >
>>> >> >> > Can someone who understands this part of the codebase look into it
>>> >> >> > please?
>>> >> >> >
>>> >> >> > Thanks & Regards
>>> >> >> > Jakub
>>> >> >> >
>>> >> >> > On Thu, Oct 10, 2024 at 11:16 PM Colin McCabe <cmcc...@apache.org> wrote:
>>> >> >> >
>>> >> >> >> This is the second candidate for the release of Apache Kafka 3.9.0. I
>>> >> >> >> have titled it rc2 since I had an rc1 which got very far, even to the
>>> >> >> >> point of pushing tags and docker images, before I spotted an issue. So
>>> >> >> >> rather than mutate the tags, I decided to skip over rc1.
>>> >> >> >>
>>> >> >> >> - This is a major release, the final one in the 3.x line. (There may of
>>> >> >> >> course be other minor releases in this line, such as 3.9.1.)
>>> >> >> >> - Tiered storage will be considered production-ready in this release.
>>> >> >> >> - This will be the final major release to feature the deprecated
>>> >> >> >> ZooKeeper mode.
>>> >> >> >>
>>> >> >> >> This release includes the following KIPs:
>>> >> >> >> - KIP-853: Support dynamically changing KRaft controller membership
>>> >> >> >> - KIP-1057: Add remote log metadata flag to the dump log tool
>>> >> >> >> - KIP-1049: Add config log.summary.interval.ms to Kafka Streams
>>> >> >> >> - KIP-1040: Improve handling of nullable values in InsertField,
>>> >> >> >> ExtractField, and other transformations
>>> >> >> >> - KIP-1031: Control offset translation in MirrorSourceConnector
>>> >> >> >> - KIP-1033: Add Kafka Streams exception handler for exceptions
>>> >> >> >> occurring during processing
>>> >> >> >> - KIP-1017: Health check endpoint for Kafka Connect
>>> >> >> >> - KIP-1025: Optionally URL-encode clientID and clientSecret in
>>> >> >> >> authorization header
>>> >> >> >> - KIP-1005: Expose EarliestLocalOffset and TieredOffset
>>> >> >> >> - KIP-950: Tiered Storage Disablement
>>> >> >> >> - KIP-956: Tiered Storage Quotas
>>> >> >> >>
>>> >> >> >> Release notes for the 3.9.0 release:
>>> >> >> >> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/RELEASE_NOTES.html
>>> >> >> >>
>>> >> >> >> *** Please download, test and vote by October 16, 2024.
>>> >> >> >>
>>> >> >> >> Kafka's KEYS file containing PGP keys we use to sign the release:
>>> >> >> >> https://kafka.apache.org/KEYS
>>> >> >> >>
>>> >> >> >> * Release artifacts to be voted upon (source and binary):
>>> >> >> >> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/
>>> >> >> >>
>>> >> >> >> * Docker release artifacts to be voted upon:
>>> >> >> >> apache/kafka:3.9.0-rc2
>>> >> >> >> apache/kafka-native:3.9.0-rc2
>>> >> >> >>
>>> >> >> >> * Maven artifacts to be voted upon:
>>> >> >> >> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>>> >> >> >>
>>> >> >> >> * Javadoc:
>>> >> >> >> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/javadoc/
>>> >> >> >>
>>> >> >> >> * Documentation:
>>> >> >> >> https://kafka.apache.org/39/documentation.html
>>> >> >> >>
>>> >> >> >> * Protocol:
>>> >> >> >> https://kafka.apache.org/39/protocol.html
>>> >> >> >>
>>> >> >> >> * Tag to be voted upon (off 3.9 branch) is the 3.9.0-rc2 tag:
>>> >> >> >> https://github.com/apache/kafka/releases/tag/3.9.0-rc2
>>> >> >> >>
>>> >> >> >> * Successful Docker Image Github Actions Pipeline for 3.9 branch:
>>> >> >> >> Docker Build Test Pipeline (JVM):
>>> >> >> >> https://github.com/apache/kafka/actions/runs/11281563007
>>> >> >> >> Docker Build Test Pipeline (Native):
>>> >> >> >> https://github.com/apache/kafka/actions/runs/11281608809
>>> >> >> >>
>>> >> >> >> Thanks to everyone who helped with this release candidate, either by
>>> >> >> >> contributing code, testing, or documentation.
>>> >> >> >>
>>> >> >> >> Regards,
>>> >> >> >> Colin
