While messing around with the release candidate, I found a separate blocker bug: KAFKA-17794.
Basically, our behavior is confusing when we format the storage directory with KIP-853 (dynamic controller quorum), but no initial controllers are supplied. I have a PR which requires the initial controllers to be initialized in this case (since otherwise, it does not work).

best,
Colin

On Mon, Oct 14, 2024, at 16:34, Colin McCabe wrote:
> On Mon, Oct 14, 2024, at 15:04, Jakub Scholz wrote:
>> Hi Colin,
>>
>> So, how exactly does this major misconfiguration that was not documented
>> for over a year and nobody complained manifest itself? What should I look
>> for in the logs? What are the problems it manifests itself through? There
>> are plenty of users who went through the migration with this "major
>> misconfiguration". So what should they be looking for? Did they lose
>> messages? Does it have security consequences that require a CVE? How
>> does one recover from the problems it caused?
>>
>> I do not think changing the controller name is as simple as you suggest.
>> And it is especially not simple if you need to roll it out to thousands of
>> users.
>>
>> I also feel that issues like this need to be taken more seriously than just
>> shrugging it off and quickly closing JIRA someone else opened as invalid.
>> This is not the first thing that was found as undocumented. It indicates
>> significant quality issues, at least on the documentation side. How many
>> other "major misconfigurations" are still left?
>
> Hi Jakub,
>
> I understand your frustration. Let me explain. Kafka wants to divide up
> all listeners into either broker listeners or controller listeners. The
> sets are disjoint: a listener can't be both.
>
> BrokerServer will try to open the ports that belong to the broker;
> ControllerServer will try to open the ports that belong to the
> controller. You obviously can't open the same port twice (in standard
> UNIX, at least) or if you somehow did, the result would be nonsense.
> This isn't such a big deal on broker or controller nodes, but it
> becomes a bigger deal when in "combined" mode, where a process
> functions as both broker and controller.
>
> You are getting the error message "requirement failed:
> control.plane.listener.name must be a listener name defined in
> advertised.listeners" because the broker is looking for a listener
> named CONTROLPLANE-9090, but it already knows that this is NOT an
> advertised listener for the broker, it is an advertised listener for
> the controller. You would get the same error if you tried to set the
> inter.broker.listener or the broker replication listener to the
> CONTROLLER listener.
>
> The reason why this wasn't an issue in 3.8.0, and is in 3.9-RC2 is that
> in 3.8.0 and earlier, advertised.listeners was just for broker
> listeners. There was never a need to put a controller listener in there
> because controller listeners were statically configured by
> controller.quorum.voters. In fact, when in KRaft mode, it would be a
> fatal configuration error to put a controller listener into
> advertised.listeners. Unfortunately, we failed to enforce this when the
> broker was in migration mode.
>
> I guess for the purpose of being bug-compatible with 3.8, we could make
> an exception here and force the listener specified by
> control.plane.listener to appear in effectiveAdvertisedBrokerListeners.
> Since control plane listeners are going away anyway, we won't have to
> support this exception for very long.
>
> By the way, I wasn't trying to be dismissive of your bug report (in
> fact I spent several hours on it). I felt (in fact I still feel) that
> it's a configuration error. But your compatibility argument is a
> reasonable one. So let's be compatible.
>
> best,
> Colin
>
>>
>> Jakub
>>
>> On Mon, Oct 14, 2024 at 11:33 PM Colin McCabe <cmcc...@apache.org> wrote:
>>
>>> Hi Jakub,
>>>
>>> It has always been required to separate control plane listeners and
>>> controller listeners.
>>> Failing to do this is a major misconfiguration. It
>>> may not have been caught sometimes, but that is a bug.
>>>
>>> It should be simple to fix the configuration you posted -- simply have a
>>> different name for the controller listener than the control plane listener.
>>>
>>> best,
>>> Colin
>>>
>>> On Mon, Oct 14, 2024, at 11:16, Jakub Scholz wrote:
>>> > The different name of the controller listener for KRaft controllers and
>>> > control plane listener in ZooKeeper-based cluster was not required before
>>> > and it is not simple to change to handle now at the "last minute". So given
>>> > that this is called production-ready already for some time, I think this is
>>> > breaking API change and should be treated as such.
>>> >
>>> > Thanks & Regards
>>> > Jakub
>>> >
>>> > On Mon, Oct 14, 2024 at 7:55 PM Colin McCabe <cmcc...@apache.org> wrote:
>>> >
>>> >> Hi Jakub,
>>> >>
>>> >> After looking through the attached file on the JIRA, I can say that this
>>> >> is a misconfiguration. control.plane.listener is a totally separate concept
>>> >> from control.plane.listener.name. They should never be set to the same
>>> >> value. The controller listener must have a different name and value than
>>> >> the control plane listener (if any).
>>> >>
>>> >> I also tested myself that KRaft migration works with
>>> >> control.plane.listener configured. It works on both 3.8 and 3.9-RC2.
>>> >>
>>> >> My initial statement that control.plane.listener was not supported during
>>> >> ZK migration was incorrect. As you said, it is supported during migration
>>> >> up to the point that we are in KRaft mode. (Another reason why having the
>>> >> control plane listener = controller listener would not make sense.)
>>> >>
>>> >> Thanks for the bug report and discussion. I've closed this as invalid now
>>> >> that I have tested migration using control.plane.listener for myself and
>>> >> verified that it works.
>>> >>
>>> >> best,
>>> >> Colin
>>> >>
>>> >> On Mon, Oct 14, 2024, at 08:31, Jakub Scholz wrote:
>>> >> >> control.plane.listener is not (and never has been) supported in KRaft
>>> >> >> mode.
>>> >> >
>>> >> > You mean control.plane.listener.name is not supported in KRaft I guess?
>>> >> > Well, this is not KRaft, this is migration, so it uses the settings that it
>>> >> > used before for the Zoo-based cluster and that includes using dedicated
>>> >> > control plane listener. I don't think I can "just remove it" because the
>>> >> > other nodes will use it during the rolling update.
>>> >> >
>>> >> > This also worked fine with 3.8 (and 3.7, etc.) -> so if it is not supported
>>> >> > now, it is a breaking API change I guess which should be a blocker.
>>> >> >
>>> >> > Thanks & Regards
>>> >> > Jakub
>>> >> >
>>> >> > On Mon, Oct 14, 2024 at 5:12 PM Colin McCabe <cmcc...@apache.org> wrote:
>>> >> >
>>> >> >> Hi Jakub,
>>> >> >>
>>> >> >> Thanks for testing. control.plane.listener is not (and never has been)
>>> >> >> supported in KRaft mode. You have to remove control.plane.listener
>>> >> >> configurations before migrating. I filed KAFKA-17790 to document this in
>>> >> >> the migration instructions. (This is not a blocker for the release, though.)
>>> >> >>
>>> >> >> best,
>>> >> >> Colin
>>> >> >>
>>> >> >> On Mon, Oct 14, 2024, at 02:52, Jakub Scholz wrote:
>>> >> >> > Hi Colin,
>>> >> >> >
>>> >> >> > Thanks for the RC. I did some testing of it and run into
>>> >> >> > https://issues.apache.org/jira/browse/KAFKA-17788 which seems to be a
>>> >> >> > regression in the migration to KRaft process.
>>> >> >> >
>>> >> >> > Can someone who understands this part of the codebase look into it please?
>>> >> >> >
>>> >> >> > Thanks & Regards
>>> >> >> > Jakub
>>> >> >> >
>>> >> >> > On Thu, Oct 10, 2024 at 11:16 PM Colin McCabe <cmcc...@apache.org> wrote:
>>> >> >> >
>>> >> >> >> This is the second candidate for the release of Apache Kafka 3.9.0. I have
>>> >> >> >> titled it rc2 since I had an rc1 which got very far, even to the point of
>>> >> >> >> pushing tags and docker images, before I spotted an issue. So rather than
>>> >> >> >> mutate the tags, I decided to skip over rc1.
>>> >> >> >>
>>> >> >> >> - This is a major release, the final one in the 3.x line. (There may of
>>> >> >> >> course be other minor releases in this line, such as 3.9.1.)
>>> >> >> >> - Tiered storage will be considered production-ready in this release.
>>> >> >> >> - This will be the final major release to feature the deprecated ZooKeeper
>>> >> >> >> mode.
>>> >> >> >>
>>> >> >> >> This release includes the following KIPs:
>>> >> >> >> - KIP-853: Support dynamically changing KRaft controller membership
>>> >> >> >> - KIP-1057: Add remote log metadata flag to the dump log tool
>>> >> >> >> - KIP-1049: Add config log.summary.interval.ms to Kafka Streams
>>> >> >> >> - KIP-1040: Improve handling of nullable values in InsertField,
>>> >> >> >> ExtractField, and other transformations
>>> >> >> >> - KIP-1031: Control offset translation in MirrorSourceConnector
>>> >> >> >> - KIP-1033: Add Kafka Streams exception handler for exceptions occurring
>>> >> >> >> during processing
>>> >> >> >> - KIP-1017: Health check endpoint for Kafka Connect
>>> >> >> >> - KIP-1025: Optionally URL-encode clientID and clientSecret in
>>> >> >> >> authorization header
>>> >> >> >> - KIP-1005: Expose EarliestLocalOffset and TieredOffset
>>> >> >> >> - KIP-950: Tiered Storage Disablement
>>> >> >> >> - KIP-956: Tiered Storage Quotas
>>> >> >> >>
>>> >> >> >> Release notes for the 3.9.0 release:
>>> >> >> >> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/RELEASE_NOTES.html
>>> >> >> >>
>>> >> >> >> *** Please download, test and vote by October 16, 2024.
>>> >> >> >>
>>> >> >> >> Kafka's KEYS file containing PGP keys we use to sign the release:
>>> >> >> >> https://kafka.apache.org/KEYS
>>> >> >> >>
>>> >> >> >> * Release artifacts to be voted upon (source and binary):
>>> >> >> >> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/
>>> >> >> >>
>>> >> >> >> * Docker release artifacts to be voted upon:
>>> >> >> >> apache/kafka:3.9.0-rc2
>>> >> >> >> apache/kafka-native:3.9.0-rc2
>>> >> >> >>
>>> >> >> >> * Maven artifacts to be voted upon:
>>> >> >> >> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>>> >> >> >>
>>> >> >> >> * Javadoc:
>>> >> >> >> https://dist.apache.org/repos/dist/dev/kafka/3.9.0-rc2/javadoc/
>>> >> >> >>
>>> >> >> >> * Documentation:
>>> >> >> >> https://kafka.apache.org/39/documentation.html
>>> >> >> >>
>>> >> >> >> * Protocol:
>>> >> >> >> https://kafka.apache.org/39/protocol.html
>>> >> >> >>
>>> >> >> >> * Tag to be voted upon (off 3.9 branch) is the 3.9.0-rc2 tag:
>>> >> >> >> https://github.com/apache/kafka/releases/tag/3.9.0-rc2
>>> >> >> >>
>>> >> >> >> * Successful Docker Image Github Actions Pipeline for 3.9 branch:
>>> >> >> >> Docker Build Test Pipeline (JVM):
>>> >> >> >> https://github.com/apache/kafka/actions/runs/11281563007
>>> >> >> >> Docker Build Test Pipeline (Native):
>>> >> >> >> https://github.com/apache/kafka/actions/runs/11281608809
>>> >> >> >>
>>> >> >> >> Thanks to everyone who helped with this release candidate, either by
>>> >> >> >> contributing code, testing, or documentation.
>>> >> >> >>
>>> >> >> >> Regards,
>>> >> >> >> Colin
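
A note on the listener discussion quoted above: the rule Colin describes (every listener is either a broker listener or a controller listener, never both, and control.plane.listener.name has to name one of the broker's listeners) can be sketched in a few lines of Python. This is only an illustration of the idea, not Kafka's actual validation code. ListenerConfig, its fields, and validate are hypothetical names made up for this sketch, and every listener name in the example except CONTROLPLANE-9090 (which appears in the thread) is an assumed value.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ListenerConfig:
    """Hypothetical, simplified model of one node's listener settings."""
    listeners: FrozenSet[str]                  # every listener name on the node
    controller_listener_names: FrozenSet[str]  # names claimed by the controller
    control_plane_listener_name: Optional[str] = None

    def broker_listeners(self) -> FrozenSet[str]:
        # Names claimed by the controller are removed, so the broker and
        # controller sets are disjoint by construction.
        return self.listeners - self.controller_listener_names


def validate(cfg: ListenerConfig) -> List[str]:
    """Return a list of problems; an empty list means the sketch's rule holds."""
    errors = []
    name = cfg.control_plane_listener_name
    if name is not None and name not in cfg.broker_listeners():
        errors.append(
            f"control.plane.listener.name {name!r} is not among the broker's listeners"
        )
    return errors


# Same name used for the controller listener and the control plane listener.
shared = ListenerConfig(
    listeners=frozenset({"CONTROLPLANE-9090", "REPLICATION-9091"}),
    controller_listener_names=frozenset({"CONTROLPLANE-9090"}),
    control_plane_listener_name="CONTROLPLANE-9090",
)

# Distinct names, as suggested in the thread.
separate = ListenerConfig(
    listeners=frozenset({"CONTROLPLANE-9090", "CONTROLLER-9092", "REPLICATION-9091"}),
    controller_listener_names=frozenset({"CONTROLLER-9092"}),
    control_plane_listener_name="CONTROLPLANE-9090",
)

print(validate(shared))    # one problem reported
print(validate(separate))  # no problems reported
```

In the first configuration the shared name is claimed by the controller, so it drops out of the broker's set and the check fails; giving the controller listener its own name makes the check pass.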