Thank you, Tobias. If that Jira ticket is correct, it explains a lot. We never noticed this in our lower environments, but we only upgraded to 3.7 about a month ago. We've never had problems with upgrades before, and this one seemed to go smoothly as well until this happened.
I've never downgraded before. How hard is it? Can I simply replace the binaries with the older version, or are there additional steps that need to be done as well? Thanks again for your help.
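In case it helps to see what I have in mind, this is roughly what I was planning to try, based on the Jira ticket and your note about going back to 3.6. I'm going from memory on the exact kafka-features.sh invocation, and the paths, version number, and bootstrap address below are just placeholders for our setup, so please correct me if I'm off:

    # 1. Check which metadata.version the cluster has actually finalized
    bin/kafka-features.sh --bootstrap-server localhost:9092 describe

    # 2. Stop the 3.7 broker, then start the old 3.6.x binaries against the
    #    same server.properties and log dirs (paths here are placeholders)
    ./kafka_2.13-3.6.2/bin/kafka-server-start.sh -daemon config/kraft/server.properties

Do I also need to roll back metadata.version itself (I see kafka-features.sh has a downgrade option), or is swapping the binaries enough as long as metadata.version was never bumped past what 3.6 supports?
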
On Tue Jun 4, 2024, 12:46 PM GMT, tobias.b...@peripetie.de wrote:
> Had the same issue some weeks ago; we also did an upgrade to 3.7.
> Did not get any pointers on this topic back then.
>
> There is an open StackOverflow question:
> https://stackoverflow.com/questions/78334479/unwritablemetadataexception-on-startup-in-apache-kafka
> There is also an open Kafka Jira ticket:
> https://issues.apache.org/jira/browse/KAFKA-16662
> We fixed it by downgrading to 3.6.
>
> Hope this helps.
>
> -----Original Message-----
> From: Sejal Patel <se...@playerzero.ai.INVALID>
> Sent: Monday, June 3, 2024 08:26
> To: users@kafka.apache.org
> Subject: How do I resolve an UnwritableMetadataException
>
> I was expanding my Kafka cluster from 16 nodes to 24 nodes and rebalancing
> the topics. One of the partitions of the topic did not get rebalanced as
> expected (it was taking a million years, so I decided to look and see what
> was happening). It turns out that the script for mounting the second disk
> partition for use as /data did not kick in, so there simply wasn't enough
> disk space available at the time of the rebalance. The system was left
> with about 5 MB of disk space, and the Kafka brokers were essentially
> borked at that point.
>
> So I had to kill the Kafka processes, move the original Kafka data folder
> to a /tmp location, mount the data partition, and move the /tmp Kafka
> folder back to its original spot. But when I went to start up the Kafka
> instance, I got this message over and over again, every few milliseconds:
>
> [2024-06-03 06:14:01,503] ERROR Encountered metadata loading fault: Unhandled error initializing new publishers (org.apache.kafka.server.fault.LoggingFaultHandler)
> org.apache.kafka.image.writer.UnwritableMetadataException: Metadata has been lost because the following could not be represented in metadata version 3.4-IV0: the directory assignment state of one or more replicas
>     at org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
>     at org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
>     at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
>     at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
>     at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
>     at org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
>     at org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
>     at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>     at java.base/java.lang.Thread.run(Thread.java:1583)
> [2024-06-03 06:14:01,556] INFO [BrokerLifecycleManager id=28] The broker is in RECOVERY. (kafka.server.BrokerLifecycleManager)
>
> Scarier still, if any node that is working gets restarted, it too starts
> emitting that message.
>
> I am using a KRaft setup and upgraded to Kafka 3.7 within the past month
> (the original setup over a year ago was Kafka 3.4). How do I resolve this
> issue? I'm not sure what the problem is or how to fix it.
>
> If I restart a KRaft server, it dies with the same error message and can
> never get spun up again.
>
> Is it possible to recover from this, or do I need to start from scratch?
> If I start from scratch, how do I keep the topics? What is the best way to
> proceed from here? I'm unable to find anything related to this problem via
> a Google search.
>
> I'm at a loss and would appreciate any help you can provide.
>
> Thank you.