[ 
https://issues.apache.org/jira/browse/KAFKA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358350#comment-17358350
 ] 

Suriya Vijayaraghavan commented on KAFKA-12901:
-----------------------------------------------

I did have doubts regarding the 
[KAFKA-12677|https://issues.apache.org/jira/browse/KAFKA-12677] issue, but I 
am not sure whether I should focus on why the server Java process did not 
stop or on why the metadata did not get updated after the broker restart. 

[Logs|https://docs.google.com/document/d/1K8mdN4R59oR6SkI5d4FMAZlUNU2hmLidm1rxnTVtvJw/edit?usp=sharing]
 during the shutdown caused by the ZooKeeper session expiration.

[Logs|https://docs.google.com/document/d/1cUS4rwMI0CLx02lvdzM_kmVfH209wctHjsxCApZur-8/edit?usp=sharing]
 after the restart.

> Metadata not updated after broker restart.
> ------------------------------------------
>
>                 Key: KAFKA-12901
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12901
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.8.0
>            Reporter: Suriya Vijayaraghavan
>            Priority: Major
>
> We upgraded from version 2.7 to 2.8. After monitoring for a few weeks, we 
> upgraded our production setup as well (we went ahead because we had not 
> enabled KRaft). A few weeks after that, our clients in production started 
> failing with TimeoutException. We tried to list all active brokers using the 
> admin client API, and all brokers were listed properly. So we logged into 
> the affected broker and tried to describe a topic with localhost as the 
> bootstrap-server, but we got a timeout there as well.
> When checking the logs, we noticed a shutdown message from the 
> kafka-shutdown-hook thread (the ZooKeeper session had timed out and we had 
> three retry failures). The controlled shutdown failed (it got an unknown 
> server error response from the controller), and the broker proceeded to an 
> unclean shutdown. Even so, the process did not exit, and it did not process 
> any other operations either. The broker also stayed in alive status for 
> hours (we could still see it in the list of brokers), so our clients kept 
> trying to contact it and kept failing with a timeout exception. We then 
> restarted the problematic broker, but after the restart our clients hit an 
> unknown topic or partition error, which caused timeouts as well. We 
> noticed that the metadata had not been loaded, so we had to restart our 
> controller. After restarting the controller, everything got back to normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
