pyttel commented on PR #10: URL: https://github.com/apache/ozone-helm-charts/pull/10#issuecomment-2512294329
Hello again,

I've run a lot of tests and learned quite a bit ^^. It turned out to be more challenging than I initially expected.

I've successfully set up Ozone Manager HA with proper leadership transfer, decommissioning, and bootstrap detection, using some Helm hooks and jobs. It took some time to configure everything correctly. Over the next few days, I'll write a detailed description and push the code.

Currently, I'm focusing on the Storage Container Manager (SCM). The existing solution, which uses two init containers, is not robust: if more than one pod is deleted, the cluster doesn't start up properly because of DNS resolution, pod deployment order, readiness probes, etc. I intend to replicate the approach I used for the Ozone Manager.

Right now I'm stuck on the leadership transfer, which fails with:

`Target test-scm-0 not found in group [cc965017-abb3-4693-a016-fa8fe34be6dc, 1bbfa1c7-69f3-42bd-845e-bf8dc0fc7fe1, b64a194c-2921-4fe3-a92c-08f9a187894e, 78e0f9e0-9f3b-429b-9bbe-db8666ada092, b9e47090-175c-4e32-952b-2df989364483, 7e3b1a64-052c-4b5f-8be2-648226d765c4].`

So it seems to be an issue with the admin transfer for SCM; the same step works fine for OM. There seems to be a mismatch between UUID and node ID: the admin transfer seems to look up peer nodes by `node id`, not `uuid`, while the group in the error lists only UUIDs.

Is this known, or can somebody help me with this? Maybe it is a bug?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
