I'm not going to mark this as invalid, but I think introducing a distributed lock manager to nova would require a spec.
It's a very heavyweight solution to enabling a topology we do not officially support today. Today we require that if the periodic is enabled, it is only enabled on one scheduler instance, precisely to mitigate the problem described here.

That does not mean we cannot improve the current situation or that we can't discuss this, but it would be a feature, not a bug, as this is an existing, known limitation of the periodic.

Alternatives are:
- externally scheduling the host mapping (via cron or a k8s job)
- using https://en.wikipedia.org/wiki/Rendezvous_hashing to distribute the mapping tasks between all schedulers to get eventual consistency
- gracefully handling the DB conflict and proceeding with the other mappings, moving the error/warning to debug level

tooz has a low number of maintainers, and nova was planning to remove it from our dependency list with the removal of the ironic driver's use of its hashring.

As a general design goal, nova intends the scheduler service to be effectively stateless and horizontally scalable. Adding any kind of distributed locking limits that scalability, and it is a non-trivial cost to require a tooz persistence backend just for this.

One enhancement that should be made: the config option currently does not carry the guidance that it should only be enabled on one scheduler:
https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.discover_hosts_in_cells_interval
While I believe that is discussed elsewhere in the docs, if you only look at that option then it's not obvious that this is not recommended or supported.

** Changed in: nova
       Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2089386

Title:
  [RFE] Add Distributed Locking for Host Discovery Operations in Multi-Scheduler Environments

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Add Distributed Locking for Host Discovery Operations in Multi-Scheduler Environments

  Host discovery operations in Nova are currently vulnerable to race conditions and concurrent execution issues, particularly in production environments where multiple Nova schedulers are running simultaneously for high availability/redundancy, and each scheduler:

  - Shares the same database backend
  - Runs its own periodic automatic host discovery task
  - Cron jobs run `nova-manage cell_v2 discover_hosts` periodically on the same hosts as the schedulers

  Current symptoms (due to overlapping host discovery tasks):

  - Possible frequent host discovery failures, missed or incomplete host discoveries
  - Error messages about duplicate host mappings
  - Database conflicts when multiple processes try to map the same hosts simultaneously

  Proposed Solution:

  Implement an opt-in distributed locking mechanism for host discovery operations to ensure that CLI and periodic automatic host discovery tasks run sequentially. The solution should:

  1. Be opt-in, enabled via config option
  2. Use a distributed lock (leveraging tooz.coordination) before initiating any host discovery operation
  3. Support coordination across:
     - Scheduler automatic host discovery task
     - `nova-manage cell_v2 discover_hosts` command
  4. Extend Nova configuration with an additional config option for defining coordinator URI

  Benefits:

  - Prevents race conditions during host discovery across all scenarios
  - Removes the need for external complex scheduling and coordination of discovery jobs in high availability/redundancy setups
  - Reduces operational overhead by eliminating manual conflict resolution

  The solution should be configurable and work across different Nova deployments without requiring additional external dependencies beyond what Nova already uses for coordination.

  This will greatly benefit highly available, large-scale deployments with multiple schedulers and automated host discovery operations.
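
For reference, the rendezvous hashing alternative mentioned in the comment could look roughly like the sketch below. This is not nova code. The function names `rendezvous_owner` and `hosts_for`, the scheduler names, and the assumption that every scheduler already knows the full list of scheduler names are all hypothetical. Each scheduler computes the same owner for each unmapped host and only maps the hosts it owns, so no lock is needed as long as the membership lists agree.

```python
import hashlib


def rendezvous_owner(host, schedulers):
    """Return the scheduler with the highest hash score for this host.

    Every scheduler runs the same computation, so they all agree on the
    owner as long as they see the same list of scheduler names.
    """
    def score(scheduler):
        digest = hashlib.sha256(f"{scheduler}|{host}".encode()).digest()
        # Include the name so that a (practically impossible) hash tie is
        # still broken the same way regardless of list order.
        return (int.from_bytes(digest[:8], "big"), scheduler)

    return max(schedulers, key=score)


def hosts_for(me, unmapped_hosts, schedulers):
    """Hosts that the scheduler named `me` should map in this run."""
    return [h for h in unmapped_hosts if rendezvous_owner(h, schedulers) == me]
```

If a scheduler goes away, only the hosts it owned are reassigned to the remaining schedulers; the other assignments stay put. While membership lists briefly disagree, two schedulers may still pick the same host, which is why this gives eventual consistency rather than strict mutual exclusion.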
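
The "gracefully handle the DB conflict" alternative from the comment can be illustrated with a small stand-in. This sketch uses sqlite3 and a made-up `host_mappings` table with a unique `host` column. It does not use nova's real models or exceptions, and `create_schema` and `map_hosts` are hypothetical names. The point is only the control flow: a duplicate insert is logged at debug level and the loop carries on with the remaining hosts.

```python
import logging
import sqlite3

LOG = logging.getLogger(__name__)


def create_schema(conn):
    # Stand-in table; the primary key plays the role of the unique
    # constraint that makes a concurrent duplicate mapping fail.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS host_mappings "
        "(host TEXT PRIMARY KEY, cell_id INTEGER NOT NULL)")
    conn.commit()


def map_hosts(conn, cell_id, hosts):
    """Insert one mapping per host, skipping hosts that are already mapped."""
    mapped = []
    for host in hosts:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO host_mappings (host, cell_id) VALUES (?, ?)",
                    (host, cell_id))
        except sqlite3.IntegrityError:
            # Another process mapped this host first; not an error.
            LOG.debug("Host %s is already mapped, skipping", host)
            continue
        mapped.append(host)
    return mapped
```

Each insert runs in its own transaction, so one conflict does not roll back the mappings that succeeded before it.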