Public bug reported:

This was observed during tests on an environment with several controllers: when routers with gateways and subnets are created at a high rate, port creation for the router gateway may sometimes fail with DBDeadlock. In several cases that I investigated, the deadlock happened when the router port was being created in parallel with DHCP port creation on other servers. In general, the common factor is simultaneous port creation.

Port creation involves locking rows in the 'ports' and 'binding' tables. The get_locked_port_and_binding() ml2 db method essentially does:

    # with_lockmode('update') emits SELECT ... FOR UPDATE, locking the port row
    port = (session.query(models_v2.Port).
            enable_eagerloads(False).
            filter_by(id=port_id).
            with_lockmode('update').
            one())
    # Second SELECT ... FOR UPDATE, locking the port's binding row
    binding = (session.query(models.PortBinding).
               enable_eagerloads(False).
               filter_by(port_id=port_id).
               with_lockmode('update').
               one())
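For context on how row locks turn into a deadlock error on a single MySQL node: InnoDB reports a deadlock when transactions wait on each other's row locks in a cycle. The following is a minimal sketch of that idea, assuming a much simplified lock manager (exclusive locks only, no release, one pending request per transaction). `LockManager`, `DeadlockError`, and the row names are hypothetical and are not Neutron or InnoDB code.

```python
# Toy model of exclusive row locks with wait-for-graph deadlock detection.
# Assumption: simplified illustration only, not InnoDB's implementation
# (no lock release, no shared or gap locks, one pending request per txn).


class DeadlockError(Exception):
    """Hypothetical stand-in for MySQL error 1213 (ER_LOCK_DEADLOCK)."""


class LockManager:
    def __init__(self):
        self.owner = {}      # row key -> id of the transaction holding it
        self.waits_for = {}  # txn id -> txn id it is currently blocked on

    def _reaches(self, start, target):
        # Walk wait-for edges from `start`; True if the chain reaches `target`.
        seen = set()
        cur = start
        while cur is not None and cur not in seen:
            if cur == target:
                return True
            seen.add(cur)
            cur = self.waits_for.get(cur)
        return False

    def lock(self, txn, row):
        """Grant the lock (True), make txn wait (False), or raise on a cycle."""
        holder = self.owner.get(row)
        if holder is None or holder == txn:
            self.owner[row] = txn
            self.waits_for.pop(txn, None)
            return True
        if self._reaches(holder, txn):
            # Waiting would close a cycle, so txn is chosen as the victim.
            raise DeadlockError(f"{txn} waits for {row!r} held by {holder}")
        self.waits_for[txn] = holder
        return False


# Same order (port row, then binding row): T2 simply waits, no cycle.
same = LockManager()
same.lock("T1", "port")
same.lock("T2", "port")      # blocked behind T1
same.lock("T1", "binding")   # granted; a real T1 would then commit and release

# Opposite order: each transaction waits on the other, so one is rejected.
opposite = LockManager()
opposite.lock("T1", "row-A")
opposite.lock("T2", "row-B")
opposite.lock("T1", "row-B")  # T1 now waits for T2
try:
    opposite.lock("T2", "row-A")
except DeadlockError as err:
    print("deadlock:", err)
```

In this model, the port-then-binding sequence quoted above always takes its locks in the same order, so on its own it leads to waiting rather than a deadlock. A cycle would need some other lock (for example, one taken during IP allocation) to be acquired in a different order.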
IP allocation for the port also takes locks. I'm not sure exactly how this leads to a deadlock. It may be due to how Galera behaves in active-active mode: it returns a deadlock error when a change fails validation against other members of the cluster.

Examples of tracebacks:
http://paste.openstack.org/show/399624/
http://paste.openstack.org/show/405057/

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
         Status: New

** Tags: db ml2

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1479738

Title:
  DB deadlocks on simultaneous port creation

Status in neutron:
  New
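To make the Galera hypothesis in the bug description more concrete, here is a minimal sketch. It assumes a much simplified model of Galera's certification-based replication. Each node runs its transaction locally. At commit time, the transaction's write set is checked against rows that other nodes committed after it started. If they overlap, the later transaction is rejected, and the client sees a deadlock error. Row locks from SELECT ... FOR UPDATE are left out because they are held only on the local node and are not replicated. `Cluster`, `Transaction`, `DeadlockError`, and the row keys are hypothetical names used only for illustration.

```python
# Toy model of Galera-style certification ("first committer wins").
# Assumption: simplified illustration, not Galera's real protocol. Node-local
# row locks (SELECT ... FOR UPDATE) are not modeled; only write sets are
# compared at commit time.


class DeadlockError(Exception):
    """Hypothetical stand-in for the deadlock error (MySQL 1213) a client sees."""


class Cluster:
    def __init__(self):
        self.seqno = 0        # global commit sequence number
        self.last_write = {}  # row key -> seqno of the last committed write

    def begin(self, node):
        return Transaction(self, node, self.seqno)


class Transaction:
    def __init__(self, cluster, node, start_seqno):
        self.cluster = cluster
        self.node = node
        self.start_seqno = start_seqno  # cluster state this txn started from
        self.write_set = set()

    def write(self, row):
        self.write_set.add(row)

    def commit(self):
        c = self.cluster
        # Certification: reject if any row in our write set was committed by
        # another transaction after this one started.
        for row in self.write_set:
            if c.last_write.get(row, 0) > self.start_seqno:
                raise DeadlockError(f"certification failed on {self.node} for {row!r}")
        c.seqno += 1
        for row in self.write_set:
            c.last_write[row] = c.seqno
        return c.seqno


cluster = Cluster()
gw = cluster.begin("controller-1")    # e.g. router gateway port creation
dhcp = cluster.begin("controller-2")  # e.g. DHCP port creation
gw.write(("ports", "port-a"))
dhcp.write(("ports", "port-b"))
# Hypothetical row that both transactions happen to write.
gw.write(("shared", "row-1"))
dhcp.write(("shared", "row-1"))
gw.commit()                           # first committer wins
try:
    dhcp.commit()
except DeadlockError as err:
    print("dhcp:", err)
```

In this model, the second transaction fails even though it never waited on a lock. This is why node-local locking alone would neither explain nor prevent such an error in a multi-master setup.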