Hi all,

Just in case someone else runs into this problem, we wanted to give an update 
on this as we've solved most of it.

Long story short, when neutron's get_security_groups API is hit with an 
admin context, it attempts to get all security groups. Since we have so many 
security groups, this effectively causes neutron-server to hang. We did three 
things to mitigate or fix this:


  1.  If neutron.db.securitygroups_db.SecurityGroupDbMixin#get_security_groups 
is called in a way that we know will cause it to hang, we fail fast and return 
an error. This allows "normal" calls to that method to complete without 
issue. The obvious downside is that the caller gets an error, but the caller 
would have gotten a timeout previously, so this isn't any worse, and 
neutron-server won't hang. We don't intend to upstream this as it is a bit of a 
hack.
  2.  In 
neutron.db.securitygroups_db.SecurityGroupDbMixin#_get_security_groups_on_port 
(which is called when creating a port), we ensured that get_security_groups is 
called with a proper tenant_id filter. It wasn't before, and because this gets 
called with an admin context from nova-scheduler, it would attempt to get all 
security groups, which it doesn't need.
  3.  We found that this commit (which isn't in a maintenance release yet) 
fixed one of the problem areas:
     *   https://github.com/openstack/nova/commit/19fdaa225abd007a13cd38c742e27c5ee620186c
     *   https://review.openstack.org/#/c/30048/
     *   We cherry-picked that and are now applying it as a patch via Anvil. 
It has already been backported to stable/havana, so once it gets into a 
maintenance release, we'll be able to remove the patch.
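
To make item 1 concrete, here is a minimal sketch of a fail-fast guard. It is 
not our actual patch and not neutron code: the function names, the exception 
class, and the rule used to decide that a call is "unbounded" (admin context 
with no id or tenant_id filter) are all assumptions made for illustration.

```python
class TooManySecurityGroups(Exception):
    """Hypothetical error raised instead of letting the query run unbounded."""


def is_unbounded_admin_query(is_admin, filters):
    """Return True when an admin-context call carries no narrowing filter.

    Assumption: an admin context is not restricted to one tenant, so a call
    without an 'id' or 'tenant_id' filter would load every security group.
    """
    filters = filters or {}
    narrowing_keys = ("id", "tenant_id")
    return is_admin and not any(filters.get(key) for key in narrowing_keys)


def get_security_groups_guarded(is_admin, filters, fetch):
    """Fail fast on unbounded calls, otherwise delegate to the real lookup.

    `fetch` is a stand-in for the existing get_security_groups logic.
    """
    if is_unbounded_admin_query(is_admin, filters):
        raise TooManySecurityGroups("refusing unfiltered admin listing")
    return fetch(filters)
```

The caller sees an immediate error rather than a timeout, and calls that carry 
a filter pass through unchanged.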

We think #2 still exists as an upstream bug in master. We will investigate 
further and file a bug and patch if someone else hasn't already addressed it.
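
For anyone wanting to check their own tree for #2, the idea is roughly the 
following. This is a sketch only, not the neutron code or our patch: the 
helper name is hypothetical, and a plain dict with 'tenant_id' and 
'security_groups' keys stands in for the port body.

```python
def build_security_group_filters(port):
    """Build lookup filters scoped to the tenant that owns the port.

    Hypothetical helper: `port` is a plain dict used for illustration.
    """
    filters = {}
    tenant_id = port.get("tenant_id")
    if tenant_id:
        # Scope the lookup to the port's tenant so that an admin context
        # does not widen the query to every tenant.
        filters["tenant_id"] = [tenant_id]
    group_ids = port.get("security_groups")
    if group_ids:
        # Narrow further when the port names specific groups.
        filters["id"] = list(group_ids)
    return filters
```

The resulting dict would then be passed as the filters argument to the 
security group lookup, so the query stays bounded even when the context is 
admin.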

/Craig J

From: Mike Dorman <[email protected]>
Date: Wednesday, February 5, 2014 5:36 PM
To: "[email protected]"
Subject: [Openstack] [neutron] neutron-server iterating over all security 
groups, not just those in the project

We're seeing an issue where neutron-server (Havana) iterates over all security 
groups (with an individual SELECT query for each), rather than just the 
security groups in the tenant.  We can trigger this by creating a port using 
the default security group.  If we specify no security groups, or a specific 
security group, it works fine.

We have ~1000 tenants and 10 security groups in each tenant in this 
environment.  So this ultimately results in 10k SQL queries, which tanks 
neutron-server for a few minutes.  Note that all the tenants are in the same 
network.
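
To illustrate the scale, here is a small self-contained sketch that reproduces 
the query count in miniature with an in-memory SQLite table. The table layout 
and function names are assumptions for illustration, not neutron's schema or 
code.

```python
import sqlite3


def build_db(tenants, groups_per_tenant):
    """Create an in-memory table with one row per security group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE securitygroups (id INTEGER PRIMARY KEY, tenant_id TEXT)"
    )
    rows = [
        ("tenant-%d" % t,)
        for t in range(tenants)
        for _ in range(groups_per_tenant)
    ]
    conn.executemany("INSERT INTO securitygroups (tenant_id) VALUES (?)", rows)
    return conn


def per_group_queries(conn):
    """Count the SELECTs issued when every group is fetched individually."""
    ids = [row[0] for row in conn.execute("SELECT id FROM securitygroups")]
    issued = 0
    for group_id in ids:
        conn.execute(
            "SELECT id, tenant_id FROM securitygroups WHERE id = ?",
            (group_id,),
        ).fetchone()
        issued += 1
    return issued


def tenant_scoped_groups(conn, tenant_id):
    """Fetch only one tenant's groups with a single filtered SELECT."""
    cur = conn.execute(
        "SELECT id FROM securitygroups WHERE tenant_id = ?", (tenant_id,)
    )
    return [row[0] for row in cur]


conn = build_db(tenants=1000, groups_per_tenant=10)
print(per_group_queries(conn))                       # 10000
print(len(tenant_scoped_groups(conn, "tenant-0")))   # 10
```

The first count is the 1000 x 10 per-group pattern; the second is what a 
tenant-scoped lookup would touch.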

Still trying to track down where in the code this is happening, but I've been 
able to trace the SQL queries up to the point where the iteration starts:
http://pastebin.com/ZkP5idkJ

You can see that the first two queries get the groups/rules for the specific 
tenant only. After that, the same queries repeat, but for groups/rules in all 
tenants.

We will continue looking into it to see what we can find, but any suggestions 
or ideas would be appreciated.

Thanks,
Mike

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : [email protected]
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
