Hi All, I recently upgraded my Ubuntu 12.04 (+ cloud archive) system, one controller and 60 compute nodes, from Grizzly to Havana. Now, even before I let my users back onto the API, I'm barely able to do anything due to authentication timeouts. I am using Neutron, which likes to authenticate *a lot*. I'm not entirely convinced the real problem isn't Neutron reauthenticating a bajillion services a second, but right now it looks like Keystone...
I'm using UUID tokens with the memcached backend. Under Grizzly I had been using Peter Feiner's multi-worker patches, though these were only needed for peak loads (when starting hundreds of instances). With just the background noise of the running compute nodes (60) and instances (281), Keystone in its default single-worker (eventlet) mode runs at 100% CPU and many client requests (from the dashboard or CLI) time out. Nova-compute nodes also frequently log timeouts when trying to authenticate to the Neutron service.

Verbose Keystone logs show very little activity even when running at 100% load, just a few "INFO access" type entries every few *minutes*. Debug logging shows many sqlalchemy actions per second for the neutron user id and admin tenant id.

I took the next obvious step and put Keystone behind Apache. While that does get more servers running, performance is if anything even worse, and it now uses virtually all of the 12-core controller node's CPUs rather than just one of them. The logs quickly fill with data read timeouts:

2014-01-11 12:31:26.606 3054 INFO access [-] 192.168.128.43 - - [11/Jan/2014:17:31:26 +0000] "POST http://192.168.128.15:35357/v2.0/tokens HTTP/1.1" 500 167
2014-01-11 12:31:26.621 3054 ERROR keystone.common.wsgi [-] request data read error
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi Traceback (most recent call last):
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/common/wsgi.py", line 371, in __call__
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     response = self.process_request(request)
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/middleware/core.py", line 110, in process_request
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     params_json = request.body
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 677, in _body__get
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     self.make_body_seekable() # we need this to have content_length
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 922, in make_body_seekable
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     self.copy_body()
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 945, in copy_body
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     self.body = self.body_file.read(self.content_length)
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1521, in readinto
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     data = self.file.read(sz0)
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi IOError: request data read error
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi

I've played around with different process and thread counts in the WSGIDaemonProcess Apache directive, but while some seem to make it worse, none have made it better. Clearly I must be doing something wrong, since the single-process eventlet mode performs significantly better than the multiprocess WSGI mode.

I've also fiddled a bit with the dogpile cache settings. When running a single standalone process, the 'memory' backend seemed to actually make things go. I then got the pylibmc backend set up (or I think I did; there could well be more I'm missing), which didn't make a noticeable difference, and afterwards I wasn't able to get back to that earlier 'success' with the memory backend, though for obvious reasons I wouldn't have wanted to keep that one in production anyway.

How can I either make Keystone go faster or Neutron authenticate less?
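One way to check whether the token requests really are dominated by a single client (for example, the host running the Neutron services) would be to tally POSTs to /tokens per client IP and HTTP status from the access lines in the Keystone log. The following is only a minimal sketch: the regex is an assumption based on the one access line quoted above, and `tally_token_requests` and `SAMPLE` are names made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the single "INFO access" line quoted above:
#   ... INFO access [-] <client> - - [<date>] "<method> <url> <proto>" <status> <size>
ACCESS_RE = re.compile(
    r'INFO access \[-\] (?P<client>\S+) - - \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3})\b'
)


def tally_token_requests(lines):
    """Count POSTs to .../tokens, keyed by (client IP, HTTP status)."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match is None:
            continue  # not an access line (e.g. ERROR or TRACE output)
        if match.group("method") != "POST":
            continue
        if not match.group("url").endswith("/tokens"):
            continue
        counts[(match.group("client"), int(match.group("status")))] += 1
    return counts


# Two lines copied from the log excerpt above, used as sample input.
SAMPLE = [
    '2014-01-11 12:31:26.606 3054 INFO access [-] 192.168.128.43 - - '
    '[11/Jan/2014:17:31:26 +0000] '
    '"POST http://192.168.128.15:35357/v2.0/tokens HTTP/1.1" 500 167',
    '2014-01-11 12:31:26.621 3054 ERROR keystone.common.wsgi [-] '
    'request data read error',
]

for (client, status), count in tally_token_requests(SAMPLE).most_common():
    print(client, status, count)
```

The function accepts any iterable of lines, so an open file handle on the Keystone log could be passed in directly. Sorting by count should make it obvious if one address accounts for most of the token traffic.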
-Jon