Hi Jon, I have published a patch set that I hope will help to address this issue: https://review.openstack.org/#/c/66149/ . If you need this in another format, please let me know.
The only caveat is that you should expect the maximum number of tokens per user to drop by as much as 50%, because extra data is now stored about each token (avoiding the need to ask memcache about every token when a new one is issued). I still recommend keeping the expiration time down to ~3600 seconds or so, and I also recommend doing a flush (as you did before) when deploying this change.

Let me know if this helps (or if you run into any issues with it). Feel free to respond via email or comment on the review.

Disclaimer: I have not performed functional performance tests on this code, just some initial cleanup and a change of logic that should help minimize external calls.

Cheers,
Morgan

—
Morgan Fainberg
Principal Software Engineer
Core Developer, Keystone
m...@metacloud.com

On January 11, 2014 at 12:50:50, Jonathan Proulx (j...@jonproulx.com) wrote:

Note: about 30 minutes after flushing memcache, things started gumming up again (maybe 15-minute tokens instead of 60-minute?). The details in this response reflect the state of my world at about that point: a recent flush, with things just starting to grind to a halt.

On Sat, Jan 11, 2014 at 2:46 PM, Jay Pipes <jaypi...@gmail.com> wrote:
> Hi Jonathan! I have not yet deployed Havana to production, however I'll
> add some comments and suggestions below that you may want to try in
> order to isolate root cause...
>
> On Sat, 2014-01-11 at 13:34 -0500, Jonathan Proulx wrote:
>> Hi All,
>>
>> I recently upgraded my 1-controller, 60-compute-node Ubuntu 12.04
>> (+cloud archive) system from Grizzly to Havana. Now, even before I let
>> my users back at the API, I'm barely able to do anything due to
>> authentication timeouts. I am using Neutron, which likes to
>> authenticate *a lot*; I'm not entirely convinced the real problem
>> isn't Neutron re-authenticating a bajillion times a second, but
>> right now it looks like Keystone...
>>
>> I'm using UUID tokens with the memcached backend.
>
> Two requests:
>
> 1) Could you paste your keystone.conf pls?
> I'm curious, particularly, what backend you are using for the catalog
> driver and the identity driver.

Incomplete, but I think these are all the relevant bits; I can sanitize and paste up the whole thing if you really think that would be helpful:

[sql]
connection = mysql://<keystone_db_user>:<keystone_db_password>@192.168.128.15/keystone
idle_timeout = 200

[memcache]
servers = localhost:11211

[identity]
driver = keystone.identity.backends.sql.Identity

[trust]
driver = keystone.trust.backends.sql.Trust

[catalog]
driver = keystone.catalog.backends.sql.Catalog

[token]
driver = keystone.token.backends.memcache.Token
expiration = 3600
provider = keystone.token.providers.uuid.Provider
caching = False

[cache]
enabled = True
backend = dogpile.cache.pylibmc
backend_argument = url:127.0.0.1

[policy]
driver = keystone.policy.backends.sql.Policy

[ec2]
driver = keystone.contrib.ec2.backends.sql.Ec2

> 2) How are you configuring memcache? Please let us know if you've given
> memcache a decent amount of memory to support the token cache. I
> believe, IIRC, our memcache-per-controller was 2GB, but we may have
> increased that to 4GB... can't remember. In any case, it should be a
> decent amount of memory if you have lots of token validation and/or
> revocation calls happening.

# Use <num> MB memory max for object storage.
-m 45876
# Limit the number of simultaneous incoming connections.
-c 8192
# Number of threads to use to process incoming requests.
-t 24

> Do a little bit of memcache performance investigation for us, as well.
> It will tell us if memcache is having trouble keeping up with constant
> cache flushing...
>
> If you telnet to the memcached instance:
>
>     telnet <HOST> 11211
>
> and then type "stats" a few times in a minute. Keep an eye on the number
> returned for "evictions".
> If that number increases substantially over the time you do this
> mini-profile, it's an indication that memcached is having a tough time
> keeping up with the rate of cache insertions, and is looking for
> least-used blocks to evict in order to find room to put new blocks in.
> This can sometimes be an indication you need to increase the amount of
> memory for memcached.

As noted in my previous email, I recently flushed the cache, but for now evictions are holding steady at 0. Other possibly relevant memcache stats:

STAT uptime 2173
STAT time 1389471465
STAT version 1.4.13
STAT libevent 2.0.16-stable
STAT pointer_size 64
STAT rusage_user 48.771048
STAT rusage_system 278.953433
STAT curr_connections 51
STAT total_connections 2274
STAT connection_structures 769
STAT reserved_fds 120
STAT cmd_get 4719079
STAT cmd_set 15122
STAT cmd_flush 1
STAT cmd_touch 0
STAT get_hits 4716287
STAT get_misses 2792
...
STAT bytes_read 376435142
STAT bytes_written 14583390698
STAT limit_maxbytes 48104472576
...
STAT bytes 4886925
STAT curr_items 1572
STAT total_items 2394
STAT evictions 0
STAT reclaimed 0

> OK, so clearly Keystone is configured to use the sql driver for at least
> *something*... likely the identity and/or catalog driver.
>
> If you are using the sql driver for the catalog, I would highly advise
> switching to the templated catalog driver. It's essentially a static
> file that is cached by the driver and used as a string-interpolation
> template for the service catalog (which is used on EVERY single request
> for a new token, since the token includes the service catalog attached
> to the domain/project/user).

Yes, as you can see, both identity and catalog are using the sql backend. I'll take a look at the .templated.TemplatedCatalog driver, which may help some, but I don't think it's the whole story (though every bit helps).
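For reference, switching to the templated catalog Jay suggests is a keystone.conf change plus a static template file. A sketch under the usual caveats: the template path and region name are placeholders, `<keystone_host>` stands in for your real endpoint addresses, and only one service is shown:

```ini
# keystone.conf: swap the sql catalog driver for the templated one
[catalog]
driver = keystone.catalog.backends.templated.TemplatedCatalog
template_file = /etc/keystone/default_catalog.templates

# /etc/keystone/default_catalog.templates (excerpt, identity service only;
# repeat the pattern for each service/region in your deployment)
catalog.RegionOne.identity.publicURL = http://<keystone_host>:$(public_port)s/v2.0
catalog.RegionOne.identity.adminURL = http://<keystone_host>:$(admin_port)s/v2.0
catalog.RegionOne.identity.internalURL = http://<keystone_host>:$(public_port)s/v2.0
catalog.RegionOne.identity.name = Identity Service
```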
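As a quick back-of-the-envelope reading of the stats dump above (my own arithmetic, not from the thread), the hit rate and GET rate over the uptime window look like this:

```python
# Numbers copied from the "stats" output pasted above.
uptime = 2173        # seconds since memcached started
cmd_get = 4719079    # total GET commands in that window
get_hits = 4716287
get_misses = 2792

hit_rate = get_hits / cmd_get       # fraction of GETs served from cache
gets_per_sec = cmd_get / uptime     # average GET load on memcached

print("hit rate: %.2f%%" % (hit_rate * 100))  # ~99.94%
print("gets/sec: %.0f" % gets_per_sec)        # ~2172
```

So memcached itself is healthy (near-perfect hit rate, zero evictions), but something is issuing on the order of two thousand GETs per second, which is consistent with the "authenticates *a lot*" suspicion.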
>> I took the next obvious step and put keystone behind apache; while
>> that does get more servers running, performance is, if anything, even
>> worse, using virtually all of the 12-core controller node's CPUs
>> rather than just one of them.
>>
>> The logs quickly fill with data read timeouts:

<snip>

> Hmm. The traceback above really doesn't seem to indict either the token
> nor the catalog backend -- as the IOError is indicating that there is an
> issue in *reading the incoming WSGI request body* :(
>
> That request body is being constructed clearly by one of the
> python-xxxclients -- probably python-neutronclient -- so perhaps it is
> worth doing some python-neutronclient CLI calls using --debug to see if
> you can get an idea on why on Earth webob.Request is having problems
> constructing a request body from the client call. Very odd indeed.

I missed a very relevant bit in my log fragment:

Script timed out before returning headers: admin

This is the root cause of the webob traceback. It is the neutron client as called from the compute nodes; the auth failure is logged in nova-compute.log as:

ERROR nova.network.neutronv2 [-] Neutron client authentication failed: Connection to neutron failed: timed out

I did manage to catch a CLI debug from a similar failure: a long hang followed by:

jon@x1:~$ neutron --debug net-list
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/neutronclient/shell.py", line 517, in run_subcommand
    return run_command(cmd, cmd_parser, sub_argv)
  File "/usr/lib/python2.7/dist-packages/neutronclient/shell.py", line 78, in run_command
    return cmd.run(known_args)
  File "/usr/lib/python2.7/dist-packages/neutronclient/common/command.py", line 35, in run
    return super(OpenStackCommand, self).run(parsed_args)
  File "/usr/lib/python2.7/dist-packages/cliff/display.py", line 84, in run
    column_names, data = self.take_action(parsed_args)
  File "/usr/lib/python2.7/dist-packages/neutronclient/common/command.py", line 41, in take_action
    return self.get_data(parsed_args)
  File "/usr/lib/python2.7/dist-packages/neutronclient/neutron/v2_0/__init__.py", line 586, in get_data
    data = self.retrieve_list(parsed_args)
  File "/usr/lib/python2.7/dist-packages/neutronclient/neutron/v2_0/__init__.py", line 532, in retrieve_list
    neutron_client = self.get_client()
  File "/usr/lib/python2.7/dist-packages/neutronclient/neutron/v2_0/__init__.py", line 330, in get_client
    return self.app.client_manager.neutron
  File "/usr/lib/python2.7/dist-packages/neutronclient/common/clientmanager.py", line 41, in __get__
    self._handle = self.factory(instance)
  File "/usr/lib/python2.7/dist-packages/neutronclient/neutron/client.py", line 36, in make_client
    instance.initialize()
  File "/usr/lib/python2.7/dist-packages/neutronclient/common/clientmanager.py", line 92, in initialize
    httpclient.authenticate()
  File "/usr/lib/python2.7/dist-packages/neutronclient/client.py", line 237, in authenticate
    raise exceptions.Unauthorized(message=body)
Unauthorized:

Thanks,
-Jon

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack