Mike Bayer <mba...@redhat.com> wrote:
> > > Doug Hellmann <d...@doughellmann.com> wrote: > >>> My question is then how is it that such an architecture would be >>> possible, >>> that Cinder’s service starts up without greenlets yet allows >>> greenlet-based >>> requests to come in before this critical task is complete? Shouldn’t the >>> various oslo systems be providing patterns to prevent this disastrous >>> combination? >> >> I would have thought so, but they are (mostly) libraries not frameworks >> so they are often combined in unexpected ways. Let's see where the issue >> is before deciding on where the fix should be. > > > my next suspicion is that this is actually a spawned subprocess, though > oslo.concurrency, and it is failing to create a new connection pool and thus > is sharing the same file descriptor between processes. That is a much more > ordinary issue and would explain everything I’m seeing in a more familiar > way. Let me see if i can confirm *that*. So, that was it. Here’s some debugging I added to oslo.db inside of its own create_engine() that catches this easily: engine = sqlalchemy.create_engine(url, **engine_args) import os from sqlalchemy import event @event.listens_for(engine, "connect") def connect(dbapi_connection, connection_record): connection_record.info['pid'] = os.getpid() @event.listens_for(engine, "checkout") def checkout(dbapi_connection, connection_record, connection_proxy): pid = os.getpid() if connection_record.info['pid'] != pid: raise Exception( "Connection record belongs to pid %s, " "attempting to check out in pid %s" % (connection_record.info['pid'], pid)) sure enough, the database errors go away and Cinders logs when I do a plain startup are filled with: ERROR cinder.openstack.common.threadgroup [-] Connection record belongs to pid 21200, attempting to check out in pid 21408 ERROR cinder.openstack.common.threadgroup [-] Connection record belongs to pid 21200, attempting to check out in pid 21409 etc. The subprocesses here are logged as: 2015-02-17 13:05:12.583 DEBUG oslo_concurrency.processutils [req-a06464d1-4785-4a29-8e58-239d5a674451 None None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C LVM_SYSTEM_DIR=/etc/cinder vgs --noheadings --unit=g -o name,size,free,lv_count,uuid --separator : --nosuffix stack-volumes-lvm-1" returned: 0 in 0.828s from (pid=21408) execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:221 a trace is as follows: Traceback (most recent call last): File "/opt/stack/cinder/cinder/openstack/common/threadgroup.py", line 143, in wait x.wait() File "/opt/stack/cinder/cinder/openstack/common/threadgroup.py", line 47, in wait return self.thread.wait() File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait return self._exit_event.wait() File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait return hubs.get_hub().switch() File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch return self.greenlet.switch() File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main result = function(*args, **kwargs) File "/opt/stack/cinder/cinder/openstack/common/service.py", line 492, in run_service service.start() File "/opt/stack/cinder/cinder/service.py", line 142, in start self.manager.init_host() File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper return f(*args, **kwargs) File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper return f(*args, **kwargs) File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper return f(*args, **kwargs) File "/opt/stack/cinder/cinder/volume/manager.py", line 294, in init_host volumes = self.db.volume_get_all_by_host(ctxt, self.host) File "/opt/stack/cinder/cinder/db/api.py", line 191, in volume_get_all_by_host return IMPL.volume_get_all_by_host(context, host) File "/opt/stack/cinder/cinder/db/sqlalchemy/api.py", line 160, in wrapper return f(*args, **kwargs) File "/opt/stack/cinder/cinder/db/sqlalchemy/api.py", line 1229, in volume_get_all_by_host result = _volume_get_query(context).filter(or_(*conditions)).all() File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2320, in all return list(self) File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2438, in __iter__ return self._execute_and_instances(context) File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2451, in _execute_and_instances close_with_result=True) File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2442, in _connection_from_session **kw) File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 854, in connection close_with_result=close_with_result) File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 860, in _connection_for_bind return engine.contextual_connect(**kwargs) File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1800, in contextual_connect self.pool.connect(), File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 338, in connect return _ConnectionFairy._checkout(self) File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 671, in _checkout fairy) File "/usr/lib64/python2.7/site-packages/sqlalchemy/event/attr.py", line 260, in __call__ fn(*args, **kw) File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/session.py", line 408, in checkout "attempting to check out in pid %s" % (connection_record.info['pid'], pid)) 2015-02-17 13:05:14.626 TRACE cinder.openstack.common.threadgroup Exception: Connection record belongs to pid 21200, attempting to check out in pid 21558 > > > >> Doug >> >>>> Doug >>>> >>>>> Current status is that I’m continuing to try to determine why this is >>>>> happening here, and seemingly nowhere else. >>>>> >>>>> >>>>> >>>>> >>>>> __________________________________________________________________________ >>>>> OpenStack Development Mailing List (not for usage questions) >>>>> Unsubscribe: >>>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>>> >>>> __________________________________________________________________________ >>>> OpenStack Development Mailing List (not for usage questions) >>>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>> >>> __________________________________________________________________________ >>> OpenStack Development Mailing List (not for usage questions) >>> Unsubscribe: >>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> >> __________________________________________________________________________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev