Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:
>
> On 5/11/15 5:25 PM, Robert Collins wrote:
> >
> > Details: Skip over this bit if you know it all already.
> >
> > The GIL plays a big factor here: if you want to scale the amount of
> > CPU available to a Python service, you have two routes:
> > A) move work to a different process through some RPC - be that DB's
> > using SQL, other services using oslo.messaging or HTTP - whatever.
> > B) use C extensions to perform work in threads - e.g. openssl context
> > processing.
> >
> > To increase concurrency you can use threads, eventlet, asyncio,
> > twisted etc - because within a single process *all* Python bytecode
> > execution happens inside the GIL lock, so you get at most one CPU for
> > a CPU bound workload. For an IO bound workload, you can fit more work
> > in by context switching within that one CPU capacity. And - the GIL is
> > a poor scheduler, so at the limit - an IO bound workload where the IO
> > backend has more capacity than we have CPU to consume it within our
> > process, you will run into priority inversion and other problems.
> > [This varies by Python release too].
> >
> > request_duration = time_in_cpu + time_blocked
> > request_cpu_utilisation = time_in_cpu/request_duration
> > cpu_utilisation = concurrency * request_cpu_utilisation
> >
> > Assuming that we don't want any one process to spend a lot of time at
> > 100% - to avoid such at-the-limit issues, lets pick say 80%
> > utilisation, or a safety factor of 0.2. If a single request consumes
> > 50% of its duration waiting on IO, and 50% of its duration executing
> > bytecode, we can only run one such request concurrently without
> > hitting 100% utilisations. (2*0.5 CPU == 1). For a request that spends
> > 75% of its duration waiting on IO and 25% on CPU, we can run 3 such
> > requests concurrently without exceeding our target of 80% utilisation:
> > (3*0.25=0.75).
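
For anyone who wants to plug their own numbers into the three formulas quoted above, here is a minimal sketch in Python. The function names and the 0.8 default target are only my reading of the quoted text, not anything that exists in OpenStack, and the model assumes every request has the same CPU/IO split.

```python
import math


def request_cpu_utilisation(time_in_cpu, time_blocked):
    """Fraction of a request's wall-clock duration spent executing bytecode."""
    request_duration = time_in_cpu + time_blocked
    return time_in_cpu / request_duration


def cpu_utilisation(concurrency, time_in_cpu, time_blocked):
    """Utilisation of the single CPU available under the GIL."""
    return concurrency * request_cpu_utilisation(time_in_cpu, time_blocked)


def max_concurrency(time_in_cpu, time_blocked, target_utilisation=0.8):
    """Largest number of concurrent requests that stays at or below the target.

    The 0.8 default mirrors the "80% utilisation" picked in the quoted text.
    """
    if time_in_cpu <= 0:
        raise ValueError("the model needs a request that uses some CPU time")
    per_request = request_cpu_utilisation(time_in_cpu, time_blocked)
    # The small epsilon guards against float rounding when the ratio is
    # meant to be a whole number.
    return math.floor(target_utilisation / per_request + 1e-9)


if __name__ == "__main__":
    # The two cases from the quoted text: 50% CPU / 50% IO, and 25% CPU / 75% IO.
    print(max_concurrency(0.5, 0.5))
    print(max_concurrency(0.25, 0.75))
```

The two calls at the bottom correspond to the 50/50 and 25/75 requests in the quoted paragraph; only the ratio between the two arguments matters, not their units.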
> >
> > What we have today in our standard architecture for OpenStack is
> > optimised for IO bound workloads: waiting on the
> > network/subprocesses/disk/libvirt etc. Running high numbers of
> > eventlet handlers in a single process only works when the majority of
> > the work being done by a handler is IO.
>
> Everything stated here is great, however in our situation there is one
> unfortunate fact which renders it completely incorrect at the moment.
> I'm still puzzled why we are getting into deep think sessions about the
> vagaries of the GIL and async when there is essentially a full-on
> red-alert performance blocker rendering all of this discussion useless,
> so I must again remind us: what we have *today* in Openstack is *as
> completely un-optimized as you can possibly be*.
>
> The most GIL-heavy nightmare CPU bound task you can imagine running on
> 25 threads on a ten year old Pentium will run better than the Openstack
> we have today, because we are running a C-based, non-eventlet patched DB
> library within a single OS thread that happens to use eventlet, but the
> use of eventlet is totally pointless because right now it blocks
> completely on all database IO. All production Openstack applications
> today are fully serialized to only be able to emit a single query to the
> database at a time; for each message sent, the entire application blocks
> an order of magnitude more than it would under the GIL waiting for the
> database library to send a message to MySQL, waiting for MySQL to send a
> response including the full results, waiting for the database to unwrap
> the response into Python structures, and finally back to the Python
> space, where we can send another database message and block the entire
> application and all greenlets while this single message proceeds.
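
The serialization described above is easy to reproduce in miniature. The sketch below is an analogy only: it uses asyncio from the standard library as a stand-in for eventlet, time.sleep() as a stand-in for a C driver call that never yields, and asyncio.sleep() as a stand-in for a driver that cooperates with the event loop. QUERY_SECONDS and all function names are made up for illustration, and no real database is involved.

```python
import asyncio
import time

QUERY_SECONDS = 0.05  # hypothetical round-trip time for one query


async def blocking_query():
    # Stand-in for a C driver that is not event-loop aware: the call does
    # not yield, so every other task in the process waits as well.
    time.sleep(QUERY_SECONDS)


async def cooperative_query():
    # Stand-in for a driver whose socket IO yields to the event loop.
    await asyncio.sleep(QUERY_SECONDS)


async def run_batch(query, n):
    """Start n 'queries' concurrently and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(query() for _ in range(n)))
    return time.perf_counter() - start


def measure(n=10):
    """Return (blocked_seconds, cooperative_seconds) for n concurrent queries."""
    blocked = asyncio.run(run_batch(blocking_query, n))
    cooperative = asyncio.run(run_batch(cooperative_query, n))
    return blocked, cooperative


if __name__ == "__main__":
    blocked, cooperative = measure()
    print(f"blocking driver:    {blocked:.3f}s")
    print(f"cooperative driver: {cooperative:.3f}s")
```

With the blocking stand-in the elapsed time grows roughly linearly with the number of queries, because each one holds the whole process until it returns. With the cooperative stand-in the waits overlap. Real drivers add CPU cost on top of the wait, so this says nothing about how PyMySQL and MySQLdb compare in absolute terms.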
>
> To share a link I've already shared about a dozen times here, here's
> some tests under similar conditions which illustrate what that
> concurrency looks like:
> http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/.
>
> MySQLdb takes *20 times longer* to handle the work of 100 sessions than
> PyMySQL when it's inappropriately run under gevent, when there is
> modestly high concurrency happening. When I talk about moving to
> threads, this is not a "won't help or hurt" kind of issue, at the moment
> it's a change that will immediately allow massive improvement to the
> performance of all Openstack applications instantly. We need to change
> the DB library or dump eventlet.
>
> As far as if we should dump eventlet or use a pure-Python DB library, my
> contention is that a thread based + C database library will outperform
> an eventlet + Python-based database library. Additionally, if we make
> either change, when we do so we may very well see all kinds of new
> database-concurrency related bugs in our apps too, because we will be
> talking to the database much more intensively all the sudden; it is my
> opinion that a traditional threading model will be an easier environment
> to handle working out the approach to these issues; we have to assume
> "concurrency at any time" in any case because we run multiple instances
> of Nova etc. at the same time. At the end of the day, we aren't going
> to see wildly better performance with one approach over the other in any
> case, so we should pick the one that is easier to develop, maintain, and
> keep stable.
>

Mike, I agree with the entire paragraph above, and I've been surprised
to see the way this thread has gone with so much speculation.
Optimization can be such a divisive thing, and I think we need to be
mindful of that.

Anyway, there is an additional thought that might change the decision a
bit. There is one "pro" to switching to pymysql rather than switching to
threads: it isolates the change to database access only. Switching to
threading means introducing threads to every piece of code we might
touch while multiple threads are active.

It really seems worth seeing whether the I/O bound portions of OpenStack
become more responsive with pymysql before embarking on a change to the
concurrency model. If they don't, not much harm is done. If they do, but
that leaves us CPU bound, then we have even more of a reason to set out
on such a large task.