On 5/11/15 9:58 AM, Attila Fazekas wrote:
----- Original Message -----
From: "John Garbutt" <j...@johngarbutt.com>
To: "OpenStack Development Mailing List (not for usage questions)"
<openstack-dev@lists.openstack.org>
Cc: "Dan Smith" <d...@danplanet.com>
Sent: Saturday, May 9, 2015 12:45:26 PM
Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 30 April 2015 at 18:54, Mike Bayer <mba...@redhat.com> wrote:
On 4/30/15 11:16 AM, Dan Smith wrote:
There is an open discussion to replace mysql-python with PyMySQL, but
PyMySQL has worse performance:
https://wiki.openstack.org/wiki/PyMySQL_evaluation
My major concern with not moving to something different (i.e. not based
on the C library) is the threading problem. Especially as we move in the
direction of cellsv2 in nova, not blocking the process while waiting for
a reply from mysql is going to be critical. Further, I think that we're
likely to get back a lot of performance from a supports-eventlet
database connection because of the parallelism that conductor currently
can only provide in exchange for the footprint of forking into lots of
workers.
If we're going to move, shouldn't we be looking at something that
supports our threading model?
Yes, but at the same time, we should change our threading model at the
level where APIs access the database, at the very least by using a
threadpool behind eventlet. CRUD-oriented database access is faster using
traditional threads, even in Python, than using an eventlet-like system or
using explicit async. The tests at
http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
show this. With traditional threads, we can stay on the C-based MySQL
APIs and take full advantage of their speed.
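To illustrate the thread-pool idea above, here is a minimal sketch, not anyone's actual implementation. It uses only the standard library so that it runs anywhere: `sqlite3` stands in for a blocking C-based MySQL driver, and `run_in_db_pool`, `_blocking_query` and the pool size are hypothetical names and values chosen for the example. Under eventlet, the rough counterpart would be handing the blocking call to `eventlet.tpool.execute`.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool size; a real service would make this configurable.
_DB_POOL = ThreadPoolExecutor(max_workers=4)


def _blocking_query(sql, params=()):
    # Stand-in for a call into a C-based driver. sqlite3 is used only so
    # the sketch is runnable without a MySQL server.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


def run_in_db_pool(sql, params=()):
    """Submit the blocking call to a worker thread and return a Future."""
    return _DB_POOL.submit(_blocking_query, sql, params)


# The caller keeps running while the worker threads block on the database.
futures = [run_in_db_pool("SELECT ? + ?", (i, i)) for i in range(3)]
results = [f.result() for f in futures]
```

The point of the design is that only the worker thread blocks inside the driver; the calling side just waits on (or polls) the future.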
Sorry to go back in time, I wanted to go back to an important point.
It seems we have three possible approaches:
* C lib and eventlet, blocks whole process
* pure python lib, and eventlet, eventlet does its thing
* go for a C lib and dispatch calls via thread pool
* go with a pure C protocol lib that explicitly uses `python patch-able`
I/O functions (and maybe others, e.g. threading, mutex, sleep)
* go with a pure C protocol lib where the Python part explicitly calls
`decode` and `encode`; the C part just does the CPU-intensive operations
and never calls I/O primitives.
We have a few problems:
* performance sucks, we have to fork lots of nova-conductors and api nodes
* need to support python2.7 and 3.4, but it's not currently possible
with the lib we use?
* want to pick a lib that we can fix when there are issues, and work to
improve
It sounds like:
* currently do the first one, it sucks, forking nova-conductor helps
* seems we are thinking the second one might work, we sure get py3.4 +
py2.7 support
* the last will mean more work, but it's likely to be more performant
* worried we are picking an unsupported lib with little future
I am leaning towards us moving to making DB calls with a thread pool
and some fast C based library, so we get the 'best' performance.
Is that a crazy thing to be thinking? What am I missing here?
Using the python socket from C code:
https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100
It is also possible to implement a MySQL driver as just a protocol parser,
leaving you free to use your favorite event-based I/O strategy (e.g. direct
epoll usage), even without eventlet (or similar).
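As a rough sketch of what "just a protocol parser" could look like (an illustration, not code from ultramysql or any existing driver): the class below only frames MySQL packets, which carry a 3-byte little-endian payload length followed by a 1-byte sequence id. It never touches a socket, so the bytes may come from eventlet, epoll, or a plain blocking read. `PacketFramer` and `feed` are hypothetical names.

```python
class PacketFramer:
    """Splits a byte stream into MySQL packets without doing any I/O."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Accept bytes from any source; return complete (seq_id, payload) pairs."""
        self._buf.extend(data)
        packets = []
        while len(self._buf) >= 4:
            # Header: 3-byte little-endian payload length, 1-byte sequence id.
            length = int.from_bytes(self._buf[0:3], "little")
            if len(self._buf) < 4 + length:
                break  # payload incomplete; wait for more bytes
            seq_id = self._buf[3]
            packets.append((seq_id, bytes(self._buf[4:4 + length])))
            del self._buf[:4 + length]
        return packets


framer = PacketFramer()
first = framer.feed(b"\x03\x00\x00\x00ab")   # header plus a partial payload
second = framer.feed(b"c\x01\x00\x00\x01z")  # the rest, plus a second packet
```

Whatever I/O layer is in use simply calls `feed()` with whatever it managed to read and acts on the packets that come back.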
The issue with ultramysql is that it does not implement
the `standard` Python DB API, so you would need to add an extra wrapper for
SQLAlchemy.
This driver appears to have seen its last commit about a year ago, and it
doesn't even implement the standard DBAPI (which is already a red
flag). There is apparently a separately released (!) DBAPI-compat
wrapper https://pypi.python.org/pypi/umysqldb/1.0.3 which has had no
releases in two years. If this wrapper is indeed compatible with
MySQLdb then it would run in SQLAlchemy without changes (though I'd be
extremely surprised if it passes our test suite).
How would using these obscure libraries be preferable to running
Nova API functions within the thread-pooling facilities already included
with eventlet? Keep in mind that I've now done the work [1]
to show that there is no performance gain to be had for all the trouble
we go through to use eventlet/gevent/asyncio with local database
connections.
[1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/