On 03/02/2012 10:54 AM, Day, Phil wrote:
By "properly multi-threaded" are you instead referring to making the nova-api
server multi-*processed* with eventlet greenthread pools in each process? i.e. The way
Swift (and now Glance) works? Or are you referring to a different approach entirely?
Yep - following your posting here pointing to the Glance changes, we
back-ported that into the Diablo API server. We're now running each API
server with 20 OS API processes and 20 EC2 API processes, and the world looks
a lot happier.
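For anyone following along, here is a minimal sketch of the pattern being described
(not the actual Glance or HP Nova code; the app, port and pool size are placeholders):
bind one listening socket, fork N workers, and run an eventlet WSGI server with its
own greenthread pool in each worker.

    import os
    import eventlet
    import eventlet.wsgi

    def app(environ, start_response):
        # trivial placeholder WSGI app
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hello\n']

    sock = eventlet.listen(('0.0.0.0', 8774))     # bind once in the parent

    WORKERS = 20                                  # e.g. the 20 processes mentioned above
    for _ in range(WORKERS):
        if os.fork() == 0:                        # child: serve requests cooperatively
            eventlet.wsgi.server(sock, app, custom_pool=eventlet.GreenPool(1000))
            os._exit(0)

    for _ in range(WORKERS):                      # parent: just wait for the workers
        os.wait()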
Gotcha, OK, that makes a lot of sense.
> The same changes were being done in parallel for Essex by someone in
the community, I thought?
Hmmm, for Nova? I'm not aware of that effort, but I would certainly
support it. It's a performance issue with a very big impact...
Curious... do you have a list of all the places where sleep(0) calls were
inserted in the HP Nova code? I can turn that into a bug report and get to work
on adding them...
So far the only two cases where we've done this are in _sync_power_state and in
the security group refresh handling
(libvirt/firewall/do_refresh_security_group_rules), which we modified to only
refresh for instances in the group and to add a sleep in the loop (I need to
finish writing the bug report for this one).
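For reference, a minimal, hypothetical sketch of that pattern (the function and
helper names below are made up, not the Diablo code): add a cooperative yield at
the end of each iteration of a long-running loop so other greenthreads get a
chance to run.

    from eventlet import greenthread

    def sync_power_states_loop(instances, get_power_state):
        # iterate over a potentially large list of instances
        for instance in instances:
            state = get_power_state(instance)   # slow per-instance work
            # ... reconcile `state` with the DB record here ...
            greenthread.sleep(0)                # yield to the hub between iterations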
OK, sounds good.
I have contemplated doing something similar in the image code when reading
chunks from glance - but am slightly worried that in this case the only thing
that currently stops two creates for the same image from making separate
requests to glance might be that one gets queued behind the other. It would be
nice to do the same thing on snapshot (as this can also be a real hog), but
there the transfer is handled completely within the glance client. A more
radical approach would be to split out the image handling code from compute
manager into a separate (co-hosted) image_manager so at least only commands
which need interaction with glance will block each other.
We should definitely discuss this further (separate ML thread or
etherpad maybe). If not before the design summit, then definitely at it.
Cheers!
-jay
Phil
-----Original Message-----
From: openstack-bounces+philip.day=hp....@lists.launchpad.net
[mailto:openstack-bounces+philip.day=hp....@lists.launchpad.net] On Behalf Of
Jay Pipes
Sent: 02 March 2012 15:17
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness
On 03/02/2012 05:34 AM, Day, Phil wrote:
In our experience (running clusters of several hundred nodes) the DB
performance is not generally the significant factor, so making its calls
non-blocking gives only a very small increase in processing capacity and
creates other side effects in terms of slowing all eventlets down as they wait
for their turn to run.
Yes, I believe I said that this was the case at the last design summit
-- or rather, I believe I said "is there any evidence that the database is a
performance or scalability problem at all"?
That shouldn't really be surprising given that the Nova DB is pretty small and
MySQL is a pretty good DB - throw reasonable hardware at the DB server and give
it a bit of TLC from a DBA (remove deleted entries from the DB, add indexes
where the slow query log tells you to, etc) and it shouldn't be the bottleneck
in the system for performance or scalability.
++
We use the python driver and have experimented with allowing the eventlet code
to make the db calls non-blocking (it's not the default setting), and it works,
but didn't give us any significant advantage.
Yep, identical results to the work that Mark Washenberger did on the same
subject.
For example in the API server (before we made it properly
multi-threaded)
By "properly multi-threaded" are you instead referring to making the nova-api
server multi-*processed* with eventlet greenthread pools in each process? i.e. The way
Swift (and now Glance) works? Or are you referring to a different approach entirely?
> with blocking db calls the server was essentially a serial processing
queue - each request was fully processed before the next. With non-blocking db
calls we got a lot more apparent concurrency, but only at the expense of making all
of the requests equally bad.
Yep, not surprising.
Consider a request that takes 10 seconds, where after 5 seconds there is a call to
the DB which takes 1 second, and three such requests are started at the same time:
Blocking:
0 - Request 1 starts
10 - Request 1 completes, request 2 starts
20 - Request 2 completes, request 3 starts
30 - Request 3 completes
Request 1 completes in 10 seconds
Request 2 completes in 20 seconds
Request 3 completes in 30 seconds
Ave time: 20 sec
Non-blocking:
0 - Request 1 Starts
5 - Request 1 gets to db call, request 2 starts
10 - Request 2 gets to db call, request 3 starts
15 - Request 3 gets to db call, request 1 resumes
19 - Request 1 completes, request 2 resumes
23 - Request 2 completes, request 3 resumes
27 - Request 3 completes
Request 1 completes in 19 seconds (+ 9 seconds)
Request 2 completes in 23 seconds (+ 3 seconds)
Request 3 completes in 27 seconds (- 3 seconds)
Ave time: 23 sec
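To make the arithmetic easy to check, here is a small, purely illustrative sketch
(assuming a single hub that runs each greenthread up to its next yield point and
resumes them in order) that reproduces the non-blocking timeline above:

    PRE_DB, DB, POST_DB = 5, 1, 4      # seconds: work before the DB call, DB call, work after

    def simulate(num_requests):
        clock = 0.0
        db_done, finish = {}, {}
        # Each request runs its first 5s of work, then yields for its 1s DB call.
        for r in range(num_requests):
            clock += PRE_DB
            db_done[r] = clock + DB
        # A request resumes only when the hub is free AND its DB call has finished.
        for r in range(num_requests):
            clock = max(clock, db_done[r]) + POST_DB
            finish[r] = clock
        return finish

    print(simulate(3))                 # {0: 19.0, 1: 23.0, 2: 27.0}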
So instead of worrying about making db calls non-blocking we've been working to
make certain eventlets non-blocking - i.e. adding sleep(0) calls to long-running
iteration loops - which IMO has a much bigger impact on the apparent latency of
the system.
Yep, and I think adding a few sleep(0) calls in various places in the Nova
codebase (as was recently added in the _sync_power_states() periodic task) is
an easy and simple win with pretty much no ill side-effects. :)
Curious... do you have a list of all the places where sleep(0) calls were
inserted in the HP Nova code? I can turn that into a bug report and get to work
on adding them...
All the best,
-jay
Phil
-----Original Message-----
From: openstack-bounces+philip.day=hp....@lists.launchpad.net
[mailto:openstack-bounces+philip.day=hp....@lists.launchpad.net] On
Behalf Of Brian Lamar
Sent: 01 March 2012 21:31
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness
How is MySQL access handled in eventlet? Presumably it's an external C
library, so it's not going to be monkey patched. Does that make every
db access call a blocking call? Thanks,
Nope, it goes through a thread pool.
I feel like this might be an over-simplification. If the question is:
"How is MySQL access handled in nova?"
The answer would be that we use SQLAlchemy which can load any number of
SQL-drivers. These drivers can be either pure Python or C-based drivers. In the
case of pure Python drivers, monkey patching can occur and db calls are
non-blocking. In the case of drivers which contain C code (or perhaps other
blocking calls), db calls will most likely be blocking.
If the question is "How is MySQL access handled in eventlet?" the answer would
be to use the eventlet.db_pool module to allow db access using thread pools.
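As a concrete (hypothetical) example of the db_pool route - the connection
parameters are placeholders - a C-based driver such as MySQLdb is handed out
through a pool so that its blocking calls don't stall the hub:

    import MySQLdb
    from eventlet import db_pool

    pool = db_pool.ConnectionPool(MySQLdb, host='127.0.0.1', user='nova',
                                  passwd='secret', db='nova',
                                  min_size=1, max_size=10)

    conn = pool.get()          # borrow a pooled connection
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        print(cur.fetchone())
    finally:
        pool.put(conn)         # always hand it back to the pool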
B
-----Original Message-----
From: "Adam Young"<ayo...@redhat.com>
Sent: Thursday, March 1, 2012 3:27pm
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness
On 03/01/2012 02:45 PM, Yun Mao wrote:
There have been plenty of eventlet discussions recently, but I'll stick my
question in this thread, although it's pretty much a separate
question. :)
How is MySQL access handled in eventlet? Presumably it's an external C
library, so it's not going to be monkey patched. Does that make every
db access call a blocking call? Thanks,
Nope, it goes through a thread pool.
Yun
On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt<johan...@erdfelt.com>
wrote:
On Wed, Feb 29, 2012, Yun Mao<yun...@gmail.com> wrote:
Thanks for the explanation. Let me see if I understand this.
1. Eventlet will never have this problem if there is only 1 OS thread
-- let's call it the main thread.
In fact, that's exactly what Python calls it :)
2. In Nova, there is only 1 OS thread unless you use xenapi and/or
the virt/firewall driver.
3. The python logging module uses locks. Because of the monkey
patch, those locks are actually eventlet or "green" locks and may
trigger a green thread context switch.
Based on 1-3, does it make sense to say that in the other OS
threads (i.e. not the main thread), if logging (plus other pure Python
library code involving locking) is never used, and we do not run an
eventlet hub at all, we should never see this problem?
That should be correct. I'd have to double check all of the monkey
patching that eventlet does to make sure there aren't other cases
where you may inadvertently use eventlet primitives across real threads.
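To illustrate points 1-3, a hypothetical snippet (not from any of the projects
discussed): after monkey patching, the lock type the stdlib hands out - including
the locks the logging module creates internally - comes from eventlet, which is
why acquiring one can switch greenthreads.

    import eventlet
    eventlet.monkey_patch()      # patches thread/threading, socket, time, ... by default

    import threading

    lock = threading.Lock()
    # This should report an eventlet class rather than the builtin _thread lock,
    # so acquiring it from a greenthread can yield control to the hub.
    print(type(lock).__module__, type(lock).__name__)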
JE
_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp