On 03/05/2014 06:42 AM, Miguel Angel Ajo wrote:
Hello,
Recently I found a serious issue with network-node startup time:
neutron-rootwrap eats a lot of CPU cycles, much more than the processes
it is wrapping.
On a database with 1 public network, 192 private networks, 192
routers, and 192 nano VMs, with the OVS plugin:
Network node setup time (rootwrap): 24 minutes
Network node setup time (sudo): 10 minutes
I've not been looking at rootwrap, but I have been looking at sudo and ip.
(Using some scripts which create "fake routers" so I could look without
any of this icky OpenStack stuff in the way :) ) At least the Ubuntu 12.04
versions of each will enumerate all the interfaces on the system, even
though they don't need to.
There was already an upstream change to 'ip' that eliminates the
unnecessary enumeration. In the last few weeks an enhancement went into
the upstream sudo that allows one to configure sudo to not do the same
thing. Down in the low(ish) three figures of interfaces it may not be
a Big Deal (tm) but as one starts to go beyond that...
commit f0124b0f0aa0e5b9288114eb8e6ff9b4f8c33ec8
Author: Stephen Hemminger <step...@networkplumber.org>
Date: Thu Mar 28 15:17:47 2013 -0700
ip: remove unnecessary ll_init_map
Don't call ll_init_map on modify operations
Saves significant overhead with 1000's of devices.
http://www.sudo.ws/pipermail/sudo-workers/2014-January/000826.html
Whether your environment already has the 'ip' change I don't know, but
odds are pretty good it doesn't have the sudo enhancement.
That's the time from rebooting a network node until all namespaces
and services are restored.
So, that includes the time for the system to go down and reboot, not
just the time it takes to rebuild once rebuilding starts?
As appendix [1] shows, this extra 14-minute overhead matches the fact
that rootwrap needs about 0.3 s to start and launch a system command
(once filtered).
14 minutes = 840 s.
(840 s / 192 resources) / 0.3 s ~= 15 operations per resource
(qdhcp + qrouter): iptables, OVS port creation and tagging, starting
child processes, etc.
The overhead comes from Python startup time plus rootwrap loading.
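The estimate above can be written as a small model. This is a sketch that assumes each wrapped command adds a fixed ~0.3 s (the `neutron-rootwrap --help` timing from the appendix) and that wrapped calls never overlap; the function names are illustrative. The same arithmetic gives the coalesced 3-calls-per-resource figure discussed in this thread.

```python
# Back-of-the-envelope model of rootwrap's fixed per-call cost.
# Assumption: each wrapped command costs ~0.3 s of extra wall time
# (the `neutron-rootwrap --help` timing) and calls do not overlap.

ROOTWRAP_CALL_S = 0.3   # seconds of overhead per wrapped command
RESOURCES = 192         # qdhcp + qrouter pairs in the test database


def calls_per_resource(extra_s: float, resources: int, per_call_s: float) -> float:
    """How many wrapped calls per resource would explain `extra_s` seconds."""
    return extra_s / resources / per_call_s


def overhead_minutes(resources: int, calls: int, per_call_s: float) -> float:
    """Total wrapper overhead in minutes for `calls` invocations per resource."""
    return resources * calls * per_call_s / 60


if __name__ == "__main__":
    extra_s = (24 - 10) * 60  # rootwrap setup time minus sudo setup time
    print(f"{calls_per_resource(extra_s, RESOURCES, ROOTWRAP_CALL_S):.1f} calls per resource")
    for calls in (15, 3):  # today vs. after coalescing 15 calls into 3
        print(f"{calls} calls: {overhead_minutes(RESOURCES, calls, ROOTWRAP_CALL_S):.1f} min")
```

Run as a script, it prints 14.6 calls per resource, then 14.4 min for 15 calls and 2.9 min for 3 calls.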
How much of the time is python startup time? I assume that would be all
the "find this lib, find that lib" stuff one sees in a system call
trace? I saw a boatload of that at one point but didn't quite feel like
wading into that at the time.
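One way to answer the "how much is Python startup" question without a syscall trace is CPython's `-X importtime` option, which reports per-module import cost on stderr. This sketch assumes Python 3.7 or later (the Python 2.7 that rootwrap ran on in 2014 lacks this option); `import_times` is a hypothetical helper, and the module you pass (for example the rootwrap entry-point module on your node) is up to you.

```python
# Sketch: break `import <module>` into per-module self time using
# `python -X importtime` (assumes Python 3.7+).
import subprocess
import sys
from typing import List, Tuple


def import_times(module: str, top: int = 5) -> Tuple[int, List[Tuple[int, str]]]:
    """Return (total self time in microseconds, top-N (self_us, name) rows)."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        # Each row looks like: "import time: <self us> | <cumulative us> | <name>"
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[0].strip().isdigit():
            continue  # skips the header row ("self [us] | cumulative | ...")
        rows.append((int(fields[0]), fields[2].strip()))
    rows.sort(reverse=True)  # most expensive modules first
    return sum(us for us, _ in rows), rows[:top]


if __name__ == "__main__":
    total, worst = import_times(sys.argv[1] if len(sys.argv) > 1 else "json")
    print(f"total self import time: {total / 1000:.1f} ms")
    for us, name in worst:
        print(f"{us:>8} us  {name}")
```

Comparing the total against the wall time of a bare `python -c` run shows how much of a wrapper's cost is imports rather than interpreter startup.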
I suppose rootwrap was designed for a lower number of system
calls (nova?).
And/or a smaller environment perhaps.
I understand what rootwrap provides: a level of filtering that
sudo cannot offer. But it raises some questions:
1) Is anyone actually using rootwrap in production?
2) What alternatives can we think of to improve this situation?
0) Already being done: coalescing system calls. But I'm unsure
that's enough: if we coalesce 15 calls into 3 on this system, we get
192*3*0.3/60 ~= 3 minutes of overhead on a 10-minute operation.
It may not be sufficient, but it is (IMO) certainly necessary. It will
make any work that minimizes or eliminates the overhead of rootwrap look
that much better.
a) Rewriting rules into sudo (to the extent that it's possible), and
live with that.
b) How secure is neutron against command injection at that point? How
much user input is filtered on the API calls?
c) Even if "b" is OK, I suppose that a compromised DB could still
lead to command injection.
d) Rewriting rootwrap in C (it's 600 lines of Python now).
e) Doing the command filtering on the neutron side, as a library, and
living with sudo with simple filtering (this kills the Python/rootwrap
startup overhead).
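Options b, c and e can be sketched together: validate the argument vector inside the agent against per-command patterns, then hand sudo a plain argument list with no shell. Everything here is hypothetical (`CommandFilter`, `FILTERS`, `build_argv` and `run_filtered` are illustrative names, not rootwrap's or neutron's API, and the sample rule is an assumption). Because the filter runs in the unprivileged agent, it guards against hostile input reaching a command line but not against a compromised agent, which could call sudo directly; the sudoers side would still need its own, simpler rules.

```python
# Hypothetical sketch of option "e": filter commands in-process, then call
# sudo with a plain argument list. CommandFilter, FILTERS, build_argv and
# run_filtered are illustrative names, not rootwrap's or neutron's API.
import re
import subprocess
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CommandFilter:
    executable: str                # argv[0] that this rule allows
    arg_patterns: Tuple[str, ...]  # one regex per argument, matched in full

    def matches(self, argv: Sequence[str]) -> bool:
        if not argv or argv[0] != self.executable:
            return False
        args = argv[1:]
        if len(args) != len(self.arg_patterns):
            return False
        return all(re.fullmatch(p, a) is not None
                   for p, a in zip(self.arg_patterns, args))


# Example rule (an assumption): `ip netns add` with a constrained name.
FILTERS = (
    CommandFilter("ip", ("netns", "add", r"q(router|dhcp)-[0-9a-f-]{1,36}")),
)


def build_argv(argv: Sequence[str], use_sudo: bool = True) -> List[str]:
    """Return the argv to execute, or raise PermissionError if no rule allows it."""
    if not any(f.matches(argv) for f in FILTERS):
        raise PermissionError(f"command not allowed: {list(argv)!r}")
    return (["sudo", "-n"] if use_sudo else []) + list(argv)


def run_filtered(argv: Sequence[str]) -> subprocess.CompletedProcess:
    # A list argv with the default shell=False is exec'd directly, so shell
    # metacharacters such as ';' or '$(...)' are never interpreted.
    return subprocess.run(build_argv(argv), check=True,
                          capture_output=True, text=True)
```

For example, `build_argv(["ip", "netns", "add", "qrouter-1234abcd"])` returns `["sudo", "-n", "ip", "netns", "add", "qrouter-1234abcd"]`, while a name such as `"qrouter-1; rm -rf /"` is rejected before any process is started.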
3) I also find 10 minutes a long time to set up 192 networks/basic tenant
structures. I wonder if that time could be reduced by converting
system process calls into system library calls (I know we don't have
libraries for iproute, iptables?, and many other things... but it's
probably a problem worth looking at.)
Certainly going back and forth creating short-lived processes is at
least anti-social and perhaps ever so slightly upsetting to the process
scheduler. Particularly "at scale." The/a problem is though that the
Linux networking folks have been somewhat reticent about creating
libraries (at least any that they would end up supporting) because they
have a concern it will lock-in interfaces and reduce their freedom of
movement.
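To put a number on the process-versus-library difference, one can time a read-only query both ways. This sketch assumes a Linux host; `socket.if_nameindex()` (Python 3.3+ on Unix) stands in for "a library call", and `ip -o link show` from iproute2 stands in for "a process call". The helper names are hypothetical, and the process path is skipped if `ip` is not on PATH.

```python
# Sketch: list interface names in-process vs. by spawning `ip`.
# Assumptions: Linux; socket.if_nameindex() as the library call;
# `ip -o link show` (iproute2) as the equivalent process call.
import shutil
import socket
import subprocess
import time
from typing import Callable, List


def time_it(fn: Callable[[], object], repeat: int = 20) -> float:
    """Average wall-clock seconds per call of fn()."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) / repeat


def via_library() -> List[str]:
    # One libc if_nameindex() call; no fork/exec involved.
    return [name for _, name in socket.if_nameindex()]


def via_process() -> List[str]:
    out = subprocess.run(["ip", "-o", "link", "show"], check=True,
                         capture_output=True, text=True).stdout
    # `ip -o` prints one line per link: "<index>: <name>[@peer]: <flags> ..."
    return [line.split(": ")[1].split("@")[0] for line in out.splitlines()]


if __name__ == "__main__":
    print(f"library: {time_it(via_library) * 1e3:.3f} ms per call")
    if shutil.which("ip"):
        print(f"process: {time_it(via_process) * 1e3:.3f} ms per call")
```

The absolute numbers depend on the machine and interface count; the point is the ratio, which grows once every such call also pays a wrapper's startup cost.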
happy benchmarking,
rick jones
the fastest procedure call is the one you never make
Best,
Miguel Ángel Ajo.
Appendix:
[1] Analyzing overhead:
[root@rhos4-neutron2 ~]# echo "int main() { return 0; }" > test.c
[root@rhos4-neutron2 ~]# gcc test.c -o test
[root@rhos4-neutron2 ~]# time test # meant to time process invocation on
this machine (note: bash runs its 'test' builtin here, not ./test, so no
process is actually created)
real 0m0.000s
user 0m0.000s
sys 0m0.000s
[root@rhos4-neutron2 ~]# time sudo bash -c 'exit 0'
real 0m0.032s
user 0m0.010s
sys 0m0.019s
[root@rhos4-neutron2 ~]# time python -c'import sys;sys.exit(0)'
real 0m0.057s
user 0m0.016s
sys 0m0.011s
[root@rhos4-neutron2 ~]# time neutron-rootwrap --help
/usr/bin/neutron-rootwrap: No command specified
real 0m0.309s
user 0m0.128s
sys 0m0.037s
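The appendix measurements can be repeated with a small harness that runs each command several times and reports the median. This is a sketch: the command list is an assumption to edit for your node, and starting each command through `subprocess.run` with an argument list (no shell) means shell builtins are never substituted for the real binaries. Commands that exit non-zero (e.g. `neutron-rootwrap --help`, or `sudo -n` without a matching rule) are still timed.

```python
# Sketch: median wall-clock time per command, mirroring the appendix.
# The COMMANDS list is an assumption; missing executables are skipped.
import shutil
import statistics
import subprocess
import sys
import time
from typing import Sequence

COMMANDS = [
    ["./test"],                                        # compiled test.c, if present
    ["true"],
    [sys.executable, "-c", "import sys; sys.exit(0)"],
    [sys.executable, "-S", "-c", "import sys; sys.exit(0)"],  # -S: skip site import
    ["sudo", "-n", "true"],
    ["neutron-rootwrap", "--help"],
]


def median_wall_time(argv: Sequence[str], runs: int = 10) -> float:
    """Median seconds to run argv to completion, output discarded."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # No check=True: a non-zero exit still counts as a completed run.
        subprocess.run(list(argv), stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    for argv in COMMANDS:
        if shutil.which(argv[0]) is None:
            print(f"{argv[0]}: not found, skipped")
            continue
        print(f"{' '.join(argv)}: {median_wall_time(argv) * 1e3:.1f} ms")
```

The `-S` row gives a rough split between bare interpreter startup and the cost of importing `site`.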
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev