Hi Ben,

From the objdump/sed (%esp) command I realised that you might be using the XenServer DDK shipped with 6.2, which is 32-bit. From Creedence (the next release) onwards, dom0 is 64-bit, and so are the tools: http://xenserver.org/open-source-virtualization-download/24-product/creedence/143-xs-2014-development-snapshots.html

Thanks,
Anoob.
On 08/07/14 17:47, Ben Pfaff wrote:
I guess that the biggest effect on stack size would be the flow table
and in particular how much recursion flow processing causes.  There are
a few tests that force as-deep-as-possible recursion:

     AT_SETUP([ofproto-dpif - infinite resubmit])
     AT_SETUP([ofproto-dpif - exponential resubmit chain])

I don't think that forcing all packets to userspace would have much of
an effect.  (The closest equivalent would be to disable megaflows;
there's an "ovs-appctl" command for that; see "ovs-appctl help".)

Another hint toward maximum stack requirement is to look through the
generated asm for stack usage, e.g.:

         objdump -dr vswitchd/ovs-vswitchd \
             | sed -n 's/^.*sub.*$0x\([0-9a-f]\{1,\}\),%esp/\1/p' \
             | sort | uniq | less

which shows that we have at least one place where we allocate 327,788
bytes on the stack (!).  I hope that is not in the flow processing path!
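
To make that concrete, here is a contrived C sketch (not code from the
OVS tree) of the kind of function such a prologue corresponds to: a
large local array, compiled without optimization, produces a big
"sub $0x...,%esp" (or %rsp on a 64-bit build):

        /* big_frame.c: a large local buffer makes the compiler reserve a
         * correspondingly large stack frame in the function prologue. */
        #include <string.h>

        void
        big_frame(void)
        {
            char buf[320 * 1024];        /* ~320 kB on the stack */

            memset(buf, 0, sizeof buf);  /* keep 'buf' from being dropped */
        }

Compiling that with "gcc -O0 -S big_frame.c" and grepping the output for
"sub" shows a frame in that ballpark, which is the kind of allocation the
pipeline above flags in ovs-vswitchd.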

On Tue, Jul 08, 2014 at 05:36:07PM +0100, Anoob Soman wrote:
I have been running tests with a 1 MB stack size and ovs-vswitchd seems
to hold up pretty well. I will try to do some more experiments to find
out the maximum depth of the stack, but I am afraid this will depend
entirely on the test I am running. Any suggestions on what sort of tests
I should be running? Moreover, the "force-miss-model" other-config is
missing from 2.1.x, as there is no concept of facets. Is there a way
that I can force all packets to be processed in userspace, other than
running "ovs-dpctl del-flows" periodically?

Thanks,
Anoob.
On 08/07/14 17:15, Ben Pfaff wrote:
On Tue, Jul 08, 2014 at 05:08:43PM +0100, Anoob Soman wrote:
Since Open vSwitch moved to a multi-threaded model, the RSS usage of
ovs-vswitchd has increased quite significantly compared to the last
release we used (ovs-1.4.x). Part of the problem is that ovs-vswitchd
calls mlockall (with MCL_CURRENT|MCL_FUTURE), which causes the virtual
address space of every pthread stack and of the heap to be locked into
RAM. ovs-vswitchd (2.1.x) running on an 8 vCPU dom0 (10 pthreads) uses
around 89 MB of RSS (80 MB just for stacks), without any VMs running on
the host. One way to reduce RSS would be to reduce the number of
"n-handler-threads" and "n-revalidator-threads", but I am not sure
about the performance impact of reducing these thread counts.
I am wondering if the stack size of the pthreads can be reduced
(using pthread_attr_setstack). By default, a pthread's maximum stack
size is 8 MB, and mlockall locks all of this 8 MB into RAM. What would
be an optimal stack size to use?
I think it would be very reasonable to reduce the stack sizes, but I
don't know the "correct" size off-hand.  Since you're looking at the
problem already, perhaps you should consider some experiments.
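
For anyone experimenting along these lines, a minimal sketch of capping
a worker thread's stack with pthread_attr_setstacksize (the size and
names here are illustrative, not what ovs-vswitchd actually uses):

        #include <pthread.h>
        #include <stdio.h>
        #include <string.h>

        /* Hypothetical cap: 512 kB instead of the 8 MB glibc default. */
        #define WORKER_STACK_SIZE (512 * 1024)

        static void *
        worker(void *arg)
        {
            /* ...thread body... */
            return NULL;
        }

        int
        main(void)
        {
            pthread_attr_t attr;
            pthread_t tid;
            int error;

            pthread_attr_init(&attr);
            pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);

            error = pthread_create(&tid, &attr, worker, NULL);
            if (error) {
                fprintf(stderr, "pthread_create: %s\n", strerror(error));
                return 1;
            }
            pthread_attr_destroy(&attr);
            pthread_join(tid, NULL);
            return 0;
        }

(Build with "gcc -pthread".)  Since mlockall(MCL_CURRENT|MCL_FUTURE)
locks each thread's entire stack mapping, shrinking the per-thread stack
this way should reduce the locked RSS roughly linearly with the number
of threads; the open question is how small it can go before the
deep-recursion cases above overflow it.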

_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
