Hi Ben,
Thanks for the suggestions. I did some quick tests and analysis of the
stack usage on ovs-2.1.2 (planning to do the same on master later) and
here are some of the findings.
The list below shows the stack usage (in bytes) of each function. I
have collected only those functions which use more than 1KB of stack.
dpif_linux_operate 114536
udpif_upcall_handler 70856
nl_sock_recv__ 65656
json_from_stream 8216
udpif_revalidator 7672
revalidator_sweep__ 6008
system_stats_thread_func 4872
netdev_linux_run 4408
dpif_linux_port_poll 4328
nln_run 4208
nl_sock_transact_multiple 3368
xlate_actions 3112
ofproto_trace 2760
handle_openflow__ 2072
do_xlate_actions 1928
handle_flow_stats_request 1784
ofp_print_nxst_flow_monitor_reply 1712
handle_aggregate_stats_request 1368
netdev_linux_sys_get_stats 1352
append_group_stats & ofproto_dpif_execute_actions 1320
parse_odp_key_mask_attr 1304
xlate_group_bucket 1272
dpif_ipfix_cache_expire & describe_fd 1256
dpif_linux_execute 1160
handle_flow_monitor_request 1112
sfl_agent_sysError 1080
handle_meter_mod 1048
sfl_agent_error 1032
As you can see, a few of these functions sit in the packet processing path.
Assuming that we ran the "AT_SETUP([ofproto-dpif - infinite resubmit])"
test, which causes do_xlate_actions (and friends) to recurse 64 levels
deep, roughly 400KB of stack would be used by "udpif_upcall_handler". I
think the stack usage of udpif_revalidator should be the same as that
of udpif_upcall_handler (if not less). I limited the stack size of all
the pthreads to 512KB and was able to run both the tests you mentioned.
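(Rough arithmetic behind that 400KB figure, assuming each resubmit
level stacks roughly one xlate_actions frame plus one do_xlate_actions
frame from the list above: 64 * (3112 + 1928) bytes is about 323KB,
which together with the ~71KB already used under udpif_upcall_handler
itself comes to roughly 394KB. The exact set of frames per recursion
level is my assumption, so treat this only as an order-of-magnitude
sanity check.)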
I tried valgrind (tool=massif) against ovs-vswitchd and ran the
"AT_SETUP([ofproto-dpif - infinite resubmit])" test; valgrind reported
a maximum stack usage of around 400KB, and
"AT_SETUP([ofproto-dpif - exponential resubmit chain])" used around
700KB. This was with 4 vCPUs (6 pthreads). Note that valgrind reports
the total stack usage across all threads.
This makes me believe that 1MB of stack size should be enough for each
of the pthreads and that 512KB would be tight. Let me know your
thoughts. I will send out a patch that limits the pthread stack size to
1024KB and makes it configurable through "other-config".
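For what it's worth, the core of the patch will be something along the
lines of the standalone sketch below: pass an attribute set up with
pthread_attr_setstacksize() to pthread_create(). This is only an
illustration, not the actual patch; "handler_main" and the hard-coded
1024KB are placeholders, and the "other-config" plumbing is not shown.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define THREAD_STACK_SIZE (1024 * 1024)   /* 1024KB per thread. */

/* Placeholder thread body; in ovs-vswitchd this would be the real
 * handler/revalidator main loop. */
static void *
handler_main(void *aux)
{
    (void) aux;
    return NULL;
}

int
main(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int error;

    pthread_attr_init(&attr);

    /* Cap the thread's stack; the size must be at least
     * PTHREAD_STACK_MIN or this returns EINVAL. */
    error = pthread_attr_setstacksize(&attr, THREAD_STACK_SIZE);
    if (error) {
        fprintf(stderr, "pthread_attr_setstacksize failed: %d\n", error);
        exit(1);
    }

    error = pthread_create(&tid, &attr, handler_main, NULL);
    if (error) {
        fprintf(stderr, "pthread_create failed: %d\n", error);
        exit(1);
    }

    pthread_attr_destroy(&attr);
    pthread_join(tid, NULL);
    return 0;
}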
Thanks,
Anoob.
On 08/07/14 17:47, Ben Pfaff wrote:
I guess that the biggest effect on stack size would be the flow table
and in particular how much recursion flow processing causes. There are
a few tests that force as-deep-as-possible recursion:
AT_SETUP([ofproto-dpif - infinite resubmit])
I don't think that forcing all packets to userspace would have much of
an effect. (The closest equivalent would be to disable megaflows;
there's an "ovs-appctl" command for that; look in "ovs-appctl help".)
Another hint toward maximum stack requirement is to look through the
generated asm for stack usage, e.g.:
objdump -dr vswitchd/ovs-vswitchd|sed -n 's/^.*sub.*$0x\([0-9a-f]\{1,\}\),%esp/\1/p'|sort|uniq|less
which shows that we have at least one place where we allocate 327,788
bytes on the stack (!). I hope that is not in the flow processing path!
On Tue, Jul 08, 2014 at 05:36:07PM +0100, Anoob Soman wrote:
I have been running tests with a 1MB stack size and ovs-vswitchd seems
to hold up pretty well. I will try to do some more experiments to find
out the maximum depth of the stack, but I am afraid this will totally
depend on the test I am running. Any suggestions on what sort of tests
I should be running? Moreover, the "force-miss-model" other-config is
missing from 2.1.x, as there is no concept of facets. Is there a way
that I can force all packets to be processed in userspace, other than
running "ovs-dpctl del-flows" periodically?
Thanks,
Anoob.
On 08/07/14 17:15, Ben Pfaff wrote:
On Tue, Jul 08, 2014 at 05:08:43PM +0100, Anoob Soman wrote:
Since openvswitch has moved to a multi-threaded model, the RSS usage of
ovs-vswitchd has increased quite significantly compared to the last
release we used (ovs-1.4.x). Part of the problem is the use of mlockall
(with MCL_CURRENT|MCL_FUTURE) in ovs-vswitchd, which causes every
pthread's stack and heap virtual addresses to be locked into RAM.
ovs-vswitchd (2.1.x) running on an 8 vCPU dom0 (10 pthreads) uses
around 89MB of RSS (80MB just for stacks), without any VMs running on
the host. One way to reduce RSS would be to reduce the number of
"n-handler-threads" and "n-revalidator-threads", but I am not sure
about the performance impact of reducing these thread counts. I am
wondering if the stack size of the pthreads can be reduced (using
pthread_attr_setstack). By default the pthread max stack size is 8MB,
and mlockall locks all of this 8MB into RAM. What would be an optimal
stack size to use?
I think it would be very reasonable to reduce the stack sizes, but I
don't know the "correct" size off-hand. Since you're looking at the
problem already, perhaps you should consider some experiments.