Hi Ben,

Thanks for the suggestions. I ran some quick tests and analysis of the stack usage on ovs-2.1.2 (I plan to do the same on master later); here are some of the findings.

The list below shows the stack usage (in bytes) of each function. I have collected only those functions that use more than 1 KB of stack.
dpif_linux_operate             114536
udpif_upcall_handler            70856
nl_sock_recv__                  65656
json_from_stream                8216
udpif_revalidator               7672
revalidator_sweep__             6008
system_stats_thread_func        4872
netdev_linux_run                4408
dpif_linux_port_poll            4328
nln_run                         4208
nl_sock_transact_multiple       3368
xlate_actions                   3112
ofproto_trace                   2760
handle_openflow__               2072
do_xlate_actions                1928
handle_flow_stats_request       1784
ofp_print_nxst_flow_monitor_reply 1712
handle_aggregate_stats_request  1368
netdev_linux_sys_get_stats      1352
append_group_stats              1320
ofproto_dpif_execute_actions    1320
parse_odp_key_mask_attr         1304
xlate_group_bucket              1272
dpif_ipfix_cache_expire         1256
describe_fd                     1256
dpif_linux_execute              1160
handle_flow_monitor_request     1112
sfl_agent_sysError              1080
handle_meter_mod                1048
sfl_agent_error                 1032

As you can see, a few of these functions are in the packet-processing path.

Assuming we run the "AT_SETUP([ofproto-dpif - infinite resubmit])" test, which causes do_xlate_actions (and friends) to recurse 64 levels deep, roughly 400KB of stack would be used by "udpif_upcall_handler". I think the stack usage of udpif_revalidator should be the same as that of udpif_upcall_handler, if not less. I limited the stack size of all the pthreads to 512KB and was able to run both of the tests you mentioned.

I ran valgrind (--tool=massif) against ovs-vswitchd with the "AT_SETUP([ofproto-dpif - infinite resubmit])" test: valgrind reported a maximum stack usage of around 400MB, and "AT_SETUP([ofproto-dpif - exponential resubmit chain])" used around 700MB. This was with 4 vCPUs (6 pthreads). Note, though, that valgrind reports the total stack usage across all threads.

This makes me believe that 1MB of stack should be enough for each pthread, and that 512KB would be tight. Let me know your thoughts. I will send out a patch that limits the pthread stack size to 1024KB and makes it configurable via "other-config".

Thanks,
Anoob.

On 08/07/14 17:47, Ben Pfaff wrote:
I guess that the biggest effect on stack size would be the flow table
and in particular how much recursion flow processing causes.  There are
a few tests that force as-deep-as-possible recursion:

     AT_SETUP([ofproto-dpif - infinite resubmit])
     AT_SETUP([ofproto-dpif - exponential resubmit chain])

I don't think that forcing all packets to userspace would have much of
an effect.  (The closest equivalent would be to disable megaflows,
there's an "ovs-appctl" command for that, look in "ovs-appctl help".)

Another hint toward maximum stack requirement is to look through the
generated asm for stack usage, e.g.:

         objdump -dr vswitchd/ovs-vswitchd | sed -n 's/^.*sub.*$0x\([0-9a-f]\{1,\}\),%esp/\1/p' | sort | uniq | less

which shows that we have at least one place where we allocate 327,788
bytes on the stack (!).  I hope that is not in the flow processing path!

On Tue, Jul 08, 2014 at 05:36:07PM +0100, Anoob Soman wrote:
I have been running tests with a 1MB stack size and ovs-vswitchd seems
to hold up pretty well. I will try to do some more experiments to find
out the max depth of the stack, but I am afraid this will totally
depend on the test I am running. Any suggestion on what sort of test
I should be running? Moreover, the "force-miss-model" other-config is
missing from 2.1.x as there is no concept of facets. Is there a way
that I can force all packets to be processed in userspace, other
than running "ovs-dpctl del-flows" periodically?

Thanks,
Anoob.
On 08/07/14 17:15, Ben Pfaff wrote:
On Tue, Jul 08, 2014 at 05:08:43PM +0100, Anoob Soman wrote:
Since Open vSwitch moved to a multi-threaded model, the RSS usage of
ovs-vswitchd has increased quite significantly compared to the last
release we used (ovs-1.4.x). Part of the problem is the use of mlockall
(with MCL_CURRENT|MCL_FUTURE) in ovs-vswitchd, which causes every
pthread's stack and the heap to be locked into RAM.
ovs-vswitchd (2.1.x) running on an 8 vCPU dom0 (10 pthreads) uses
around 89MB of RSS (80MB just for stacks), without any VMs running on
the host. One way to reduce RSS would be to reduce the number of
"n-handler-threads" and "n-revalidator-threads", but I am not sure
about the performance impact of reducing these thread counts.
I am wondering if the stack size of the pthreads can be reduced
(using pthread_attr_setstack). By default the pthread max stack size is
8MB, and mlockall locks all of this 8MB into RAM. What would be an
optimal stack size to use?
I think it would be very reasonable to reduce the stack sizes, but I
don't know the "correct" size off-hand.  Since you're looking at the
problem already, perhaps you should consider some experiments.

_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss