Thanks for the patch, the implementation looks good to me too. During testing I kept noticing that it's way too easy to make OVS completely unresponsive. As you point out in the documentation, when dpdk-lcore-mask is the same as pmd-cpu-mask, OVS cannot even be killed normally (a kill -9 is required). I wonder what happens if one tries to set pmd-cpu-mask to every core in the system.
As a way to mitigate the risk, perhaps we can avoid leaving the main thread affinitized to the first core in dpdk-lcore-mask by _always_ restoring its affinity in dpdk_init__(), even if auto_determine is false.

Perhaps we should also start explicitly prohibiting the creation of a pmd thread on the first core in dpdk-lcore-mask (I get why a previous version of this didn't do it on core 0; perhaps we can generalize that to the first core in dpdk-lcore-mask).

What's the behavior of other DPDK applications?

Thanks,
Daniele

2016-07-27 5:28 GMT-07:00 Kavanagh, Mark B <mark.b.kavan...@intel.com>:
>
>
> >Set the DPDK pmd thread scheduling policy to SCHED_RR and static
> >priority to highest priority value of the policy. This is to deal with
> >the pmd thread starvation case where another cpu-hogging process can
> >get scheduled/affinitized on to the same core the pmd thread is running
> >on, thereby significantly impacting the datapath performance.
> >
> >Setting the realtime scheduling policy on the pmd threads is one step
> >towards Fastpath Service Assurance in OVS DPDK.
> >
> >The realtime scheduling policy is applied only when a CPU mask is
> >passed to 'pmd-cpu-mask'. For example:
> >
> >  * In the absence of pmd-cpu-mask, one pmd thread shall be created
> >    and the default scheduling policy and priority get applied.
> >
> >  * If pmd-cpu-mask is specified, one or more pmd threads shall be
> >    spawned on the corresponding core(s) in the mask, and the realtime
> >    scheduling policy SCHED_RR with the highest priority of the policy
> >    is applied to the pmd thread(s).
> >
> >To reproduce the pmd thread starvation case:
> >
> >ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
> >taskset 0x2 cat /dev/zero > /dev/null &
> >
> >With this commit, it is recommended that the OVS control thread and pmd
> >threads shouldn't be pinned to the same core ('dpdk-lcore-mask' and
> >'pmd-cpu-mask' should be non-overlapping). Also, other processes with
> >the same affinity as a pmd thread will be unresponsive.
> >
> >Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
>
> LGTM - Acked-by: mark.b.kavan...@intel.com
>
> >---
> >v4->v5:
> >* Reword Note section in DPDK-ADVANCED.md
> >
> >v3->v4:
> >* Document update
> >* Use ovs_strerror for reporting errors in lib-numa.c
> >
> >v2->v3:
> >* Move set_priority() function to lib/ovs-numa.c
> >* Apply realtime scheduling policy and priority to pmd thread only if
> >  pmd-cpu-mask is passed.
> >* Update INSTALL.DPDK-ADVANCED.
> >
> >v1->v2:
> >* Removed #ifdef and introduced dummy function "pmd_thread_setpriority"
> >  in netdev-dpdk.h
> >* Rebase
> >
> > INSTALL.DPDK-ADVANCED.md | 17 +++++++++++++----
> > lib/dpif-netdev.c        |  9 +++++++++
> > lib/ovs-numa.c           | 18 ++++++++++++++++++
> > lib/ovs-numa.h           |  1 +
> > 4 files changed, 41 insertions(+), 4 deletions(-)
> >
> >diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> >index 9ae536d..d76cb4e 100644
> >--- a/INSTALL.DPDK-ADVANCED.md
> >+++ b/INSTALL.DPDK-ADVANCED.md
> >@@ -205,8 +205,10 @@ needs to be affinitized accordingly.
> >   pmd thread is CPU bound, and needs to be affinitized to isolated
> >   cores for optimum performance.
> >
> >-  By setting a bit in the mask, a pmd thread is created and pinned
> >-  to the corresponding CPU core. e.g. to run a pmd thread on core 2
> >+  By setting a bit in the mask, a pmd thread is created, pinned
> >+  to the corresponding CPU core and the scheduling policy SCHED_RR
> >+  along with maximum priority of the policy applied to the pmd thread.
> >+  e.g. to pin a pmd thread on core 2
> >
> >   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
> >
> >@@ -234,8 +236,10 @@ needs to be affinitized accordingly.
> >   responsible for different ports/rxq's. Assignment of ports/rxq's to
> >   pmd threads is done automatically.
> >
> >-  A set bit in the mask means a pmd thread is created and pinned
> >-  to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
> >+  A set bit in the mask means a pmd thread is created, pinned to the
> >+  corresponding CPU core and the scheduling policy SCHED_RR with highest
> >+  priority of the scheduling policy applied to pmd thread.
> >+  e.g. to run pmd threads on core 1 and 2
> >
> >   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
> >
> >@@ -246,6 +250,11 @@ needs to be affinitized accordingly.
> >
> >   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
> >
> >+  Note: It is recommended that the OVS control thread and pmd thread
> >+  shouldn't be pinned to same core i.e 'dpdk-lcore-mask' and 'pmd-cpu-mask'
> >+  cpu mask settings should be non-overlapping. Also other processes with
> >+  same affinity as PMD thread will be unresponsive.
> >+
> > ### 4.3 DPDK physical port Rx Queues
> >
> > `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`
> >diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >index f05ca4e..b85600b 100644
> >--- a/lib/dpif-netdev.c
> >+++ b/lib/dpif-netdev.c
> >@@ -2841,6 +2841,15 @@ pmd_thread_main(void *f_)
> >     ovs_numa_thread_setaffinity_core(pmd->core_id);
> >     dpdk_set_lcore_id(pmd->core_id);
> >     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> >+
> >+    /* When cpu affinity mask explicitly set using pmd-cpu-mask, pmd thread's
> >+     * scheduling policy is set to SCHED_RR and the priority to highest priority
> >+     * of SCHED_RR policy. In the absence of pmd-cpu-mask, default scheduling
> >+     * policy and priority shall apply to pmd thread.
> >+     */
> >+    if (pmd->dp->pmd_cmask) {
> >+        ovs_numa_thread_setpriority(SCHED_RR);
> >+    }
> > reload:
> >     emc_cache_init(&pmd->flow_cache);
> >
> >diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
> >index c8173e0..428f274 100644
> >--- a/lib/ovs-numa.c
> >+++ b/lib/ovs-numa.c
> >@@ -613,3 +613,21 @@ int ovs_numa_thread_setaffinity_core(unsigned core_id OVS_UNUSED)
> >     return EOPNOTSUPP;
> > #endif /* __linux__ */
> > }
> >+
> >+void
> >+ovs_numa_thread_setpriority(int policy)
> >+{
> >+    if (dummy_numa) {
> >+        return;
> >+    }
> >+
> >+    struct sched_param threadparam;
> >+    int err;
> >+
> >+    memset(&threadparam, 0, sizeof(threadparam));
> >+    threadparam.sched_priority = sched_get_priority_max(policy);
> >+    err = pthread_setschedparam(pthread_self(), policy, &threadparam);
> >+    if (err) {
> >+        VLOG_ERR("Thread priority error %s",ovs_strerror(err));
> >+    }
> >+}
> >diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
> >index be836b2..94f0884 100644
> >--- a/lib/ovs-numa.h
> >+++ b/lib/ovs-numa.h
> >@@ -56,6 +56,7 @@ void ovs_numa_unpin_core(unsigned core_id);
> > struct ovs_numa_dump *ovs_numa_dump_cores_on_numa(int numa_id);
> > void ovs_numa_dump_destroy(struct ovs_numa_dump *);
> > int ovs_numa_thread_setaffinity_core(unsigned core_id);
> >+void ovs_numa_thread_setpriority(int policy);
> >
> > #define FOR_EACH_CORE_ON_NUMA(ITER, DUMP) \
> >     LIST_FOR_EACH((ITER), list_node, &(DUMP)->dump)
> >--
> >2.4.11
>
>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev