On 1/20/2016 2:33 AM, Polehn, Mike A wrote:
> SMP operations can be very expensive, sometimes costing hundreds to
> thousands of clock cycles depending on the circumstances of the
> synchronization. How you arrange the SMP operations within the tasks at
> hand, across the SMP cores, is what gives you the methods for top
> performance. Using traditional general-purpose SMP methods will result in
> traditional general-purpose performance. Migrating from expert techniques
> (understood by a much smaller group of expert programmers focused on
> performance) to general libraries (understood by most general-purpose
> programmers) will greatly reduce the value of DPDK, since the end result
> will be lower performance and/or less predictable operation, where rate
> performance, predictability, and low latency are the primary goals.
>
> The best method to date for feeding a single port from multiple outputs
> is to use a DPDK queue with multiple producers and a single consumer:
> one SMP operation lets multiple sources feed a single non-SMP task that
> outputs to the port (that is why the ports are not SMP protected). Also,
> when considerable contention from multiple sources occurs often (data
> feeding at the same time), keeping the queue's input and output
> variables in separate cache lines can give a notable throughput
> improvement.
>
> Mike
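As a minimal sketch of the multi-producer/single-consumer pattern Mike
describes, using the stock rte_ring API: the ring size, port and queue IDs,
burst size, and function names below are illustrative assumptions, not code
from this thread. Note also that rte_ring already keeps its producer and
consumer bookkeeping on separate cache lines, which is the cache-line
separation mentioned above.

/*
 * Sketch only: several worker cores produce into one multi-producer
 * ring; a single core drains it and is the only caller of the
 * unprotected port TX function. PORT_ID, TX_QUEUE, BURST_SZ and the
 * function names are arbitrary.
 */
#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>

#define PORT_ID  0
#define TX_QUEUE 0
#define BURST_SZ 32

static struct rte_ring *tx_ring;

static void
setup_tx_ring(void)
{
	/* Multi-producer enqueue (the default), single-consumer dequeue:
	 * many feeder cores, exactly one drainer core. */
	tx_ring = rte_ring_create("tx_ring", 1024, rte_socket_id(),
				  RING_F_SC_DEQ);
}

/* Called from any worker core: the MP enqueue is the only SMP
 * operation on this path. */
static void
worker_send(struct rte_mbuf *m)
{
	if (rte_ring_mp_enqueue(tx_ring, m) < 0)
		rte_pktmbuf_free(m);	/* ring full: drop */
}

/* Run on exactly one core, so the port's TX queue itself needs no
 * SMP protection. */
static void
tx_drain_loop(void)
{
	struct rte_mbuf *burst[BURST_SZ];
	unsigned int n, sent;

	for (;;) {
		for (n = 0; n < BURST_SZ; n++)
			if (rte_ring_sc_dequeue(tx_ring,
						(void **)&burst[n]) < 0)
				break;
		if (n == 0)
			continue;
		sent = rte_eth_tx_burst(PORT_ID, TX_QUEUE, burst, n);
		while (sent < n)	/* free what the NIC didn't take */
			rte_pktmbuf_free(burst[sent++]);
	}
}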
Mike: Thanks for the detailed explanation. Do you have any comments on this
patch?

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> Sent: Tuesday, January 19, 2016 8:44 AM
> To: Tan, Jianfeng; dev at dpdk.org
> Cc: ann.zhuangyanying at huawei.com
> Subject: Re: [dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio ring
>
> On 1/20/2016 12:25 AM, Tan, Jianfeng wrote:
>> Hi Huawei,
>>
>> On 1/4/2016 10:46 PM, Huawei Xie wrote:
>>> This patch removes the internal lockless enqueue implementation.
>>> DPDK doesn't support receiving/transmitting packets from/to the same
>>> queue concurrently. The vhost PMD wraps a vhost device as a normal DPDK
>>> port, and DPDK applications normally have their own lock implementation
>>> when they enqueue packets to the same queue of a port.
>>>
>>> The atomic cmpset is a costly operation. This patch should help
>>> performance a bit.
>>>
>>> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 86 +++++++++++++------------------------------
>>>  1 file changed, 25 insertions(+), 61 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>> index bbf3fac..26a1b9c 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>> I think the vhost example will not work well with this patch when
>> vm2vm=software.
>>
>> Test case:
>> Two virtio ports handled by two PMD threads. Thread 0 polls pkts from the
>> physical NIC and sends them to virtio0, while thread 1 receives pkts from
>> virtio1 and routes them to virtio0.
> A vhost port is wrapped as an ordinary port by the vhost PMD, and a DPDK
> app treats all physical and virtual ports equally. When two DPDK threads
> try to enqueue to the same port, the app needs to handle the contention
> itself. None of the physical PMDs support concurrent enqueuing/dequeuing
> on the same queue, so the vhost PMD should expose the same behavior,
> unless it is absolutely necessary to expose a difference between PMDs.
>
>>> -
>>> 	*(volatile uint16_t *)&vq->used->idx += entry_success;
>> Another unrelated question: should we try to move this assignment out of
>> the loop to save cost, since it is a point of data contention?
> This operation itself is not that costly, but it has a side effect on the
> cache transfer.
> It is outside of the loop for the non-mergeable case; for the mergeable
> case, it is inside the loop.
> There are actually pros and cons to doing this once per burst versus in
> smaller steps. I prefer to move it outside of the loop. Let us address
> this later.
>
>> Thanks,
>> Jianfeng
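As a minimal sketch of the app-level serialization the patch assumes:
if two lcores may enqueue to the same queue of a port (physical or vhost),
the application guards the burst with its own lock, since no PMD makes a
queue safe against concurrent callers. The lock and wrapper names here are
invented for illustration, not anything mandated by the patch.

#include <rte_spinlock.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>

/* Sketch only: a real app would keep one lock per (port, queue). */
static rte_spinlock_t port_tx_lock = RTE_SPINLOCK_INITIALIZER;

/* Safe to call from any lcore: the application's lock serializes the
 * enqueue onto the single, unprotected TX queue. */
static uint16_t
locked_tx_burst(uint8_t port_id, uint16_t queue_id,
		struct rte_mbuf **pkts, uint16_t n)
{
	uint16_t sent;

	rte_spinlock_lock(&port_tx_lock);
	sent = rte_eth_tx_burst(port_id, queue_id, pkts, n);
	rte_spinlock_unlock(&port_tx_lock);
	return sent;
}

With the queue already serialized by the application like this, the
per-packet atomic cmpset inside the vhost enqueue path duplicated that
protection, which is the thread's argument for why removing it saves
cycles without changing the supported usage.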