I get some checkpatch warnings, and build fails with clang. Could you please fix these issues and send v9?
Thanks, Maxime ### [PATCH] vhost: try to unroll for each loop WARNING:CAMELCASE: Avoid CamelCase: <_Pragma> #78: FILE: lib/librte_vhost/vhost.h:47: +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("GCC unroll 4") \ ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parenthesis #78: FILE: lib/librte_vhost/vhost.h:47: +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("GCC unroll 4") \ + for (iter = val; iter < size; iter++) ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parenthesis #83: FILE: lib/librte_vhost/vhost.h:52: +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("unroll 4") \ + for (iter = val; iter < size; iter++) ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parenthesis #88: FILE: lib/librte_vhost/vhost.h:57: +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("unroll (4)") \ + for (iter = val; iter < size; iter++) total: 3 errors, 1 warnings, 67 lines checked 0/1 valid patch/tmp/dpdk_build/lib/librte_vhost/virtio_net.c:2065:1: error: unused function 'free_zmbuf' [-Werror,-Wunused-function] free_zmbuf(struct vhost_virtqueue *vq) ^ 1 error generated. make[5]: *** [virtio_net.o] Error 1 make[4]: *** [librte_vhost] Error 2 make[4]: *** Waiting for unfinished jobs.... make[3]: *** [lib] Error 2 make[2]: *** [all] Error 2 make[1]: *** [pre_install] Error 2 make: *** [install] Error 2 On 10/22/19 12:08 AM, Marvin Liu wrote: > Packed ring has more compact ring format and thus can significantly > reduce the number of cache miss. It can lead to better performance. > This has been approved in virtio user driver, on normal E5 Xeon cpu > single core performance can raise 12%. > > http://mails.dpdk.org/archives/dev/2018-April/095470.html > > However vhost performance with packed ring performance was decreased. > Through analysis, mostly extra cost was from the calculating of each > descriptor flag which depended on ring wrap counter. Moreover, both > frontend and backend need to write same descriptors which will cause > cache contention. Especially when doing vhost enqueue function, virtio > refill packed ring function may write same cache line when vhost doing > enqueue function. This kind of extra cache cost will reduce the benefit > of reducing cache misses. > > For optimizing vhost packed ring performance, vhost enqueue and dequeue > function will be splitted into fast and normal path. > > Several methods will be taken in fast path: > Handle descriptors in one cache line by batch. > Split loop function into more pieces and unroll them. > Prerequisite check that whether I/O space can copy directly into mbuf > space and vice versa. > Prerequisite check that whether descriptor mapping is successful. > Distinguish vhost used ring update function by enqueue and dequeue > function. > Buffer dequeue used descriptors as many as possible. > Update enqueue used descriptors by cache line. > > After all these methods done, single core vhost PvP performance with 64B > packet on Xeon 8180 can boost 35%. > > v8: > - Allocate mbuf by virtio_dev_pktmbuf_alloc > > v7: > - Rebase code > - Rename unroll macro and definitions > - Calculate flags when doing single dequeue > > v6: > - Fix dequeue zcopy result check > > v5: > - Remove disable sw prefetch as performance impact is small > - Change unroll pragma macro format > - Rename shadow counter elements names > - Clean dequeue update check condition > - Add inline functions replace of duplicated code > - Unify code style > > v4: > - Support meson build > - Remove memory region cache for no clear performance gain and ABI break > - Not assume ring size is power of two > > v3: > - Check available index overflow > - Remove dequeue remained descs number check > - Remove changes in split ring datapath > - Call memory write barriers once when updating used flags > - Rename some functions and macros > - Code style optimization > > v2: > - Utilize compiler's pragma to unroll loop, distinguish clang/icc/gcc > - Buffered dequeue used desc number changed to (RING_SZ - PKT_BURST) > - Optimize dequeue used ring update when in_order negotiated > > > Marvin Liu (13): > vhost: add packed ring indexes increasing function > vhost: add packed ring single enqueue > vhost: try to unroll for each loop > vhost: add packed ring batch enqueue > vhost: add packed ring single dequeue > vhost: add packed ring batch dequeue > vhost: flush enqueue updates by cacheline > vhost: flush batched enqueue descs directly > vhost: buffer packed ring dequeue updates > vhost: optimize packed ring enqueue > vhost: add packed ring zcopy batch and single dequeue > vhost: optimize packed ring dequeue > vhost: optimize packed ring dequeue when in-order > > lib/librte_vhost/Makefile | 18 + > lib/librte_vhost/meson.build | 7 + > lib/librte_vhost/vhost.h | 57 ++ > lib/librte_vhost/virtio_net.c | 948 +++++++++++++++++++++++++++------- > 4 files changed, 837 insertions(+), 193 deletions(-) >