On 2019/7/11 5:49 PM, Liu, Yong wrote:
-----Original Message-----
From: Jason Wang [mailto:jasow...@redhat.com]
Sent: Thursday, July 11, 2019 12:11 PM
To: Liu, Yong <yong....@intel.com>; Bie, Tiwei <tiwei....@intel.com>;
maxime.coque...@redhat.com; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast enqueue function
On 2019/7/10 3:30 PM, Liu, Yong wrote:
-----Original Message-----
From: Jason Wang [mailto:jasow...@redhat.com]
Sent: Wednesday, July 10, 2019 12:28 PM
To: Liu, Yong <yong....@intel.com>; Bie, Tiwei <tiwei....@intel.com>;
maxime.coque...@redhat.com; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast enqueue function
On 2019/7/9 1:13 AM, Marvin Liu wrote:
The fast enqueue function first checks whether the descriptors are
cache aligned, along with its other prerequisites, at the beginning.
It does not support chained mbufs; the normal function handles
those.
Signed-off-by: Marvin Liu <yong....@intel.com>
Any reason for not letting the compiler unroll the loops?
Hi Jason,
I'm not sure how much the compiler can help with unrolling the loops, since
it can't know how many loop iterations there will be in one call.
After disabling the unroll-loop optimization with "-fno-unroll-loops", the
size of the virtio_dev_rx_packed function remained the same.
So it looks like the gcc unroll-loop optimization does not help here.
I meant something like "pragma GCC unroll N" just before the loop you
want unrolled.
Thanks
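For reference, a minimal sketch of the pragma suggested above. The function
name, the batch size, and the array of lengths are hypothetical and not taken
from the patch. "#pragma GCC unroll" needs GCC 8 or newer; it is only a hint,
and other compilers may ignore it or warn about it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_SIZE 4 /* hypothetical batch size, not from the patch */

/*
 * Sum the lengths of one batch of buffers.
 * The pragma asks GCC to unroll the loop that immediately follows it.
 */
static inline uint32_t
batch_total_len(const uint32_t lens[BATCH_SIZE])
{
	uint32_t total = 0;
	int i;

	assert(lens != NULL);

#pragma GCC unroll 4
	for (i = 0; i < BATCH_SIZE; i++)
		total += lens[i];

	return total;
}
```

Calling batch_total_len() on an array of four lengths returns their sum;
whether the loop is really unrolled can be checked in the generated assembly.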
Hi Jason,
Just tried with gcc 8.3.0 and the master code; there is only a 0.1 Mpps
performance gain with "#pragma GCC unroll".
I think this compiler pragma is not helpful in a big loop that contains so
many function calls.
Thanks,
Marvin
Yes, it probably needs some tricks, e.g. breaking the big loop into small ones.
What I want to do here is unroll the loop based on
PACKED_DESC_PER_CACHELINE instead of a hard-coded 4.
Thanks
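One possible way to tie the unroll factor to PACKED_DESC_PER_CACHELINE is
sketched below. Everything except the PACKED_DESC_PER_CACHELINE name is an
assumption made for illustration: the 64-byte cache line, the struct name, the
UNROLL_LOOP helper macro, and the function itself. The descriptor layout
follows the 16-byte packed descriptor of the virtio 1.1 specification. The
two-level macro expands its argument before it is turned into a string for
_Pragma, so the pragma sees a constant expression rather than a macro name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumed cache line size */
#define PACKED_DESC_SIZE 16 /* packed descriptor size in bytes */
#define PACKED_DESC_PER_CACHELINE (CACHE_LINE_SIZE / PACKED_DESC_SIZE)

/* Hypothetical stand-in for the packed-ring descriptor. */
struct packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

static_assert(sizeof(struct packed_desc) == PACKED_DESC_SIZE,
	      "descriptor size must match PACKED_DESC_SIZE");

/* Emit the unroll hint on GCC 8+ only; expand to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#define DO_PRAGMA(x) _Pragma(#x)
#define UNROLL_LOOP(n) DO_PRAGMA(GCC unroll n)
#else
#define UNROLL_LOOP(n)
#endif

/*
 * Sum descriptor lengths. The outer loop advances one cache line of
 * descriptors at a time; the small inner loop has a constant trip count
 * and carries the unroll hint. Leftover descriptors use a plain loop.
 */
static uint64_t
sum_desc_len(const struct packed_desc *descs, size_t count)
{
	uint64_t total = 0;
	size_t i = 0, j;

	for (; i + PACKED_DESC_PER_CACHELINE <= count;
	     i += PACKED_DESC_PER_CACHELINE) {
		UNROLL_LOOP(PACKED_DESC_PER_CACHELINE)
		for (j = 0; j < PACKED_DESC_PER_CACHELINE; j++)
			total += descs[i + j].len;
	}

	for (; i < count; i++)
		total += descs[i].len;

	return total;
}
```

Splitting the work this way keeps the unrolled body small, which is the
"break the big loop into small ones" idea; any gain would still have to be
measured.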
Also, the fast enqueue function does not only unroll the loop; it also checks
cache alignment, which helps performance in another way.
Regards,
Marvin
Thanks
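The cache alignment prerequisite discussed in this thread could look roughly
like the sketch below. It is not the code from the patch: the function name,
its parameters, and the 64-byte cache line are assumptions, and the check
presumes that the descriptor ring itself starts on a cache line boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumed cache line size */
#define PACKED_DESC_SIZE 16 /* packed descriptor size in bytes */
#define PACKED_DESC_PER_CACHELINE (CACHE_LINE_SIZE / PACKED_DESC_SIZE)

static_assert((PACKED_DESC_PER_CACHELINE &
	       (PACKED_DESC_PER_CACHELINE - 1)) == 0,
	      "the index mask below needs a power of two");

/*
 * Hypothetical prerequisite check for a batched fast path.
 * Returns true only when a full batch can be handled within one
 * cache line of descriptors.
 */
static inline bool
can_use_fast_path(uint16_t avail_idx, uint16_t ring_size, uint16_t pkt_count)
{
	/* The batch must start on a cache line boundary. */
	if (avail_idx & (PACKED_DESC_PER_CACHELINE - 1))
		return false;

	/* The batch must not run past the end of the ring. */
	if ((uint32_t)avail_idx + PACKED_DESC_PER_CACHELINE > ring_size)
		return false;

	/* There must be enough packets to fill the batch. */
	return pkt_count >= PACKED_DESC_PER_CACHELINE;
}
```

When the check fails, the caller would fall back to the normal enqueue
function, which matches the split described in the commit message.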