Hi, After checked gcc9.0.2, pragma is needed for notifying compiler to unroll the loop. Code will look like below after added compile-time macro for pragma. If this format is fine, I will send out patch set with the update later.
#ifdef SUPPORT_GCC_UNROLL_PRAGMA #define UNROLL_PRAGMA _Pragma("GCC unroll 4") #endif #ifdef SUPPORT_CLANG_UNROLL_PRAGMA #define UNROLL_PRAGMA _Pragma("unroll 4") #endif #ifdef SUPPORT_ICC_UNROLL_PRAGMA #define UNROLL_PRAGMA _Pragma("unroll (4)") #endif #ifndef UNROLL_PRAGMA #define UNROLL_PRAGMA _Pragma() #endif UNROLL_PRAGMA for (i = 0; i < PACKED_DESCS_BURST; i++) { if (unlikely(pkts[i]->next != NULL)) return -1; } Also checked compiler clang6.0.0, performance of small loop with pragma will be same as repeating code. Regards, Marvin > -----Original Message----- > From: Liu, Yong > Sent: Friday, September 06, 2019 9:43 AM > To: Ilya Maximets <i.maxim...@samsung.com>; Bie, Tiwei > <tiwei....@intel.com>; maxime.coque...@redhat.com > Cc: dev@dpdk.org > Subject: RE: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue function > for packed ring > > > > > -----Original Message----- > > From: Ilya Maximets [mailto:i.maxim...@samsung.com] > > Sent: Thursday, September 05, 2019 6:31 PM > > To: Liu, Yong <yong....@intel.com>; Bie, Tiwei <tiwei....@intel.com>; > > maxime.coque...@redhat.com; dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue > function > > for packed ring > > > > On 05.09.2019 19:14, Marvin Liu wrote: > > > Burst enqueue function will first check whether descriptors are cache > > > aligned. It will also check prerequisites in the beginning. Burst > > > enqueue function not support chained mbufs, single packet enqueue > > > function will handle it. > > > > > > Signed-off-by: Marvin Liu <yong....@intel.com> > > > > Hi. > > > > Can we rely on loop unrolling by compiler instead of repeating each > > command 4 times? > > > > For example: > > > > uint64_t len[PACKED_DESCS_BURST]; > > > > for (i = 0; i < PACKED_DESCS_BURST; i++) > > len[i] = descs[avail_idx + i].len; > > > > > > For 'if's: > > > > res = false; > > for (i = 0; i < PACKED_DESCS_BURST; i++) > > res |= pkts[i]->next != NULL; > > if (unlikely(res)) > > return -1; > > > > or just > > > > for (i = 0; i < PACKED_DESCS_BURST; i++) > > if (unlikely(pkts[i]->next != NULL)) > > return -1; > > > > Since PACKED_DESCS_BURST is a fairly small constant, loops should be > > unrolled by compiler producing almost same code. > > > > This will significantly reduce code size and will also allow to > > play with PACKED_DESCS_BURST value without massive code changes. > > > > Same is applicable to other patches in the series. > > > > What do you think? > > > > Hi Ilya, > I did some test with the unroll availability of various compilers before. > All listed compilers will cause loopback performance drop compared to > repeating code version, especially GCC7.4 and ICC. > Newer compilers will have much less impact (around 3%) on the throughput. > If we can accept that, repeating code can be replaced with small loop > function. > > |----------------|---------------|-------------|------| > | Compiler | Auto unrolled | Fixed batch | Gap | > |----------------|---------------|-------------|------| > | Clang6.0.0 | 13.1M | 13.5M | 0.4M | > |----------------|---------------|-------------|------| > | GCC 8.3.0 | 13.9M | 14.4M | 0.5M | > |----------------|---------------|-------------|------| > | GCC 7.4.0 | 12.6M | 13.5M | 0.9M | > |----------------|---------------|-------------|------| > | ICC 19.0.4.243 | 11.0M | 12.3M | 1.3M | > |----------------|---------------|-------------|------| > > Thanks, > Marvin > > > Best regards, Ilya Maximets. > > > > >