Hi,
After checked gcc9.0.2, pragma is needed for notifying compiler to unroll the 
loop.
Code will look like below after added compile-time macro for pragma.
If this format is fine, I will send out patch set with the update later.

#ifdef SUPPORT_GCC_UNROLL_PRAGMA
#define UNROLL_PRAGMA _Pragma("GCC unroll 4")
#endif

#ifdef SUPPORT_CLANG_UNROLL_PRAGMA
#define UNROLL_PRAGMA _Pragma("unroll 4")
#endif

#ifdef SUPPORT_ICC_UNROLL_PRAGMA
#define UNROLL_PRAGMA _Pragma("unroll (4)")
#endif

#ifndef UNROLL_PRAGMA
#define UNROLL_PRAGMA _Pragma()
#endif

UNROLL_PRAGMA
for (i = 0; i < PACKED_DESCS_BURST; i++) {
        if (unlikely(pkts[i]->next != NULL))
                return -1;
}

Also checked compiler clang6.0.0, performance of small loop with pragma will be 
same as repeating code.

Regards,
Marvin

> -----Original Message-----
> From: Liu, Yong
> Sent: Friday, September 06, 2019 9:43 AM
> To: Ilya Maximets <i.maxim...@samsung.com>; Bie, Tiwei
> <tiwei....@intel.com>; maxime.coque...@redhat.com
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue function
> for packed ring
> 
> 
> 
> > -----Original Message-----
> > From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> > Sent: Thursday, September 05, 2019 6:31 PM
> > To: Liu, Yong <yong....@intel.com>; Bie, Tiwei <tiwei....@intel.com>;
> > maxime.coque...@redhat.com; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue
> function
> > for packed ring
> >
> > On 05.09.2019 19:14, Marvin Liu wrote:
> > > Burst enqueue function will first check whether descriptors are cache
> > > aligned. It will also check prerequisites in the beginning. Burst
> > > enqueue function not support chained mbufs, single packet enqueue
> > > function will handle it.
> > >
> > > Signed-off-by: Marvin Liu <yong....@intel.com>
> >
> > Hi.
> >
> > Can we rely on loop unrolling by compiler instead of repeating each
> > command 4 times?
> >
> > For example:
> >
> >     uint64_t len[PACKED_DESCS_BURST];
> >
> >     for (i = 0; i < PACKED_DESCS_BURST; i++)
> >         len[i] = descs[avail_idx + i].len;
> >
> >
> > For 'if's:
> >
> >     res = false;
> >     for (i = 0; i < PACKED_DESCS_BURST; i++)
> >         res |= pkts[i]->next != NULL;
> >     if (unlikely(res))
> >         return -1;
> >
> > or just
> >
> >     for (i = 0; i < PACKED_DESCS_BURST; i++)
> >         if (unlikely(pkts[i]->next != NULL))
> >             return -1;
> >
> > Since PACKED_DESCS_BURST is a fairly small constant, loops should be
> > unrolled by compiler producing almost same code.
> >
> > This will significantly reduce code size and will also allow to
> > play with PACKED_DESCS_BURST value without massive code changes.
> >
> > Same is applicable to other patches in the series.
> >
> > What do you think?
> >
> 
> Hi Ilya,
> I did some test with the unroll availability of various compilers before.
> All listed compilers will cause loopback performance drop compared to
> repeating code version, especially GCC7.4 and ICC.
> Newer compilers will have much less impact (around 3%) on the throughput.
> If we can accept that, repeating code can be replaced with small loop
> function.
> 
> |----------------|---------------|-------------|------|
> | Compiler       | Auto unrolled | Fixed batch | Gap  |
> |----------------|---------------|-------------|------|
> | Clang6.0.0     | 13.1M         | 13.5M       | 0.4M |
> |----------------|---------------|-------------|------|
> | GCC 8.3.0      | 13.9M         | 14.4M       | 0.5M |
> |----------------|---------------|-------------|------|
> | GCC 7.4.0      | 12.6M         | 13.5M       | 0.9M |
> |----------------|---------------|-------------|------|
> | ICC 19.0.4.243 | 11.0M         | 12.3M       | 1.3M |
> |----------------|---------------|-------------|------|
> 
> Thanks,
> Marvin
> 
> > Best regards, Ilya Maximets.
> >
> > >

Reply via email to