> -----Original Message-----
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Thursday, September 05, 2019 6:31 PM
> To: Liu, Yong <yong....@intel.com>; Bie, Tiwei <tiwei....@intel.com>;
> maxime.coque...@redhat.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue function
> for packed ring
> 
> On 05.09.2019 19:14, Marvin Liu wrote:
> > Burst enqueue function will first check whether descriptors are cache
> > aligned. It will also check prerequisites in the beginning. Burst
> > enqueue function not support chained mbufs, single packet enqueue
> > function will handle it.
> >
> > Signed-off-by: Marvin Liu <yong....@intel.com>
> 
> Hi.
> 
> Can we rely on loop unrolling by compiler instead of repeating each
> command 4 times?      
> 
> For example:
> 
>     uint64_t len[PACKED_DESCS_BURST];
> 
>     for (i = 0; i < PACKED_DESCS_BURST; i++)
>         len[i] = descs[avail_idx + i].len;
> 
> 
> For 'if's:
> 
>     res = false;
>     for (i = 0; i < PACKED_DESCS_BURST; i++)
>         res |= pkts[i]->next != NULL;
>     if (unlikely(res))
>         return -1;
> 
> or just
> 
>     for (i = 0; i < PACKED_DESCS_BURST; i++)
>         if (unlikely(pkts[i]->next != NULL))
>             return -1;
> 
> Since PACKED_DESCS_BURST is a fairly small constant, loops should be
> unrolled by compiler producing almost same code.
> 
> This will significantly reduce code size and will also allow to
> play with PACKED_DESCS_BURST value without massive code changes.
> 
> Same is applicable to other patches in the series.
> 
> What do you think?
> 
Hi Ilya,
I did some test with the unroll availability of various compilers before.
All listed compilers will cause loopback performance drop compared to repeating 
code version, especially GCC7.4 and ICC. 
Newer compilers will have much less impact (around 3%) on the throughput.
If we can accept that, repeating code can be replaced with small loop function.

|----------------|---------------|-------------|------|
| Compiler       | Auto unrolled | Fixed batch | Gap  |
|----------------|---------------|-------------|------|
| Clang6.0.0     | 13.1M         | 13.5M       | 0.4M |
|----------------|---------------|-------------|------|
| GCC 8.3.0      | 13.9M         | 14.4M       | 0.5M |
|----------------|---------------|-------------|------|
| GCC 7.4.0      | 12.6M         | 13.5M       | 0.9M |
|----------------|---------------|-------------|------|
| ICC 19.0.4.243 | 11.0M         | 12.3M       | 1.3M |
|----------------|---------------|-------------|------|

Thanks,
Marvin

> Best regards, Ilya Maximets.
> 
> >

Reply via email to