On Wed, Jun 01, 2016 at 06:40:41AM +0000, Xie, Huawei wrote:
> >     /* Retrieve all of the head indexes first to avoid caching issues. */
> >     for (i = 0; i < count; i++) {
> > -           desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
> > -                                   (vq->size - 1)];
> > +           used_idx = (vq->last_used_idx + i) & (vq->size - 1);
> > +           desc_indexes[i] = vq->avail->ring[used_idx];
> > +
> > +           vq->used->ring[used_idx].id  = desc_indexes[i];
> > +           vq->used->ring[used_idx].len = 0;
> > +           vhost_log_used_vring(dev, vq,
> > +                           offsetof(struct vring_used, ring[used_idx]),
> > +                           sizeof(vq->used->ring[used_idx]));
> >     }
> >  
> >     /* Prefetch descriptor index. */
> >     rte_prefetch0(&vq->desc[desc_indexes[0]]);
> > -   rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
> > -
> >     for (i = 0; i < count; i++) {
> >             int err;
> >  
> > -           if (likely(i + 1 < count)) {
> > +           if (likely(i + 1 < count))
> >                     rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
> > -                   rte_prefetch0(&vq->used->ring[(used_idx + 1) &
> > -                                                 (vq->size - 1)]);
> > -           }
> >  
> >             pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> >             if (unlikely(pkts[i] == NULL)) {
> > @@ -916,18 +920,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> >                     rte_pktmbuf_free(pkts[i]);
> >                     break;
> >             }
> > -
> > -           used_idx = vq->last_used_idx++ & (vq->size - 1);
> > -           vq->used->ring[used_idx].id  = desc_indexes[i];
> > -           vq->used->ring[used_idx].len = 0;
> > -           vhost_log_used_vring(dev, vq,
> > -                           offsetof(struct vring_used, ring[used_idx]),
> > -                           sizeof(vq->used->ring[used_idx]));
> >     }
> 
> I had tried post-updating the used ring in a batch, but forgot what the perf change was.

I would assume pre-updating gives a better performance gain, as we are
touching the avail and used rings together in the same loop, which should
be more cache friendly.
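To make the cache argument a bit more concrete, here is a minimal sketch of the pre-update pattern from the patch. It uses hypothetical, simplified structures (toy_vq, toy_fill_used) rather than the real struct vhost_virtqueue, assumes a power-of-two ring size, and assumes last_used_idx is advanced once for the whole batch after the loop; the logging call is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the vring structures. */
#define TOY_RING_SIZE 8 /* must be a power of two */

struct toy_used_elem {
    uint32_t id;
    uint32_t len;
};

struct toy_vq {
    uint16_t size;
    uint16_t last_used_idx;
    uint16_t avail_ring[TOY_RING_SIZE];
    struct toy_used_elem used_ring[TOY_RING_SIZE];
};

/*
 * Pre-update pattern: read each avail entry and write the matching used
 * entry in the same pass, so both rings are touched while hot in cache.
 */
static void toy_fill_used(struct toy_vq *vq, uint16_t *desc_indexes,
                          uint16_t count)
{
    uint16_t i, used_idx;

    assert((vq->size & (vq->size - 1)) == 0);
    for (i = 0; i < count; i++) {
        used_idx = (uint16_t)((vq->last_used_idx + i) & (vq->size - 1));
        desc_indexes[i] = vq->avail_ring[used_idx];
        vq->used_ring[used_idx].id = desc_indexes[i];
        vq->used_ring[used_idx].len = 0;
    }
    /* Assumption: the index is advanced once for the whole batch. */
    vq->last_used_idx = (uint16_t)(vq->last_used_idx + count);
}
```

With last_used_idx at 6 and a batch of 4, the loop wraps and fills used slots 6, 7, 0 and 1 from the same avail slots.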

> One optimization would be in vhost_log_used_vring.
> I have two ideas:
> a) On the QEMU side, always assume the used ring has changed, so that we
> don't need to log the used ring in vhost at all.
> 
> Michael: is this feasible in QEMU? Any comments on this?
> 
> b) We could always mark the whole used ring as modified, rather than
> logging it entry by entry.
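For reference, idea b) boils down to the difference between the two logging helpers below. This is a sketch against a hypothetical dirty-page bitmap (one bit per 4 KiB page, an assumption), not the real vhost log API; toy_log_write, toy_log_entries and toy_log_whole_ring are made-up names. The offsets follow the virtio vring_used layout: a 16-bit flags field and a 16-bit idx, followed by 8-byte {id, len} elements.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LOG_PAGE      4096u /* assumption: 4 KiB logging granularity */
#define TOY_USED_RING_OFF 4u    /* flags + idx, two uint16_t */
#define TOY_USED_ELEM_SZ  8u    /* id + len, two uint32_t */

/* Hypothetical dirty log: set one bit per page touched by [addr, addr+len). */
static void toy_log_write(uint8_t *bitmap, uint64_t addr, uint64_t len)
{
    uint64_t page;

    if (len == 0)
        return;
    for (page = addr / TOY_LOG_PAGE;
         page <= (addr + len - 1) / TOY_LOG_PAGE; page++)
        bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Per-entry logging, as in the patch: one log call per used entry written. */
static void toy_log_entries(uint8_t *bitmap, uint64_t used_base,
                            uint16_t first, uint16_t count, uint16_t size)
{
    uint16_t i, idx;

    assert((size & (size - 1)) == 0);
    for (i = 0; i < count; i++) {
        idx = (uint16_t)((first + i) & (size - 1));
        toy_log_write(bitmap,
                      used_base + TOY_USED_RING_OFF +
                          (uint64_t)idx * TOY_USED_ELEM_SZ,
                      TOY_USED_ELEM_SZ);
    }
}

/* Idea b): one call that marks every used entry's page as dirty. */
static void toy_log_whole_ring(uint8_t *bitmap, uint64_t used_base,
                               uint16_t size)
{
    toy_log_write(bitmap, used_base + TOY_USED_RING_OFF,
                  (uint64_t)size * TOY_USED_ELEM_SZ);
}
```

The whole-ring variant trades precision for fewer calls: with a 1024-entry ring it always marks two pages dirty, even when only one entry was written, so a little more would be resent during migration.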

I doubt it's worthwhile. vhost_log_used_vring is a no-op most of the
time: it only takes action during the short window of a live migration.
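The reason it is cheap outside migration is that the log helper bails out on a single check before touching any bitmap. A rough sketch of that shape, with a hypothetical struct toy_dev; TOY_F_LOG_ALL stands in for the VHOST_F_LOG_ALL feature bit, and the bit number 26 is assumed here to match linux/vhost.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_LOG_ALL 26 /* assumption: mirrors VHOST_F_LOG_ALL */

struct toy_dev {
    uint64_t features;       /* negotiated feature bits */
    uint8_t *log_base;       /* dirty bitmap, NULL when not migrating */
    unsigned long log_calls; /* counts slow-path entries, for illustration */
};

static void toy_log_used(struct toy_dev *dev, uint64_t offset, uint64_t len)
{
    assert(dev != NULL);
    /* Fast path: outside live migration this is one predictable branch. */
    if ((dev->features & (1ULL << TOY_F_LOG_ALL)) == 0 ||
        dev->log_base == NULL)
        return;
    dev->log_calls++;
    /* Slow path (omitted): mark pages covering [offset, offset+len) dirty. */
    (void)offset;
    (void)len;
}
```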

And FYI, I even tried removing all the vhost_log_xxx calls, and it showed
no performance boost at all. So logging is not a factor that impacts
performance here.

        --yliu
