On Wed, Jun 01, 2016 at 06:40:41AM +0000, Xie, Huawei wrote:
> >  	/* Retrieve all of the head indexes first to avoid caching issues. */
> >  	for (i = 0; i < count; i++) {
> > -		desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
> > -					(vq->size - 1)];
> > +		used_idx = (vq->last_used_idx + i) & (vq->size - 1);
> > +		desc_indexes[i] = vq->avail->ring[used_idx];
> > +
> > +		vq->used->ring[used_idx].id = desc_indexes[i];
> > +		vq->used->ring[used_idx].len = 0;
> > +		vhost_log_used_vring(dev, vq,
> > +				offsetof(struct vring_used, ring[used_idx]),
> > +				sizeof(vq->used->ring[used_idx]));
> >  	}
> >
> >  	/* Prefetch descriptor index. */
> >  	rte_prefetch0(&vq->desc[desc_indexes[0]]);
> > -	rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
> > -
> >  	for (i = 0; i < count; i++) {
> >  		int err;
> >
> > -		if (likely(i + 1 < count)) {
> > +		if (likely(i + 1 < count))
> >  			rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
> > -			rte_prefetch0(&vq->used->ring[(used_idx + 1) &
> > -					(vq->size - 1)]);
> > -		}
> >
> >  		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> >  		if (unlikely(pkts[i] == NULL)) {
> > @@ -916,18 +920,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> >  			rte_pktmbuf_free(pkts[i]);
> >  			break;
> >  		}
> > -
> > -		used_idx = vq->last_used_idx++ & (vq->size - 1);
> > -		vq->used->ring[used_idx].id = desc_indexes[i];
> > -		vq->used->ring[used_idx].len = 0;
> > -		vhost_log_used_vring(dev, vq,
> > -				offsetof(struct vring_used, ring[used_idx]),
> > -				sizeof(vq->used->ring[used_idx]));
> >  	}
>
> Had tried post-updating used ring in batch, but forget the perf change.

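A minimal sketch of the "pre-update" pattern the hunk above introduces may help: each avail-ring slot is read, and the used-ring slot at the same masked index is filled in the same pass. Everything here (`struct toy_vq`, `toy_dequeue_prepare`, `TOY_RING_MAX`) is a hypothetical toy model, not DPDK's virtqueue structure. It leaves out `vhost_log_used_vring`, prefetching, and mbuf handling, and it assumes `last_used_idx` is advanced once per burst, since that part of the patch is not quoted here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_MAX 256 /* hypothetical capacity of this toy model */

/* Simplified stand-ins for the virtio used-ring element and the queue. */
struct toy_used_elem {
	uint32_t id;  /* head descriptor index being returned to the guest */
	uint32_t len; /* bytes written; 0 on the dequeue (guest TX) path */
};

struct toy_vq {
	uint16_t size;          /* ring size, must be a power of two */
	uint16_t last_used_idx; /* free-running index, wraps at 2^16 */
	uint16_t avail_ring[TOY_RING_MAX];
	struct toy_used_elem used_ring[TOY_RING_MAX];
};

/*
 * Pre-update pattern: read each avail entry and fill the matching used
 * entry in the same loop, so both rings are touched while hot in cache.
 */
void
toy_dequeue_prepare(struct toy_vq *vq, uint16_t *desc_indexes, uint16_t count)
{
	uint16_t i, used_idx;

	assert(vq->size <= TOY_RING_MAX);
	/* The mask below is only a valid modulo for power-of-two sizes. */
	assert(vq->size != 0 && (vq->size & (vq->size - 1)) == 0);

	for (i = 0; i < count; i++) {
		used_idx = (vq->last_used_idx + i) & (vq->size - 1);
		desc_indexes[i] = vq->avail_ring[used_idx];

		vq->used_ring[used_idx].id = desc_indexes[i];
		vq->used_ring[used_idx].len = 0;
	}

	/* Assumption: advance once per burst (not shown in the quoted hunk). */
	vq->last_used_idx += count;
}
```

Masking with `size - 1` works only because split virtqueue sizes are powers of two, which is why the sketch asserts it before indexing.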
I would assume pre-updating gives a better performance gain, as we are
touching the avail and used rings together, which should be more cache
friendly.

> One optimization would be on vhost_log_used_ring.
> I have two ideas,
> a) In QEMU side, we always assume use ring will be changed. so that we
> don't need to log used ring in VHOST.
>
> Michael: feasible in QEMU? comments on this?
>
> b) We could always mark the total used ring modified rather than entry
> by entry.

I doubt it's worthwhile. vhost_log_used_vring is a no-op most of the
time: it only takes action during the short window of a live migration.
FYI, I even tried removing all of the vhost_log_xxx calls, and it showed
no performance boost at all. Therefore, logging is not a factor that
impacts performance here.

	--yliu
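The point that logging is a no-op outside migration can be illustrated with a toy dirty-page logger: the write path returns immediately unless logging is enabled, so per-entry calls cost little more than a branch in the common case. All names here (`struct toy_dev`, `toy_log_write`, `toy_log_used_vring`, the `logging` flag, the 4 KiB `TOY_LOG_PAGE` granularity) are assumptions for illustration. In vhost-user the enabling condition is tied to the `VHOST_F_LOG_ALL` feature, and the real DPDK functions differ in signature and detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LOG_PAGE 4096 /* hypothetical dirty-log granularity */

struct toy_dev {
	bool     logging;        /* hypothetical: true only during migration */
	uint8_t *log_base;       /* dirty bitmap, one bit per TOY_LOG_PAGE */
	uint64_t log_size;       /* bitmap size in bytes */
	uint64_t used_ring_addr; /* assumed guest address of the used ring */
};

/* Mark every page touched by [addr, addr + len) dirty, within the bitmap. */
void
toy_log_write(struct toy_dev *dev, uint64_t addr, uint64_t len)
{
	uint64_t page;

	/* Common case: no migration in progress, so return right away. */
	if (!dev->logging || dev->log_base == NULL || len == 0)
		return;

	for (page = addr / TOY_LOG_PAGE; page * TOY_LOG_PAGE < addr + len; page++) {
		if (page / 8 >= dev->log_size)
			break; /* outside the bitmap; real code would validate */
		dev->log_base[page / 8] |= (uint8_t)(1u << (page % 8));
	}
}

/* Log a write at byte offset 'offset' within the used ring. */
void
toy_log_used_vring(struct toy_dev *dev, uint64_t offset, uint64_t len)
{
	toy_log_write(dev, dev->used_ring_addr + offset, len);
}
```

Idea (b) from the quoted mail would correspond to one `toy_log_used_vring(dev, 0, ring_bytes)` call per burst instead of one call per entry; when `logging` is false, both variants return before touching the bitmap.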