On 2015/09/05 1:50, Xie, Huawei wrote:
> There is some format issue with the ascii chart of the tx ring. Update
> that chart.
> Sorry for the trouble.
Hi Xie,

Thanks for sharing a way to optimize virtio. I have a few questions.

>
> On 9/4/2015 4:25 PM, Xie, Huawei wrote:
>> Hi:
>>
>> Recently I have done one virtio optimization proof of concept. The
>> optimization includes two parts:
>> 1) avail ring set with fixed descriptors
>> 2) RX vectorization
>> With the optimizations, we could have several times of performance boost
>> for purely vhost-virtio throughput.

When you checked performance, did you optimize only the virtio-net driver?
If so, can we also optimize the vhost backend (librte_vhost) using your
optimization approach?

>>
>> Here I will only cover the first part, which is the prerequisite for the
>> second part.
>> Let us first take RX for example. Currently when we fill the avail ring
>> with guest mbufs, we need to
>> a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
>> b) set the idx of the desc into the entry of the avail ring
>> c) set the addr/len fields of the descriptor to point to the guest blank
>> mbuf data area
>>
>> Those operations take time, and especially step b results in the modified
>> (M) state of the cache line for the avail ring in the virtio processing
>> core. When vhost processes the avail ring, the cache line transfer from
>> the virtio processing core to the vhost processing core takes quite a few
>> CPU cycles.
>> To solve this problem, this is the arrangement of the RX ring for the
>> DPDK PMD (for the non-mergeable case).
>>
>>     avail
>>     idx
>>     +
>>     |
>> +-----+-----+-----+-------+-----+-----+
>> |  0  |  1  |  2  |  ...  | 254 | 255 |  avail ring
>> +-----+-----+-----+-------+-----+-----+
>>    |     |     |      |      |     |
>>    |     |     |      |      |     |
>>    v     v     v      |      v     v
>> +-----+-----+-----+-------+-----+-----+
>> |  0  |  1  |  2  |  ...  | 254 | 255 |  desc ring
>> +-----+-----+-----+-------+-----+-----+
>>     |
>>     |
>> +-----+-----+-----+-------+-----+-----+
>> |  0  |  1  |  2  |       | 254 | 255 |  used ring
>> +-----+-----+-----+-------+-----+-----+
>>     |
>>     +
>> Avail ring is initialized with fixed descriptors and is never changed,
>> i.e., the index value of the nth avail ring entry is always n, which
>> means the virtio PMD is actually refilling the desc ring only, without
>> having to change the avail ring.

For example, the avail ring is like below.

struct vring_avail {
        uint16_t flags;
        uint16_t idx;
        uint16_t ring[QUEUE_SIZE];
};

My understanding is that the virtio-net driver still needs to change
avail_ring.idx, but doesn't need to change avail_ring.ring[].
Is this correct?

Tetsuya

>> When vhost fetches the avail ring, if not evicted, it is always in its
>> first level cache.
>>
>> When RX receives packets from the used ring, we use used->idx as the
>> desc idx. This requires that vhost processes and returns descs from the
>> avail ring to the used ring in order, which is true for both the current
>> DPDK vhost and the kernel vhost implementation. In my understanding,
>> there is no necessity for vhost-net to process descriptors out of order.
>> One case could be zero copy: for example, if one descriptor doesn't meet
>> the zero copy requirement, we could directly return it to the used ring
>> earlier than the descriptors in front of it.
>> To enforce this, I want to use a reserved bit to indicate in-order
>> processing of descriptors.
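Also, to double-check my reading of the RX side, below is a minimal sketch
of the refill with the fixed avail ring, assuming a power-of-two queue size
and in-order completion by vhost. The structures are cut down and all other
names (rx_avail_init, rx_refill_one, buf_addr, buf_len) are made up for
illustration only; this is not code from your patch.

#include <stdint.h>

#define QUEUE_SIZE 256                 /* must be a power of two          */
#define VRING_DESC_F_WRITE 2           /* device writes into this buffer  */

/* Cut-down stand-ins for the virtio ring structures. */
struct vring_desc {
        uint64_t addr;
        uint32_t len;
        uint16_t flags;
        uint16_t next;
};

struct vring_avail {
        uint16_t flags;
        uint16_t idx;
        uint16_t ring[QUEUE_SIZE];
};

/* One-time setup: entry n of the avail ring always refers to descriptor n,
 * so the PMD never writes avail->ring[] again afterwards. */
static void rx_avail_init(struct vring_avail *avail)
{
        for (uint16_t i = 0; i < QUEUE_SIZE; i++)
                avail->ring[i] = i;
}

/* Refill one slot: write the guest buffer address/length into desc[n]
 * directly and bump avail->idx; avail->ring[] stays untouched, so that
 * cache line stays read-mostly from vhost's point of view. */
static void rx_refill_one(struct vring_desc *desc, struct vring_avail *avail,
                          uint64_t buf_addr, uint32_t buf_len)
{
        uint16_t n = (uint16_t)(avail->idx & (QUEUE_SIZE - 1));

        desc[n].addr  = buf_addr;
        desc[n].len   = buf_len;
        desc[n].flags = VRING_DESC_F_WRITE;
        desc[n].next  = 0;
        /* a write barrier belongs here before publishing the new index */
        avail->idx++;          /* the only avail ring field that changes */
}

If that matches your implementation, the virtio side only dirties the desc
entries and avail->idx, which I think is the point of your chart above.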
>>
>> For tx ring, the arrangement is like below. Each transmitted mbuf needs
>> a desc for virtio_net_hdr, so actually we have only 128 free slots.
>>
>>     ++
>>     ||
>>     ||
>> +-----+-----+-----+------++------+------+-----+------+
>> |  0  |  1  | ... | 127  || 128  | 129  | ... | 255  |  avail ring
>> +-----+-----+-----+------++------+------+-----+------+
>>    |     |           |   ||  |      |            |
>>    v     v           v   ||  v      v            v
>> +-----+-----+-----+------++------+------+-----+------+
>> | 127 | 128 | ... | 255  || 127  | 128  | ... | 255  |  desc ring for virtio_net_hdr
>> +-----+-----+-----+------++------+------+-----+------+
>>    |     |           |   ||  |      |            |
>>    v     v           v   ||  v      v            v
>> +-----+-----+-----+------++------+------+-----+------+
>> |  0  |  1  | ... | 127  ||  0   |  1   | ... | 127  |  desc ring for tx data
>> +-----+-----+-----+------++------+------+-----+------+
>>
>> /huawei
>>
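For the TX side, this is how I picture the fixed layout, assuming the
virtio_net_hdr descriptors occupy the upper half of the desc ring and each
one is permanently chained to its data descriptor in the lower half. Again,
the index math and all names here are only my illustrative reading of your
chart, not code from the patch.

#include <stdint.h>

#define QUEUE_SIZE 256
#define HALF (QUEUE_SIZE / 2)          /* only 128 usable TX slots        */
#define VRING_DESC_F_NEXT 1            /* descriptor continues via 'next' */

struct vring_desc {
        uint64_t addr;
        uint32_t len;
        uint16_t flags;
        uint16_t next;
};

struct vring_avail {
        uint16_t flags;
        uint16_t idx;
        uint16_t ring[QUEUE_SIZE];
};

/* One-time setup: avail slot i always points to header descriptor
 * HALF + (i % HALF), which is permanently chained to data descriptor
 * i % HALF.  hdr_base/hdr_len describe a flat array of virtio_net_hdr. */
static void tx_ring_init(struct vring_desc *desc, struct vring_avail *avail,
                         uint64_t hdr_base, uint32_t hdr_len)
{
        for (uint16_t i = 0; i < QUEUE_SIZE; i++)
                avail->ring[i] = HALF + (i & (HALF - 1));   /* fixed forever */

        for (uint16_t i = 0; i < HALF; i++) {
                desc[HALF + i].addr  = hdr_base + (uint64_t)i * hdr_len;
                desc[HALF + i].len   = hdr_len;
                desc[HALF + i].flags = VRING_DESC_F_NEXT;
                desc[HALF + i].next  = i;   /* fixed chain to the data desc */
        }
}

/* Transmit one packet: only the data descriptor and avail->idx change. */
static void tx_enqueue_one(struct vring_desc *desc, struct vring_avail *avail,
                           uint64_t pkt_addr, uint32_t pkt_len)
{
        uint16_t slot = (uint16_t)(avail->idx & (HALF - 1));

        desc[slot].addr  = pkt_addr;
        desc[slot].len   = pkt_len;
        desc[slot].flags = 0;
        desc[slot].next  = 0;
        /* write barrier before publishing */
        avail->idx++;
}

Is that the intended meaning of the 128 free slots?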