Thanks Marvin, my comments inline:

> -----Original Message-----
> From: Liu, Yong <yong....@intel.com>
> Sent: Wednesday, July 1, 2020 4:51 PM
> To: Fu, Patrick <patrick...@intel.com>; dev@dpdk.org;
> maxime.coque...@redhat.com; Xia, Chenbo <chenbo....@intel.com>; Wang,
> Zhihong <zhihong.w...@intel.com>
> Cc: Fu, Patrick <patrick...@intel.com>; Wang, Yinan
> <yinan.w...@intel.com>; Jiang, Cheng1 <cheng1.ji...@intel.com>; Liang,
> Cunming <cunming.li...@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] vhost: introduce async enqueue for
> split ring
>
> >
> > +#define VHOST_ASYNC_BATCH_THRESHOLD 8
> > +
>
> Not very clear why the batch number is 8. It is better to save it in
> rte_vhost_async_features if the value comes from a hardware requirement.
>

We are in the process of benchmarking how this value affects the final
performance, and we will handle this macro in a more reasonable manner.
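Below is a minimal sketch that contrasts the two options: a fixed macro versus a value reported by the copy engine. It rests on two assumptions. First, the batch threshold means "hand the collected transfer descriptors to the engine once this many have accumulated". Second, a hypothetical 32-bit feature word carries an engine-reported threshold in its low 12 bits. SKETCH_THRESHOLD_MASK, SKETCH_DEFAULT_THRESHOLD, batch_threshold() and submit_calls() are hypothetical names, and the bit layout is not the real rte_vhost_async_features definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 12 bits of the feature word hold a batch
 * threshold reported by the copy engine (assumption, not the DPDK ABI). */
#define SKETCH_THRESHOLD_MASK    0xFFFu
/* Same value as VHOST_ASYNC_BATCH_THRESHOLD in the patch. */
#define SKETCH_DEFAULT_THRESHOLD 8u

/* Pick the threshold from the feature word, falling back to the fixed
 * value when the engine reports none (field equals zero). */
static uint32_t batch_threshold(uint32_t features)
{
	uint32_t t = features & SKETCH_THRESHOLD_MASK;

	return t ? t : SKETCH_DEFAULT_THRESHOLD;
}

/* Count the submissions to the copy engine for a burst of `count`
 * packets: one per full batch of `threshold`, plus one tail flush. */
static uint32_t submit_calls(uint32_t count, uint32_t threshold)
{
	uint32_t calls = 0, pending = 0;

	assert(threshold != 0);
	for (uint32_t i = 0; i < count; i++) {
		if (++pending >= threshold) {
			calls++;      /* full batch handed to the engine */
			pending = 0;
		}
	}
	if (pending)
		calls++;              /* flush the remaining descriptors */
	return calls;
}
```

With the default of 8, a 32-packet burst needs four submissions. A 33-packet burst needs five, because the last submission flushes the tail. A reported threshold of 16 would cut the 32-packet case to two.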
> > +
> > +static __rte_noinline uint32_t
> > +virtio_dev_rx_async_submit_split(struct virtio_net *dev,
> > +	struct vhost_virtqueue *vq, uint16_t queue_id,
> > +	struct rte_mbuf **pkts, uint32_t count) {
> > +	uint32_t pkt_idx = 0, pkt_burst_idx = 0;
> > +	uint16_t num_buffers;
> > +	struct buf_vector buf_vec[BUF_VECTOR_MAX];
> > +	uint16_t avail_head, last_idx, shadow_idx;
> > +
> > +	struct rte_vhost_iov_iter *it_pool = vq->it_pool;
> > +	struct iovec *vec_pool = vq->vec_pool;
> > +	struct rte_vhost_async_desc tdes[MAX_PKT_BURST];
> > +	struct iovec *src_iovec = vec_pool;
> > +	struct iovec *dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
> > +	struct rte_vhost_iov_iter *src_it = it_pool;
> > +	struct rte_vhost_iov_iter *dst_it = it_pool + 1;
> > +	uint16_t n_free_slot, slot_idx;
> > +	int n_pkts = 0;
> > +
> > +	avail_head = *((volatile uint16_t *)&vq->avail->idx);
> > +	last_idx = vq->last_avail_idx;
> > +	shadow_idx = vq->shadow_used_idx;
> > +
> > +	/*
> > +	 * The ordering between avail index and
> > +	 * desc reads needs to be enforced.
> > +	 */
> > +	rte_smp_rmb();
> > +
> > +	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> > +
> > +	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
> > +		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
> > +		uint16_t nr_vec = 0;
> > +
> > +		if (unlikely(reserve_avail_buf_split(dev, vq,
> > +						pkt_len, buf_vec, &num_buffers,
> > +						avail_head, &nr_vec) < 0)) {
> > +			VHOST_LOG_DATA(DEBUG,
> > +				"(%d) failed to get enough desc from vring\n",
> > +				dev->vid);
> > +			vq->shadow_used_idx -= num_buffers;
> > +			break;
> > +		}
> > +
> > +		VHOST_LOG_DATA(DEBUG, "(%d) current index %d | end index %d\n",
> > +			dev->vid, vq->last_avail_idx,
> > +			vq->last_avail_idx + num_buffers);
> > +
> > +		if (async_mbuf_to_desc(dev, vq, pkts[pkt_idx],
> > +				buf_vec, nr_vec, num_buffers,
> > +				src_iovec, dst_iovec, src_it, dst_it) < 0) {
> > +			vq->shadow_used_idx -= num_buffers;
> > +			break;
> > +		}
> > +
> > +		slot_idx = (vq->async_pkts_idx + pkt_idx) & (vq->size - 1);
> > +		if (src_it->count) {
> > +			async_fill_des(&tdes[pkt_burst_idx], src_it, dst_it);
> > +			pkt_burst_idx++;
> > +			vq->async_pending_info[slot_idx] =
> > +				num_buffers | (src_it->nr_segs << 16);
> > +			src_iovec += src_it->nr_segs;
> > +			dst_iovec += dst_it->nr_segs;
> > +			src_it += 2;
> > +			dst_it += 2;
>
> Patrick,
> In my understanding, the nr_segs type definition can follow the nr_vec type
> definition (uint16_t). That would shrink the data saved in async_pkts_pending
> from 64 bits to 32 bits. Since this information is used in the datapath, a
> smaller size should give better performance.
>
> It is better to replace the integer 2 with a macro.
>

Will update the code as you suggested.

Thanks,
Patrick