On Mon, Oct 10, 2016 at 04:16:19AM +0000, Wang, Zhihong wrote:
> 
> 
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan....@linux.intel.com]
> > Sent: Monday, October 10, 2016 11:59 AM
> > To: Michael S. Tsirkin <m...@redhat.com>
> > Cc: Maxime Coquelin <maxime.coque...@redhat.com>; Stephen Hemminger
> > <step...@networkplumber.org>; d...@dpdk.org; qemu-
> > de...@nongnu.org; Wang, Zhihong <zhihong.w...@intel.com>
> > Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> > 
> > On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > > > 
> > > > > > 
> > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > > Yes, but two points.
> > > > > 
> > > > > 1. why is this memset expensive?
> > > > 
> > > > I don't have the exact answer, but here are some rough thoughts:
> > > > 
> > > > It's an external C library function: there is a call stack, and the
> > > > IP register will bounce back and forth.
> > > 
> > > For a memset to 0? gcc 5.3.1 on Fedora happily inlines it.
> > 
> > Good to know!
> > 
> > > > It's overkill to use that for resetting a 14-byte structure.
> > > > 
> > > > Some trick like
> > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > > 
> > > > Or even
> > > >     hdr->xxx = 0;
> > > >     hdr->yyy = 0;
> > > > 
> > > > should behave better.
> > > > 
> > > > There was an example: the vhost enqueue optimization patchset from
> > > > Zhihong [0] uses memset, and it introduces a more than 15% drop (IIRC)
> > > > on my Ivy Bridge server; it has no such issue on his server, though.
> > > > 
> > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > 
> > > > --yliu
> > > 
> > > I'd say that's weird. What's your config? Any chance you
> > > are using an old compiler?
> > 
> > Not really, it's gcc 5.3.1.
> > Maybe Zhihong could explain more. IIRC,
> > he said the memset is not well optimized for the Ivy Bridge server.
> 
> The dst is remote in that case. It's fine on Haswell, but it has a
> complication on Ivy Bridge which (wasn't supposed to but) causes a serious
> frontend issue.
> 
> I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
So try something like this then:

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..7a3f88e 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -292,6 +292,16 @@ vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 	return (hw->guest_features & (1ULL << bit)) != 0;
 }
 
+static inline int
+vtnet_hdr_size(struct virtio_hw *hw)
+{
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+	    vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
+		return sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		return sizeof(struct virtio_net_hdr);
+}
+
 /*
  * Function declaration from virtio_pci.c
  */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index a27208e..21a45e1 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -216,7 +216,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	struct vring_desc *start_dp;
 	uint16_t seg_num = cookie->nb_segs;
 	uint16_t head_idx, idx;
-	uint16_t head_size = vq->hw->vtnet_hdr_size;
+	uint16_t head_size = vtnet_hdr_size(vq->hw);
 	unsigned long offs;
 
 	head_idx = vq->vq_desc_head_idx;

Generally, the pointer chasing in vq->hw->vtnet_hdr_size can't be good for performance. Move the fields used on the data path into vq and use them from there, to avoid the indirections?