[dpdk-dev] [PATCH v2] vhost: flush used->idx update before reading avail->flags

2015-05-15 Thread Nikita Kalyazin
Hi,


Maybe I missed a part of the discussion, but is there any special purpose for 
using rte_mb (both read and write fence) here rather than rte_wmb (write fence 
only)?

-- 

Best regards,

Nikita Kalyazin,
n.kalyazin at samsung.com

Software Engineer
CE OS Group
Samsung R&D Institute Russia
Tel: +7 (495) 797-25-00 #3816
Tel: +7 (495) 797-25-03
Office #1501, 12-1, Dvintsev str.,
Moscow, 127018, Russia

On Wed, May 13, 2015 at 12:46:30PM +0200, Thomas Monjalon wrote:
> 2015-04-29 19:11, Huawei Xie:
> > update of used->idx and read of avail->flags could be reordered.
> > memory fence should be used to ensure the order, otherwise guest could see 
> > a stale used->idx value after it toggles the interrupt suppression flag.
> > After guest sets the interrupt suppression flag, it will check if there is 
> > more buffer to process through used->idx. If it sees a stale value, it will 
> > exit the processing while host willn't send interrupt to guest.
> > 
> > Signed-off-by: Huawei Xie 
> 
> Applied with following title, thanks
>   vhost: fix virtio freeze due to missed interrupt
> 


[dpdk-dev] [PATCH v2] vhost: flush used->idx update before reading avail->flags

2015-05-18 Thread Nikita Kalyazin
Ah, sorry. I looked at it without the context. Thanks.

-- 

Best regards,

Nikita Kalyazin,
n.kalyazin at samsung.com

Software Engineer
CE OS Group
Samsung R&D Institute Russia
Tel: +7 (495) 797-25-00 #3816
Tel: +7 (495) 797-25-03
Office #1501, 12-1, Dvintsev str.,
Moscow, 127018, Russia

On Fri, May 15, 2015 at 05:23:35PM +0200, Michael S. Tsirkin wrote:
> On Fri, May 15, 2015 at 04:43:33PM +0300, Nikita Kalyazin wrote:
> > Hi,
> > 
> > 
> > Maybe I missed a part of the discussion, but is there any special purpose 
> > for using rte_mb (both read and write fence) here rather than rte_wmb 
> > (write fence only)?
> 
> The fence is between write of used->idx and read of avail->flags, so
> rte_wmb won't do anything useful.
> 
> > -- 
> > 
> > Best regards,
> > 
> > Nikita Kalyazin,
> > n.kalyazin at samsung.com
> > 
> > Software Engineer
> > CE OS Group
> > Samsung R&D Institute Russia
> > Tel: +7 (495) 797-25-00 #3816
> > Tel: +7 (495) 797-25-03
> > Office #1501, 12-1, Dvintsev str.,
> > Moscow, 127018, Russia
> > 
> > On Wed, May 13, 2015 at 12:46:30PM +0200, Thomas Monjalon wrote:
> > > 2015-04-29 19:11, Huawei Xie:
> > > > update of used->idx and read of avail->flags could be reordered.
> > > > memory fence should be used to ensure the order, otherwise guest could 
> > > > see a stale used->idx value after it toggles the interrupt suppression 
> > > > flag.
> > > > After guest sets the interrupt suppression flag, it will check if there 
> > > > is more buffer to process through used->idx. If it sees a stale value, 
> > > > it will exit the processing while host willn't send interrupt to guest.
> > > > 
> > > > Signed-off-by: Huawei Xie 
> > > 
> > > Applied with following title, thanks
> > >   vhost: fix virtio freeze due to missed interrupt
> > > 


[dpdk-dev] data copy in vhost-user

2015-04-27 Thread Nikita Kalyazin
Hi,


As far as I understand, DPDK vhost-user implementation requires data copy for 
either RX or TX (rte_vhost_dequeue_burst() and rte_vhost_enqueue_burst()). It 
means that two data copies are needed to transfer a packet from one VM to 
another.

Why is not it possible to eliminate one of the copies (e.g., 
rte_vhost_enqueue_burst() might set up a reference at vring descriptor to 
mbuf's data rather than copying the data)?

-- 

Best regards,

Nikita Kalyazin,
n.kalyazin at samsung.com

CE OS Group
Samsung R&D Institute Russia
Tel: +7 (495) 797-25-00 #3816
Tel: +7 (495) 797-25-03
Office #1501, 12-1, Dvintsev str.,
Moscow, 127018, Russia



[dpdk-dev] data copy in vhost-user

2015-04-29 Thread Nikita Kalyazin
Zoltan, Huawei,

Thank you for your replies.

-- 

Best regards,

Nikita Kalyazin,
n.kalyazin at samsung.com

CE OS Group
Samsung R&D Institute Russia
Tel: +7 (495) 797-25-00 #3816
Tel: +7 (495) 797-25-03
Office #1501, 12-1, Dvintsev str.,
Moscow, 127018, Russia

On Tue, Apr 28, 2015 at 01:24:25PM +0100, Zoltan Kiss wrote:
> 
> 
> On 28/04/15 02:22, Xie, Huawei wrote:
> >
> >
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zoltan Kiss
> >> Sent: Tuesday, April 28, 2015 12:27 AM
> >> To: Nikita Kalyazin; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] data copy in vhost-user
> >>
> >>
> >>
> >> On 27/04/15 12:54, Nikita Kalyazin wrote:
> >>> Hi,
> >>>
> >>>
> >>> As far as I understand, DPDK vhost-user implementation requires data copy
> >> for either RX or TX (rte_vhost_dequeue_burst() and
> >> rte_vhost_enqueue_burst()). It means that two data copies are needed to
> >> transfer a packet from one VM to another.
> >>>
> >>> Why is not it possible to eliminate one of the copies (e.g.,
> >> rte_vhost_enqueue_burst() might set up a reference at vring descriptor to
> >> mbuf's data rather than copying the data)?
> > This had been added to the to-do list. We could delay the copy until the 
> > real copy is needed.
> >>
> >> I'm just guessing, but in case of VM-to-VM traffic the receiving one
> >> could hold onto the buffer indefinitely, preventing the sender to reuse
> >> the buffer. That could lead to a DoS in some cases, and shutting down
> >> the sender would be also tricky. At least in case of Xen
> >> netback/netfront that's the reason. A reasonable solution for this
> >> problem is to make sure the buffer is swapped out with a copy after a
> >> finite time.
> > Do you mean we associate a timeout for the buffer?
> Yes, I think xen-netback had such a version once, but it was removed. As 
> far as I know the overhead and complexity of handling these timeouts 
> were too severe.
> I might be wrong about this, I don't know if this problem applies here 
> as well or not.
> 
> >>
> >> Regards,
> >>
> >> Zoltan


[dpdk-dev] [PATCH v5 resend 07/12] virtio: resolve for control queue

2015-10-08 Thread Nikita Kalyazin
Hi Yuanhan,


As I understand, the dead loop happened here (virtio_send_command):
while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
  rte_rmb();
  usleep(100);
}

Could you explain why wrong config reading caused that and how correct reading 
helps to avoid?

-- 

Best regards,

Nikita Kalyazin,
n.kalyazin at samsung.com

Software Engineer
Virtualization Group
Samsung R&D Institute Russia
Tel: +7 (495) 797-25-00 #3816
Tel: +7 (495) 797-25-03
Office #1501, 12-1, Dvintsev str.,
Moscow, 127018, Russia

On Mon, Sep 21, 2015 at 02:36:47PM +0800, Yuanhan Liu wrote:
> On Sun, Sep 20, 2015 at 12:21:14PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Sep 18, 2015 at 11:10:56PM +0800, Yuanhan Liu wrote:
> > > From: Changchun Ouyang 
> > > 
> > > Fix the max virtio queue pair read issue.
> > > 
> > > Control queue can't work for vhost-user mulitple queue mode,
> > > so introduce a counter to void the dead loop when polling
> > > the control queue.
> > > 
> > > Signed-off-by: Changchun Ouyang 
> > > Signed-off-by: Yuanhan Liu 
> > 
> > Per virtio spec, the multiqueue feature depends on control queue -
> > what do you mean when you say it can't work?
> > 
> > > ---
> > >  drivers/net/virtio/virtio_ethdev.c | 12 +++-
> > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/net/virtio/virtio_ethdev.c 
> > > b/drivers/net/virtio/virtio_ethdev.c
> > > index 465d3cd..b2f4120 100644
> > > --- a/drivers/net/virtio/virtio_ethdev.c
> > > +++ b/drivers/net/virtio/virtio_ethdev.c
> > > @@ -1162,7 +1162,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
> > >   struct virtio_hw *hw = eth_dev->data->dev_private;
> > >   struct virtio_net_config *config;
> > >   struct virtio_net_config local_config;
> > > - uint32_t offset_conf = sizeof(config->mac);
> > >   struct rte_pci_device *pci_dev;
> > >  
> > >   RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
> > > @@ -1222,7 +1221,9 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
> > >   config = &local_config;
> > >  
> > >   if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
> > > - offset_conf += sizeof(config->status);
> > > + vtpci_read_dev_config(hw,
> > > + offsetof(struct virtio_net_config, status),
> > > + &config->status, sizeof(config->status));
> > >   } else {
> > >   PMD_INIT_LOG(DEBUG,
> > >"VIRTIO_NET_F_STATUS is not supported");
> > > @@ -1230,15 +1231,16 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
> > >   }
> > >  
> > >   if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
> > > - offset_conf += sizeof(config->max_virtqueue_pairs);
> > > + vtpci_read_dev_config(hw,
> > > + offsetof(struct virtio_net_config, 
> > > max_virtqueue_pairs),
> > > + &config->max_virtqueue_pairs,
> > > + sizeof(config->max_virtqueue_pairs));
> > >   } else {
> > >   PMD_INIT_LOG(DEBUG,
> > >"VIRTIO_NET_F_MQ is not supported");
> > >   config->max_virtqueue_pairs = 1;
> > >   }
> > >  
> > > - vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
> > > -
> > >   hw->max_rx_queues =
> > >   (VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
> > >   VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
> > 
> > 
> > Does the patch actually do what the commit log says?
> 
> Sorry, the commit log is wrong as you said.
> 
> It was actually a bug in our code, which happens to be revealed when
> MQ is enabled. The old code adjusts the config bytes we want to read
> depending on what kind of features we have, but we later cast the
> entire buf we read with "struct virtio_net_config", which is obviously
> wrong.
> 
> The right way to go is to read related config bytes when corresponding
> feature is set, which is exactly what this patch does.
> 
> > It seems tobe about reading the device confing,
> > not breaking out of a loop ...
> 
> It's just a (bad) side effect of getting the vritio_net_config wrongly:
> the wrong config causes a dead loop in our code.
> 
> And sorry for the buggy commit log, will fix it next version.
> 
> Thanks.
> 
>   --yliu


[dpdk-dev] [PATCH v5 resend 07/12] virtio: resolve for control queue

2015-10-09 Thread Nikita Kalyazin
Hi,

> I just recognized that this dead loop is the same one that I have 
> experienced (see 
> http://dpdk.org/ml/archives/dev/2015-October/024737.html for reference). 
> Just applying the changes in this patch (only 07/12) will not fix the 
> dead loop at least in my setup.
Yes, exactly. I observe it same way even after applying the patch.

-- 

Best regards,

Nikita Kalyazin,
n.kalyazin at samsung.com

Software Engineer
Virtualization Group
Samsung R&D Institute Russia
Tel: +7 (495) 797-25-00 #3816
Tel: +7 (495) 797-25-03
Office #1501, 12-1, Dvintsev str.,
Moscow, 127018, Russia

On Thu, Oct 08, 2015 at 10:51:02PM +0200, Steffen Bauch wrote:
> 
> 
> On 10/08/2015 05:32 PM, Nikita Kalyazin wrote:
> > Hi Yuanhan,
> >
> >
> > As I understand, the dead loop happened here (virtio_send_command):
> > while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
> >rte_rmb();
> >usleep(100);
> > }
> >
> > Could you explain why wrong config reading caused that and how correct 
> > reading helps to avoid?
> >
> Hi,
> 
> I just recognized that this dead loop is the same one that I have 
> experienced (see 
> http://dpdk.org/ml/archives/dev/2015-October/024737.html for reference). 
> Just applying the changes in this patch (only 07/12) will not fix the 
> dead loop at least in my setup.
> 
> Best regards,
> 
> Steffen


[dpdk-dev] [PATCH] vhost: call write barrier before used index update

2015-10-20 Thread Nikita Kalyazin
Descriptors that have been put into the used vring must be observable by
guest earlier than the new used index value.
Although compiler barrier serves well for Intel architectue here, the
proper cross-platform solution is to use write barrier before the used
index is updated.

Signed-off-by: Nikita Kalyazin 
---
 lib/librte_vhost/vhost_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7026bfa..d955287 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -216,7 +216,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
}
}

-   rte_compiler_barrier();
+   rte_wmb();

/* Wait until it's our turn to add our buffer to the used ring. */
while (unlikely(vq->last_used_idx != res_base_idx))
@@ -512,7 +512,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,
entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
res_cur_idx, pkts[pkt_idx]);

-   rte_compiler_barrier();
+   rte_wmb();

/*
 * Wait until it's our turn to add our buffer
@@ -751,7 +751,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
entry_success++;
}

-   rte_compiler_barrier();
+   rte_wmb();
vq->used->idx += entry_success;
/* Kick guest if required. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
-- 
2.5.3



[dpdk-dev] [PATCH] vhost: call write barrier before used index update

2015-10-21 Thread Nikita Kalyazin
Hi,

> This has been discussed a number of times before on list, and the consensus
> seems to be that the correct way to fix this is to introduce a set of specific
> barrier operations that insert the correct barrier type on each architecture,
> i.e. compiler barriers on IA, and full wmbs on architectures that require 
> that.
Linux kernel contains two sets of macros: *mb() and smp_*mb().  As far as
I understand, the former are meant to order memory accesses when interacting
with peripherals (physical NICs in our case), and the latter - to provide
synchronization between CPU cores (applicable for virtual NICs in our case).
smp_*mb() for Intel architecture would be a simple compiler barrier, whereas
for processors with weaker memory model they may use real barrier
instructions.
Maybe implementing barriers similar way would work in DPDK as well?

-- 

Best regards,

Nikita Kalyazin,
n.kalyazin at samsung.com

Software Engineer
Virtualization Group
Samsung R&D Institute Russia
Tel: +7 (495) 797-25-00 #3816
Tel: +7 (495) 797-25-03
Office #1501, 12-1, Dvintsev str.,
Moscow, 127018, Russia

On Tue, Oct 20, 2015 at 03:29:51PM +0100, Bruce Richardson wrote:
> On Tue, Oct 20, 2015 at 05:07:46PM +0300, Nikita Kalyazin wrote:
> > Descriptors that have been put into the used vring must be observable by
> > guest earlier than the new used index value.
> > Although compiler barrier serves well for Intel architectue here, the
> > proper cross-platform solution is to use write barrier before the used
> > index is updated.
> > 
> > Signed-off-by: Nikita Kalyazin 
> > ---
> Yes, but no! :-)
> 
> This has been discussed a number of times before on list, and the consensus
> seems to be that the correct way to fix this is to introduce a set of specific
> barrier operations that insert the correct barrier type on each architecture,
> i.e. compiler barriers on IA, and full wmbs on architectures that require 
> that.
> 
> See discussion here: http://dpdk.org/dev/patchwork/patch/4293/
> and in the thread here: http://dpdk.org/ml/archives/dev/2015-March/015202.html
> 
> So correct problem statment, but unfortunately NAK for the implementation.
> 
> Patches for general memory barrier implementation as described above welcome 
> :-)
> 
> Regards,
> /Bruce