On Tue, Nov 17, 2015 at 5:23 AM, Yuanhan Liu <yuanhan.liu at linux.intel.com> wrote:
> On Thu, Nov 12, 2015 at 01:46:03PM -0800, Rich Lane wrote:
> > You can reproduce this with l2fwd and the vhost PMD.
> >
> > You'll need this patch on top of the vhost PMD patches:
> > --- a/lib/librte_vhost/virtio-net.c
> > +++ b/lib/librte_vhost/virtio-net.c
> > @@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
> >                 return -1;
> >
> >         if (dev->flags & VIRTIO_DEV_RUNNING)
> > -               notify_ops->destroy_device(dev);
> > +               notify_destroy_device(dev);
> >
> >         cleanup_device(dev);
> >         reset_device(dev);
> >
> > 1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
> >    eth_vhost0,iface=/run/vhost0.sock -- -p3
> > 2. Start a VM using vhost-user and set up uio, hugepages, etc.
> > 3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
> > 4. Kill the l2fwd inside the VM with SIGINT.
> > 5. Start l2fwd inside the VM.
> > 6. l2fwd on the host crashes.
> >
> > I found the source of the memory corruption by setting a watchpoint in
> > gdb: watch -l rte_eth_devices[1].data->rx_queues
>
> Rich,
>
> Thanks for the detailed steps for reproducing this issue, and sorry for
> being a bit late: I finally got the time to dig into this issue today.
>
> Put simply, the buffer overflow is not the root cause; the fact that "we
> do not release resources on stop/exit" is.
>
> And here is how the issue arises. After step 4 (terminating l2fwd),
> neither l2fwd nor the virtio PMD driver releases any resources. Hence,
> l2fwd on the HOST does not notice the change and keeps trying to receive
> and queue packets to the vhost dev. That is not a problem as long as we
> don't start l2fwd again, since there are actually no packets to forward,
> and rte_vhost_dequeue_burst returns from:
>
>     596         avail_idx = *((volatile uint16_t *)&vq->avail->idx);
>     597
>     598         /* If there are no available buffers then return. */
>     599         if (vq->last_used_idx == avail_idx)
>     600                 return 0;
>
> But at the init stage while starting l2fwd again (step 5),
> rte_eal_memory_init() resets all huge-page memory to zero, resetting all
> vq->desc[] entries to zero, which in turn ends up with secure_len being
> set to 0 on return.
>
> (BTW, I'm not quite sure why resetting the huge-page memory inside the
> VM would reset vq->desc.)
>
> The vq desc reset results in a dead loop at virtio_dev_merge_rx(), as
> update_secure_len() keeps setting secure_len to 0:
>
>     511                 do {
>     512                         avail_idx = *((volatile uint16_t *)&vq->avail->idx);
>     513                         if (unlikely(res_cur_idx == avail_idx)) {
>     514                                 LOG_DEBUG(VHOST_DATA,
>     515                                         "(%"PRIu64") Failed "
>     516                                         "to get enough desc from "
>     517                                         "vring\n",
>     518                                         dev->device_fh);
>     519                                 goto merge_rx_exit;
>     520                         } else {
>     521                                 update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
>     522                                 res_cur_idx++;
>     523                         }
>     524                 } while (pkt_len > secure_len);
>
> The dead loop then makes vec_idx keep increasing; it quickly overflows,
> leading to the crash in the end, as you saw.
>
> So, the following would resolve this issue in the right way (I guess),
> and it covers virtio-pmd and l2fwd only so far.
>
> ---
> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> index 12fcc23..8d6bf56 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1507,9 +1507,12 @@ static void
>  virtio_dev_stop(struct rte_eth_dev *dev)
>  {
>         struct rte_eth_link link;
> +       struct virtio_hw *hw = dev->data->dev_private;
>
>         PMD_INIT_LOG(DEBUG, "stop");
>
> +       vtpci_reset(hw);
> +
>         if (dev->data->dev_conf.intr_conf.lsc)
>                 rte_intr_disable(&dev->pci_dev->intr_handle);
>
> diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
> index 720fd5a..565f648 100644
> --- a/examples/l2fwd/main.c
> +++ b/examples/l2fwd/main.c
> @@ -44,6 +44,7 @@
>  #include <ctype.h>
>  #include <errno.h>
>  #include <getopt.h>
> +#include <signal.h>
>
>  #include <rte_common.h>
>  #include <rte_log.h>
> @@ -534,14 +535,40 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask)
>         }
>  }
>
> +static uint8_t nb_ports;
> +static uint8_t nb_ports_available;
> +
> +/* When we receive an INT signal, unregister the vhost driver */
> +static void
> +sigint_handler(__rte_unused int signum)
> +{
> +       uint8_t portid;
> +
> +       for (portid = 0; portid < nb_ports; portid++) {
> +               /* skip ports that are not enabled */
> +               if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
> +                       printf("Skipping disabled port %u\n", (unsigned) portid);
> +                       nb_ports_available--;
> +                       continue;
> +               }
> +
> +               /* stopping port */
> +               printf("Stopping port %u... ", (unsigned) portid);
> +               fflush(stdout);
> +               rte_eth_dev_stop(portid);
> +
> +               printf("done: \n");
> +       }
> +
> +       exit(0);
> +}
> +
>  int
>  main(int argc, char **argv)
>  {
>         struct lcore_queue_conf *qconf;
>         struct rte_eth_dev_info dev_info;
>         int ret;
> -       uint8_t nb_ports;
> -       uint8_t nb_ports_available;
>         uint8_t portid, last_port;
>         unsigned lcore_id, rx_lcore_id;
>         unsigned nb_ports_in_mask = 0;
> @@ -688,6 +715,8 @@ main(int argc, char **argv)
>                 /* initialize port stats */
>                 memset(&port_statistics, 0, sizeof(port_statistics));
>         }
> +       signal(SIGINT, sigint_handler);
> +
>
>         if (!nb_ports_available) {
>                 rte_exit(EXIT_FAILURE,
>
> ----
>
> And if you rethink this issue, you will find it's neither a vhost-pmd
> nor an l2fwd specific issue. I could easily reproduce it with the
> vhost-switch and virtio testpmd combo. The reason behind that is the
> same: we don't release/stop the resources at stop.
>
> It's kind of a known issue so far, and it's on Zhihong's (cc'ed) TODO
> list to handle it correctly in the next release.
>
> --yliu

Thanks for looking into this. I agree with your description of the root
cause; it's what I was referring to when I mentioned that the virtqueue
memory is zeroed when the guest app is restarted. Agreed that it's not
specific to l2fwd/the vhost PMD.

When the guest zeroes the avail virtqueue idx, it goes backwards from the
perspective of the host. The host then loops up to 2^16 times until
res_cur_idx == avail_idx, overflowing the buf_vec array after the first
256 iterations. No real packet TX is needed.

I don't think that adding a SIGINT handler is the right solution, though.
The guest app could be killed with another signal (SIGKILL). Worse, a
malicious or buggy guest could write to just that field. vhost should not
crash no matter what the guest writes into the virtqueues.