Hi Huawei, thank you for the quick response and for the pointer to the 16.04-rc1 version. Nice!
I think it would be great also to have a sanity check on the gpa_to_vva(). Although nothing recent has hit it we had some problems in that area in the past. Regards, Patrik On 03/17/2016 02:35 AM, Xie, Huawei wrote: > On 3/16/2016 8:53 PM, Patrik Andersson R wrote: >> Hello, >> >> When taking a snapshot of a running VM instance, using OpenStack >> "nova image-create", I noticed that one OVS pmd-thread eventually >> failed in DPDK rte_vhost_dequeue_burst() with repeating log entries: >> >> compute-0-6 ovs-vswitchd[38172]: VHOST_DATA: Failed to allocate >> memory for mbuf. >> >> >> Debugging (data included further down) this issue lead to the >> observation that there is no protection against malformed vhost >> queue descriptors, thus tenant separation might be violated as a >> single faulty VM might bring down the connectivity of all VMs >> connected to the same virtual switch. >> >> To avoid this, validation would be needed at some points in the >> rte_vhost_dequeue_burst() code: >> >> 1) when the queue descriptor is picked up for processing, >> desc->flags and desc->len might both be 0 >> >> ... >> desc = &vq->desc[head[entry_success]]; >> ... >> /* Discard first buffer as it is the virtio header */ >> if (desc->flags & VRING_DESC_F_NEXT) { >> desc = &vq->desc[desc->next]; >> vb_offset = 0; >> vb_avail = desc->len; >> } else { >> vb_offset = vq->vhost_hlen; >> vb_avail = desc->len - vb_offset; >> } >> .... >> >> 2) at buffer address translation gpa_to_vva(), might fail >> returning NULL as indication >> >> vb_addr = gpa_to_vva(dev, desc->addr); >> ... >> while (cpy_len != 0) { >> rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), >> (void *)((uintptr_t)(vb_addr + vb_offset)), >> cpy_len); >> ... >> } >> ... >> >> >> Wondering if there are any plans of adding any kind of validation in >> DPDK, or if it would be useful to suggest specific implementation of >> such validations in the DPDK code? >> >> Or is there some mechanism that gives us the confidence to trust >> the vhost queue content absolutely? >> >> >> >> Debugging data: >> >> For my scenario the problem occurs in DPDK rte_vhost_dequeue_burst() >> due to use of a vhost queue descriptor that has all fields 0: >> >> (gdb) print *desc >> {addr = 0, len = 0, flags = 0, next = 0} >> >> >> Subsequent use of desc->len to compute vb_avail = desc->len - vb_offset, >> leads to the problem observed. What happens is that the packet needs to >> be segmented -- on my system it fails roughly at segment 122000 when >> memory available for mbufs run out. >> >> The relevant local variables for rte_vhost_dequeue_burst() when breaking >> on the condition desc->len == 0: >> >> vb_avail = 4294967284 (0xfffffff4) >> seg_avail = 2608 >> vb_offset = 12 >> cpy_len = 2608 >> seg_num = 1 >> desc = 0x2aadb6e5c000 >> vb_addr = 46928960159744 >> entry_success = 0 >> >> Note also that there is no crash despite to the desc->addr being zero, >> it is a valid address in the regions mapped to the device. Although, the >> 3 regions mapped does not seem to be correct either at this stage. >> >> >> The versions that I'm running are OVS 2.4.0, with corrections from the >> 2.4 branch, and DPDK 2.1.0. QEMU emulator version 2.2.0 and >> libvirt version 1.2.12. >> >> >> Regards, >> >> Patrik > Thanks Patrik. You are right. We had planned to enhance the robustness > of vhost so that neither malicious nor buggy guest virtio driver could > corrupt vhost. Actually the 16.04 RC1 has fixed some issues (the return > of gpa_to_vva isn't checked). >