On Thu, 23 Aug 2018 at 15:01, Burakov, Anatoly <anatoly.bura...@intel.com> wrote: > > On 23-Aug-18 12:19 PM, Sean Harte wrote: > > On Thu, 23 Aug 2018 at 10:05, Burakov, Anatoly > > <anatoly.bura...@intel.com> wrote: > >> > >> On 23-Aug-18 3:57 AM, Tiwei Bie wrote: > >>> Deadlock can occur when allocating memory if a vhost-kernel > >>> based virtio-user device is in use. Besides, it's possible > >>> to have much more than 64 non-contiguous hugepage backed > >>> memory regions due to the memory hotplug, which may cause > >>> problems when handling VHOST_SET_MEM_TABLE request. A better > >>> solution is to have the virtio-user pass all the VA ranges > >>> reserved by DPDK to vhost-kernel. > >>> > >>> Bugzilla ID: 81 > >>> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug") > >>> Cc: sta...@dpdk.org > >>> > >>> Reported-by: Seán Harte <sea...@gmail.com> > >>> Signed-off-by: Tiwei Bie <tiwei....@intel.com> > >>> --- > >>> drivers/net/virtio/virtio_user/vhost_kernel.c | 64 ++++++++----------- > >>> 1 file changed, 27 insertions(+), 37 deletions(-) > >>> > >>> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c > >>> b/drivers/net/virtio/virtio_user/vhost_kernel.c > >>> index b2444096c..49bd1b821 100644 > >>> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c > >>> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c > >>> @@ -70,41 +70,12 @@ static uint64_t vhost_req_user_to_kernel[] = { > >>> [VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE, > >>> }; > >>> > >>> -struct walk_arg { > >>> - struct vhost_memory_kernel *vm; > >>> - uint32_t region_nr; > >>> -}; > >>> -static int > >>> -add_memory_region(const struct rte_memseg_list *msl __rte_unused, > >>> - const struct rte_memseg *ms, size_t len, void *arg) > >>> -{ > >>> - struct walk_arg *wa = arg; > >>> - struct vhost_memory_region *mr; > >>> - void *start_addr; > >>> - > >>> - if (wa->region_nr >= max_regions) > >>> - return -1; > >>> - > >>> - mr = &wa->vm->regions[wa->region_nr++]; > >>> - start_addr = ms->addr; > >>> - > >>> - mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr; > >>> - mr->userspace_addr = (uint64_t)(uintptr_t)start_addr; > >>> - mr->memory_size = len; > >>> - mr->mmap_offset = 0; > >>> - > >>> - return 0; > >>> -} > >>> - > >>> -/* By default, vhost kernel module allows 64 regions, but DPDK allows > >>> - * 256 segments. As a relief, below function merges those virtually > >>> - * adjacent memsegs into one region. > >>> - */ > >>> static struct vhost_memory_kernel * > >>> prepare_vhost_memory_kernel(void) > >>> { > >>> + struct rte_mem_config *mcfg = > >>> rte_eal_get_configuration()->mem_config; > >>> struct vhost_memory_kernel *vm; > >>> - struct walk_arg wa; > >>> + uint32_t region_nr = 0, i; > >>> > >>> vm = malloc(sizeof(struct vhost_memory_kernel) + > >>> max_regions * > >>> @@ -112,15 +83,34 @@ prepare_vhost_memory_kernel(void) > >>> if (!vm) > >>> return NULL; > >>> > >>> - wa.region_nr = 0; > >>> - wa.vm = vm; > >>> + for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { > >>> + struct rte_memseg_list *msl = &mcfg->memsegs[i]; > >>> + struct vhost_memory_region *mr; > >>> + void *start_addr; > >>> + uint64_t len; > >> > >> There is a rte_memseg_list_walk() - please do not iterate over memseg > >> lists manually. > >> > > > > rte_memseg_list_walk() can't be used here because > > prepare_vhost_memory_kernel() is sometimes called from a memory > > callback. It will then hang trying to get a read lock on > > memory_hotplug_lock. > > OK, so use rte_memseg_list_walk_thread_unsafe(). > > > I don't think the rte_memseg_list_walk_thread_unsafe() function is > > appropriate because prepare_vhost_memory_kernel() may not always be > > called from a memory callback. > > And how is this different? What you're doing here is identical to > calling rte_memseg_list_walk_thread_unsafe() (that's precisely what it > does internally - check the code!), except that you're doing it manually > and not using DPDK API, which makes your code dependent on internals of > DPDK's memory implementation. > > So, this function may or may not be called from a callback, but you're > using thread-unsafe walk anyway. I think you should call either > thread-safe or thread-unsafe version depending on whether you're being > called from a callback or not. >
I'm not the patch author :-) Switching to the _thread_unsafe() function might convert the easy-to-debug deadlock issue into a tricky-to-debug race condition. Picking between the two implementations depending on the calling context would work. Another approach might be a different type of locking system that allows reading when the same thread already holds the write lock. > > > >>> > >>> - if (rte_memseg_contig_walk(add_memory_region, &wa) < 0) { > >>> - free(vm); > >>> - return NULL; > >>> + start_addr = msl->base_va; > >>> + len = msl->page_sz * msl->memseg_arr.len; > >>> + > >>> + if (start_addr == NULL || len == 0) > >>> + continue; > >>> + > >>> + if (region_nr >= max_regions) { > >>> + free(vm); > >>> + return NULL; > >>> + } > >>> + > >>> + mr = &vm->regions[region_nr++]; > >>> + mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr; > >>> + mr->userspace_addr = (uint64_t)(uintptr_t)start_addr; > >>> + mr->memory_size = len; > >>> + mr->mmap_offset = 0; /* flags_padding */ > >>> + > >>> + PMD_DRV_LOG(DEBUG, "index=%u, addr=%p len=%" PRIu64, > >>> + i, start_addr, len); > >>> } > >>> > >>> - vm->nregions = wa.region_nr; > >>> + vm->nregions = region_nr; > >>> vm->padding = 0; > >>> return vm; > >>> } > >>> > >> > >> > >> -- > >> Thanks, > >> Anatoly > > > > > -- > Thanks, > Anatoly