> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Saturday, November 7, 2015 12:22 AM
> To: Tan, Jianfeng; dev at dpdk.org
> Cc: nakajima.yoshihiro at lab.ntt.co.jp; zhbzg at huawei.com;
> mst at redhat.com; gaoxiaoqiu at huawei.com; oscar.zhangbo at huawei.com;
> ann.zhuangyanying at huawei.com; zhoujingbin at huawei.com;
> guohongzhen at huawei.com
> Subject: RE: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
> initialization process
>
> Hi,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianfeng Tan
> > Sent: Thursday, November 05, 2015 6:31 PM
> > To: dev at dpdk.org
> > Cc: nakajima.yoshihiro at lab.ntt.co.jp; zhbzg at huawei.com;
> > mst at redhat.com; gaoxiaoqiu at huawei.com; oscar.zhangbo at huawei.com;
> > ann.zhuangyanying at huawei.com; zhoujingbin at huawei.com;
> > guohongzhen at huawei.com
> > Subject: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
> > initialization process
> >
> > When using virtio for container, we should specify --no-huge so that
> > in memory initialization, shm_open() is used to alloc memory from
> > tmpfs filesystem /dev/shm/.
> >
> > Signed-off-by: Huawei Xie <huawei.xie at intel.com>
> > Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
> > ---
......
> > +int
> > +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr) {
> > +	struct rte_mem_config *mcfg;
> > +	mcfg = rte_eal_get_configuration()->mem_config;
> > +
> > +	*pfd = mcfg->memseg[index].fd;
> > +	*psize = (uint64_t)mcfg->memseg[index].len;
> > +	*paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
> > +	return 0;
> > +}
>
> Wonder who will use that function?
> Can't see any references to that function in that patch or next.

This function is used in 1/5, when the virtio front end needs to send
VHOST_USER_SET_MEM_TABLE to the back end.
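
To illustrate how that front end could consume it (only a rough sketch under
my own assumptions, not the actual 1/5 code; the struct and helper names are
made up for the example), it would walk the memsegs and collect the
(fd, size, address) triples that the SET_MEM_TABLE message needs:

#include <stdint.h>
#include <stddef.h>

/* prototype of the function added in this patch, declared here so the
 * example is self-contained */
int rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr);

/* illustrative container for one shareable memory region */
struct mem_region_info {
	void     *addr;	/* virtual address of the segment */
	uint64_t  size;	/* length of the segment in bytes */
	int       fd;	/* fd to hand to the vhost back end */
};

static int
collect_mem_regions(struct mem_region_info *regions, int max_segs)
{
	int i, n = 0;

	for (i = 0; i < max_segs; i++) {	/* e.g. max_segs = RTE_MAX_MEMSEG */
		int fd = -1;
		uint64_t size = 0;
		void *addr = NULL;

		if (rte_memseg_info_get(i, &fd, &size, &addr) != 0)
			break;
		if (addr == NULL || size == 0 || fd < 0)
			continue;	/* unused segment, or nothing to share */

		regions[n].addr = addr;
		regions[n].size = size;
		regions[n].fd   = fd;
		n++;
	}
	return n;	/* number of regions to describe in SET_MEM_TABLE */
}
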
> > +
> >  /*
> >   * Get physical address of any mapped virtual address in the current process.
> >   */
> > @@ -1044,6 +1059,42 @@ calc_num_pages_per_socket(uint64_t * memory,
> >  	return total_num_pages;
> >  }
> >
> > +static void *
> > +rte_eal_shm_create(int *pfd)
> > +{
> > +	int ret, fd;
> > +	char filepath[256];
> > +	void *vaddr;
> > +	uint64_t size = internal_config.memory;
> > +
> > +	sprintf(filepath, "/%s_cvio", internal_config.hugefile_prefix);
> > +
> > +	fd = shm_open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > +	if (fd < 0) {
> > +		rte_panic("shm_open %s failed: %s\n", filepath, strerror(errno));
> > +	}
> > +	ret = flock(fd, LOCK_EX);
> > +	if (ret < 0) {
> > +		close(fd);
> > +		rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
> > +	}
> > +
> > +	ret = ftruncate(fd, size);
> > +	if (ret < 0) {
> > +		rte_panic("ftruncate failed: %s\n", strerror(errno));
> > +	}
> > +	/* flag: MAP_HUGETLB */
>
> Any explanation what that comment means here?
> Do you plan to use MAP_HUGETLb in the call below or ...?

Yes, it's a todo item. shm_open() just uses a tmpfs mounted at /dev/shm, so I
wonder whether we can use this flag to make the OS allocate hugepages here
when the user would like to use hugepages.

......
> > @@ -1081,8 +1134,8 @@ rte_eal_hugepage_init(void)
> >
> >  	/* hugetlbfs can be disabled */
> >  	if (internal_config.no_hugetlbfs) {
> > -		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> > -			MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> > +		int fd;
> > +		addr = rte_eal_shm_create(&fd);
>
> Why do you remove ability to map(dev/zero) here?
> Probably not everyone plan to use --no-hugepages only inside containers.

From my understanding, the mmap() here just allocates some memory that is
initialized to all zeros, so I don't see how /dev/zero is involved.
rte_eal_shm_create() does the same job, and in addition it produces an fd that
refers to this chunk of memory. That fd is indispensable for the vhost-user
protocol, because VHOST_USER_SET_MEM_TABLE hands it to the back end via
sendmsg().
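
To make that point concrete, here is a minimal sketch (assuming a connected
AF_UNIX socket to the vhost back end; send_msg_with_fd and the payload layout
are illustrative, not the real VhostUserMsg code) of how a memory-region fd is
attached to a message as SCM_RIGHTS ancillary data; anonymous mmap() memory
simply has no fd to attach. In practice SET_MEM_TABLE carries one fd per
region, but a single fd is shown for brevity.

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

static ssize_t
send_msg_with_fd(int sock, const void *payload, size_t len, int fd)
{
	struct msghdr msgh;
	struct iovec iov;
	char control[CMSG_SPACE(sizeof(int))];
	struct cmsghdr *cmsg;

	memset(&msgh, 0, sizeof(msgh));
	memset(control, 0, sizeof(control));

	/* the vhost-user message itself goes in the normal data part */
	iov.iov_base = (void *)(uintptr_t)payload;
	iov.iov_len = len;
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;

	/* the memory-region fd rides along as ancillary data */
	msgh.msg_control = control;
	msgh.msg_controllen = sizeof(control);
	cmsg = CMSG_FIRSTHDR(&msgh);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msgh, 0);
}
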
> >
> >  		if (addr == MAP_FAILED) {
> >  			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> >  				strerror(errno));
> > @@ -1093,6 +1146,7 @@ rte_eal_hugepage_init(void)
> >  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> >  		mcfg->memseg[0].len = internal_config.memory;
> >  		mcfg->memseg[0].socket_id = 0;
> > +		mcfg->memseg[0].fd = fd;
> >  		return 0;
> >  	}
> >
> > diff --git a/lib/librte_mempool/rte_mempool.c
> > b/lib/librte_mempool/rte_mempool.c
> > index e57cbbd..8f8852b 100644
> > --- a/lib/librte_mempool/rte_mempool.c
> > +++ b/lib/librte_mempool/rte_mempool.c
> > @@ -453,13 +453,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> >  		rte_errno = EINVAL;
> >  		return NULL;
> >  	}
> > -
> > -	/* check that we have both VA and PA */
> > -	if (vaddr != NULL && paddr == NULL) {
> > -		rte_errno = EINVAL;
> > -		return NULL;
> > -	}
> > -
> >  	/* Check that pg_num and pg_shift parameters are valid. */
> >  	if (pg_num < RTE_DIM(mp->elt_pa) || pg_shift > MEMPOOL_PG_SHIFT_MAX) {
> >  		rte_errno = EINVAL;
> > @@ -596,8 +589,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> >
> >  	/* mempool elements in a separate chunk of memory. */
> >  	} else {
> > +		/* when VA is specified, PA should be specified? */
> > +		if (rte_eal_has_hugepages()) {
> > +			if (paddr == NULL) {
> > +				rte_errno = EINVAL;
> > +				return NULL;
> > +			}
> > +			memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
> > +		}
> >  		mp->elt_va_start = (uintptr_t)vaddr;
> > -		memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
>
> Could you explain the reason for that change?
> Specially why mempool over external memory now only allowed for
> hugepages config?
> Konstantin

Oops, you're right! This change was originally meant for creating a mbuf
mempool at a given vaddr without supplying any paddr[]. Now that we no longer
need to care about either vaddr or paddr[], I should have reverted the change
in this file.

> >  	}
> >
> >  	mp->elt_va_end = mp->elt_va_start;
> > --
> > 2.1.4