CC'ed the EAL hugepage maintainer, which is something you should do when sending a patch.
On Fri, Feb 05, 2016 at 07:20:24PM +0800, Jianfeng Tan wrote:
> Originally, there're two cons in using hugepage: a. needs root
> privilege to touch /proc/self/pagemap, which is a premise to
> alllocate physically contiguous memseg; b. possibly too many
> hugepage file are created, especially used with 2M hugepage.
>
> For virtual devices, they don't care about physical-contiguity
> of allocated hugepages at all. Option --single-file is to
> provide a way to allocate all hugepages into single mem-backed
> file.
>
> Known issue:
> a. single-file option relys on kernel to allocate numa-affinitive
> memory.
> b. possible ABI break, originally, --no-huge uses anonymous memory
> instead of file-backed way to create memory.
>
> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>

...

> @@ -956,6 +961,16 @@ eal_check_common_options(struct internal_config *internal_cfg)
>  			"be specified together with --"OPT_NO_HUGE"\n");
>  		return -1;
>  	}
> +	if (internal_cfg->single_file && internal_cfg->force_sockets == 1) {
> +		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE" cannot "
> +			"be specified together with --"OPT_SOCKET_MEM"\n");
> +		return -1;
> +	}
> +	if (internal_cfg->single_file && internal_cfg->hugepage_unlink) {
> +		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
> +			"be specified together with --"OPT_SINGLE_FILE"\n");
> +		return -1;
> +	}

These two limitations don't make sense to me.

> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 6008533..68ef49a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1102,20 +1102,54 @@ rte_eal_hugepage_init(void)
>  	/* get pointer to global configuration */
>  	mcfg = rte_eal_get_configuration()->mem_config;
>
> -	/* hugetlbfs can be disabled */
> -	if (internal_config.no_hugetlbfs) {
> -		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> -			MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +	/* when hugetlbfs is disabled or single-file option is specified */
> +	if (internal_config.no_hugetlbfs || internal_config.single_file) {
> +		int fd;
> +		uint64_t pagesize;
> +		unsigned socket_id = rte_socket_id();
> +		char filepath[MAX_HUGEPAGE_PATH];
> +
> +		if (internal_config.no_hugetlbfs) {
> +			eal_get_hugefile_path(filepath, sizeof(filepath),
> +				"/dev/shm", 0);
> +			pagesize = RTE_PGSIZE_4K;
> +		} else {
> +			struct hugepage_info *hpi;
> +
> +			hpi = &internal_config.hugepage_info[0];
> +			eal_get_hugefile_path(filepath, sizeof(filepath),
> +				hpi->hugedir, 0);
> +			pagesize = hpi->hugepage_sz;
> +		}
> +		fd = open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> +		if (fd < 0) {
> +			RTE_LOG(ERR, EAL, "%s: open %s failed: %s\n",
> +				__func__, filepath, strerror(errno));
> +			return -1;
> +		}
> +
> +		if (ftruncate(fd, internal_config.memory) < 0) {
> +			RTE_LOG(ERR, EAL, "ftuncate %s failed: %s\n",
> +				filepath, strerror(errno));
> +			return -1;
> +		}
> +
> +		addr = mmap(NULL, internal_config.memory,
> +			PROT_READ | PROT_WRITE,
> +			MAP_SHARED | MAP_POPULATE, fd, 0);
>  		if (addr == MAP_FAILED) {
> -			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> -				strerror(errno));
> +			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n",
> +				__func__, strerror(errno));
>  			return -1;
>  		}
>  		mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
>  		mcfg->memseg[0].addr = addr;
> -		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> +		mcfg->memseg[0].hugepage_sz = pagesize;
>  		mcfg->memseg[0].len = internal_config.memory;
> -		mcfg->memseg[0].socket_id = 0;
> +		mcfg->memseg[0].socket_id = socket_id;

I see quite a few issues:

- Assume I have a system with two hugepage sizes: 1G (x4) and 2M (x512),
  mounted at /dev/hugepages and /mnt, respectively.

  internal_config.memory then ends up being 5G, and your code will try to
  mmap all 5G on the first mount point (/dev/hugepages), due to the
  hardcoded logic in your code:

      hpi = &internal_config.hugepage_info[0];
      eal_get_hugefile_path(filepath, sizeof(filepath),
              hpi->hugedir, 0);

  But that mount has only 4G in total, so the mmap will fail.

- As you stated, socket_id is hardcoded, which could be wrong.

- As stated above, the option limitations don't seem right to me. I mean,
  --single-file should be able to work with the --socket-mem option
  semantically.

I have been thinking about how to deal with those issues properly, and a
__very immature__ solution came to my mind (which may simply not work),
but anyway, here it is, FYI: we put the --single-file option through the
same process that handles normal hugepage initialization, but take
different actions (or no action at all) at some stages when that option
is given, which is a bit similar to the way RTE_EAL_SINGLE_FILE_SEGMENTS
is handled.

We would then create one hugepage file for each node and each page size.
For a system like mine above (2 nodes), it may populate files like the
following (see the sketch below):

- 1G x 2 on node0
- 1G x 2 on node1
- 2M x 256 on node0
- 2M x 256 on node1

That should normally fit your case. Though 4 nodes looks like the maximum
node count, the --socket-mem option may relieve the limit a bit.
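To make that a bit more concrete, below is a minimal, standalone sketch of
the one-file-per-(page size, node) layout. It is not written against the
real EAL structures: the mount points, sizes, file naming and the
hugedirs/NB_SOCKETS layout are made-up examples, error handling is
minimal, and it only prints what a real implementation would record in
mcfg->memseg[].

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <limits.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define NB_SOCKETS	2	/* made-up example value */

struct hugedir {
	const char *dir;	/* hugetlbfs mount for this page size */
	size_t page_sz;		/* page size of that mount */
	size_t mem[NB_SOCKETS];	/* memory wanted from each socket */
};

/* example layout: 2 x 1G + 256 x 2M per socket (made-up numbers) */
static struct hugedir hugedirs[] = {
	{ "/dev/hugepages", 1UL << 30, { 2UL << 30, 2UL << 30 } },
	{ "/mnt",           2UL << 20, { 512UL << 20, 512UL << 20 } },
};

int
main(void)
{
	unsigned i, socket;

	for (i = 0; i < sizeof(hugedirs) / sizeof(hugedirs[0]); i++) {
		for (socket = 0; socket < NB_SOCKETS; socket++) {
			char path[PATH_MAX];
			size_t size = hugedirs[i].mem[socket];
			void *addr;
			int fd;

			if (size == 0)
				continue;

			/* one backing file per (page size, socket) pair */
			snprintf(path, sizeof(path), "%s/rtemap_sz%zu_node%u",
				 hugedirs[i].dir, hugedirs[i].page_sz, socket);

			fd = open(path, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
			if (fd < 0) {
				fprintf(stderr, "open %s: %s\n", path,
					strerror(errno));
				return -1;
			}
			if (ftruncate(fd, size) < 0) {
				fprintf(stderr, "ftruncate %s: %s\n", path,
					strerror(errno));
				close(fd);
				return -1;
			}
			/* MAP_POPULATE faults the pages in up front; the
			 * kernel decides which node they come from, so the
			 * NUMA affinity here is only best effort. */
			addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
				    MAP_SHARED | MAP_POPULATE, fd, 0);
			if (addr == MAP_FAILED) {
				fprintf(stderr, "mmap %s: %s\n", path,
					strerror(errno));
				close(fd);
				return -1;
			}
			close(fd);
			/* a real implementation would record this range in
			 * mcfg->memseg[] here instead of just printing it */
			printf("%zu bytes of %zu-byte pages for node%u at %p (%s)\n",
			       size, hugedirs[i].page_sz, socket, addr, path);
		}
	}
	return 0;
}

The weak point is still NUMA placement: MAP_POPULATE only faults the
pages in, and the kernel decides which node they come from, so without
something like mbind() the affinity is best effort, which is exactly the
"relys on kernel to allocate numa-affinitive memory" caveat from your
commit log.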
And if we could afford not caring about socket_id being set correctly, we
could simply allocate one file per hugepage size. That would work well
for your container enabling.

BTW, since we already have the SINGLE_FILE_SEGMENTS (config) option,
adding another option named --single-file looks really confusing to me.

Maybe you could instead build on the SINGLE_FILE_SEGMENTS option and add
another option, say --no-sort (I confess this name sucks, but you get my
point). With that, we could make sure to create as few hugepage files as
possible, to fit your case.

	--yliu