https://bugs.dpdk.org/show_bug.cgi?id=561
Bug ID: 561
Summary: EAL: secondary dpdk application fails to come up in 5.4.35 kernel due to memzone_init failure
Product: DPDK
Version: 18.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: critical
Priority: Normal
Component: core
Assignee: dev@dpdk.org
Reporter: amoha...@rbbn.com
Target Milestone: ---

Created attachment 126
  --> https://bugs.dpdk.org/attachment.cgi?id=126&action=edit
EAL: secondary dpdk application fails to come up in 5.4.35 kernel due to memzone_init failure

DPDK version: 18.11.6
Kernel version: 5.4.35

A secondary DPDK process fails to come up on the 5.4.35 kernel, with the following error messages:

EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Cannot attach to memzone list
EAL: Cannot init memzone
EAL: Error - exiting with code: 1

From code analysis, I noticed that on the 5.4.35 kernel, rte_eal_init() behaves differently in the primary process than it does on the 4.19 or 4.15 kernels.

On 5.4.35, the primary process cleans up the DPDK runtime directory in eal_clean_runtime_dir(), and as a result the file /var/run/dpdk/rte/fbarray_memzone gets deleted. The secondary process then fails to attach to the memzone while coming up, and exits.

Specifically, the flock() system call in eal_clean_runtime_dir() succeeds in the primary process, which is why the file gets deleted. On the 4.19/4.15 kernels, the same flock() call always fails in the primary process, so the file is kept and the secondary process is able to attach to the already created memzone.
int
eal_clean_runtime_dir(void)
{
	DIR *dir;
	struct dirent *dirent;
	int dir_fd, fd, lck_result;
	static const char * const filters[] = {
		"fbarray_*",
		"mp_socket_*"
	};

	/* open directory */
	dir = opendir(runtime_dir);
	if (!dir) {
		RTE_LOG(ERR, EAL, "Unable to open runtime directory %s\n",
				runtime_dir);
		goto error;
	}
	dir_fd = dirfd(dir);

	/* lock the directory before doing anything, to avoid races */
	if (flock(dir_fd, LOCK_EX) < 0) {
		RTE_LOG(ERR, EAL, "Unable to lock runtime directory %s\n",
				runtime_dir);
		goto error;
	}

	dirent = readdir(dir);
	if (!dirent) {
		RTE_LOG(ERR, EAL, "Unable to read runtime directory %s\n",
				runtime_dir);
		goto error;
	}

	while (dirent != NULL) {
		unsigned int f_idx;
		bool skip = true;

		/* skip files that don't match the patterns */
		for (f_idx = 0; f_idx < RTE_DIM(filters); f_idx++) {
			const char *filter = filters[f_idx];

			if (fnmatch(filter, dirent->d_name, 0) == 0) {
				skip = false;
				break;
			}
		}
		if (skip) {
			dirent = readdir(dir);
			continue;
		}

		/* try and lock the file */
		fd = openat(dir_fd, dirent->d_name, O_RDONLY);

		/* skip to next file */
		if (fd == -1) {
			dirent = readdir(dir);
			continue;
		}

		/* non-blocking lock */
		lck_result = flock(fd, LOCK_EX | LOCK_NB);

		/* if lock succeeds, remove the file */
		if (lck_result != -1)
			unlinkat(dir_fd, dirent->d_name, 0);
		close(fd);
		dirent = readdir(dir);
	}

	/* closedir closes dir_fd and drops the lock */
	closedir(dir);
	return 0;

error:
	if (dir)
		closedir(dir);

	RTE_LOG(ERR, EAL, "Error while clearing runtime dir: %s\n",
		strerror(errno));

	return -1;
}
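
The per-file check in the function above (open the file, try a non-blocking exclusive flock, unlink on success) can be exercised outside DPDK. The following is only a minimal sketch, not DPDK code: try_clean(), flock_demo() and the /tmp path are hypothetical names made up for illustration, and it assumes that a process using the file holds a shared flock on it through its own descriptor. It relies on the documented flock(2) rule that descriptors obtained from separate open() calls are treated independently, even within one process.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the per-file step of the cleanup loop:
 * open the file, try a non-blocking exclusive lock, unlink on success.
 * Returns true if the file was removed.
 */
static bool
try_clean(const char *path)
{
	bool removed = false;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return false;

	/* succeeds only if no other open file description holds a lock */
	if (flock(fd, LOCK_EX | LOCK_NB) != -1) {
		unlink(path);
		removed = true;
	}
	close(fd);
	return removed;
}

/*
 * Hypothetical demo: runs the cleanup check once while an "owner"
 * descriptor holds a shared lock, and once after the lock is dropped.
 * Returns 0 on success, -1 if the setup itself failed.
 */
static int
flock_demo(bool *removed_while_held, bool *removed_after_release)
{
	char path[] = "/tmp/fbarray_demo_XXXXXX"; /* made-up file name */
	int owner_fd = mkstemp(path);

	if (owner_fd == -1)
		return -1;

	/* assumption: the process using the file keeps a shared lock on it */
	if (flock(owner_fd, LOCK_SH | LOCK_NB) == -1) {
		close(owner_fd);
		unlink(path);
		return -1;
	}

	/* lock is held elsewhere: the exclusive lock attempt should fail */
	*removed_while_held = try_clean(path);

	/* no lock held any more: the cleanup check removes the file */
	flock(owner_fd, LOCK_UN);
	*removed_after_release = try_clean(path);

	close(owner_fd);
	unlink(path); /* harmless if the file is already gone */
	return 0;
}
```

Calling flock_demo() from a small main() and printing the two flags shows whether the lock attempt is refused while the shared lock is held. If the failure reported here comes from the file not being locked at cleanup time, that would correspond to the second case; the sketch does not establish why the behavior differs between kernel versions.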