Hi Nicolas,
Thanks for the quick response.
On 05/08/2020 14:42, Nicolas Dichtel wrote:
Le 05/08/2020 à 14:26, Nick Connolly a écrit :
Running dpdk-helloworld on Linux with libnuma present,
but no kernel support for NUMA (CONFIG_NUMA=n) causes
rte_service_init() to fail with "EAL: error allocating
rte services array".
alloc_seg() calls get_mempolicy to verify that the allocation
has happened on the correct socket, but receives ENOSYS from
the kernel and fails the allocation.
The allocated socket should only be verified if check_numa() is true.
Fixes: 2a96c88be83e ("mem: ease init in a docker container")
I'm wondering if the bug existed before this commit.
Before this commit, it was:
move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
if (cur_socket_id != socket_id) {
/* error */
Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?
I've just run the previous code to test this out and you are right that
move_pages does indeed return -1 with errno set to ENOSYS, but nothing
checks this so execution carries on and compares cur_socket_id (which
will be unchanged from the zero initialization) with socket_id (which is
presumably also zero), thus allowing the allocation to succeed!
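To make the difference concrete, here is a minimal sketch of the two behaviours. fake_query_node() is a hypothetical stand-in for move_pages()/get_mempolicy() on a CONFIG_NUMA=n kernel, and both verify_*() helpers are invented for illustration; none of these names exist in DPDK, and the real logic lives in alloc_seg().

```c
#include <errno.h>

/* Hypothetical stand-in for move_pages()/get_mempolicy() on a kernel
 * built with CONFIG_NUMA=n: fails with ENOSYS, leaves *node untouched. */
static int fake_query_node(int *node)
{
	(void)node;
	errno = ENOSYS;
	return -1;
}

/* Old behaviour: the return value is ignored, so the zero-initialized
 * cur_socket_id is what gets compared against socket_id. */
static int verify_unchecked(int socket_id)
{
	int cur_socket_id = 0;

	fake_query_node(&cur_socket_id);
	return cur_socket_id == socket_id ? 0 : -1;
}

/* Checked behaviour: a failed query is treated as a failed verification. */
static int verify_checked(int socket_id)
{
	int cur_socket_id = 0;

	if (fake_query_node(&cur_socket_id) < 0)
		return -1;
	return cur_socket_id == socket_id ? 0 : -1;
}
```

With these stubs, verify_unchecked(0) reports success even though the query failed, while verify_checked(0) reports failure, which is the case the patch guards with check_numa().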
[snip]
+ if (check_numa()) {
+ ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
+ MPOL_F_NODE | MPOL_F_ADDR);
+ if (ret < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
+ __func__, strerror(errno));
+ goto mapped;
+ } else if (cur_socket_id != socket_id) {
+ RTE_LOG(DEBUG, EAL,
+ "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+ __func__, socket_id, cur_socket_id);
+ goto mapped;
+ }
+ } else {
+ if (rte_socket_count() > 1)
+ RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
+ __func__, socket_id);
nit: maybe a higher log level like WARNING?
Open to guidance here. My concern is that this message will be
generated on every call to alloc_seg(), and I'm not sure how frequent
those calls are, so I'm cautious about flooding the log with warnings
under 'normal running'. Are the implications of running on a
multi-socket system with NUMA support disabled in the kernel purely
performance-related for DPDK, or is there a functional correctness
issue as well?
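If a higher level is preferred, one possible way to avoid the flooding would be to emit the message only once. The following is a rough sketch under that assumption: warn_numa_unchecked_once() is a hypothetical helper, not an existing DPDK function, it uses fprintf() in place of RTE_LOG() so that it stands alone, and it ignores thread safety.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not part of DPDK: prints the message on the first
 * call only and returns whether anything was logged. Not thread-safe. */
static bool warn_numa_unchecked_once(int socket_id)
{
	static bool warned;

	if (warned)
		return false;
	warned = true;
	fprintf(stderr,
		"not checking socket for allocation (wanted %d)\n",
		socket_id);
	return true;
}
```

The first call returns true and prints the message; later calls return false and stay silent, so the per-allocation cost is a single flag test.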
Regards,
Nicolas
Regards,
Nick