Hi All,
I am trying to see whether hugetlbfs improves the latency of communication,
using a small send/receive program.
mpirun -np 2 --map-by core --bind-to core --mca pml ucx \
    --mca opal_common_ucx_tls any --mca opal_common_ucx_devices any \
    --mca pml_base_verbose 10 --mca mtl_base_verbose 10
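For reference, a minimal sketch of such a two-rank send/receive (ping-pong) latency test, using only standard MPI calls (this is my own example of the kind of program meant, not the poster's actual code):

/* Hypothetical ping-pong latency sketch: ranks 0 and 1 bounce a small
 * message back and forth and report the average one-way latency. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;
    char buf[8] = {0};                 /* small message */
    MPI_Status status;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("avg one-way latency: %g us\n",
               (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}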
Hello,
I was wondering if anyone has ever seen the following runtime error:
mpirun -np 32 ./hello
.
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.
.
The funny thing i
Luis,
That can happen if a component is linked against libnuma.so:
Open MPI will fail to open that component and fall back on another one.
You can run ldd on the mca_*.so components in the /.../lib/openmpi directory
to figure out which ones use libnuma.so and assess whether it is needed or not.
Cheers,
Gil
It's not clear if that message is being emitted by Open MPI.
It does say it's falling back to a different behavior if libnuma.so is not
found, so it appears that it is treating it as a warning, not an error.
From: users on behalf of Luis Cebamanos via
users
Sent:
Alex,
exit(status) does not make status available to the parent process's wait();
instead it makes only the low 8 bits available to the parent, as an unsigned
value. This explains why small positive values seem to work correctly while
negative values do not (because of the two's complement representation of
negative 32-bit values).
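A small illustration of that behavior (my own example, not from this thread): the parent only ever sees the low 8 bits of whatever the child passes to exit().

/* The child exits with -1; the parent observes 255 (the low 8 bits). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0)
        exit(-1);                      /* child: -1 is 0xffffffff internally */

    int status;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status))
        printf("child exit code: %d\n", WEXITSTATUS(status)); /* prints 255 */
    return 0;
}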
MPI_Allreduce should work just fine, even with negative numbers. If you are
seeing something different, can you provide a small reproducer program that
shows the problem? We can dig deeper into it if we can reproduce the problem.
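A sketch of such a reproducer (my own example, assuming an MPI_SUM over MPI_INT; adapt as needed) should print the correct negative sum:

/* Hypothetical reproducer sketch: MPI_Allreduce over negative ints. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sendval = -(rank + 1);         /* e.g. -1, -2, -3, ... */
    int sum = 0;
    MPI_Allreduce(&sendval, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %d (expected %d)\n", sum, -size * (size + 1) / 2);

    MPI_Finalize();
    return 0;
}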
mpirun's exit status can't distinguish between MPI processes who
I think the root cause was that he expected the negative integer resulting
from the reduction to be the exit code of the application, and as I
explained in my prior email, that's not how exit() works.
The exit() issue aside, MPI_Abort seems to be the right function for this
usage.
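For example, a minimal sketch of that pattern (hypothetical code, with an arbitrary error code of 1):

/* Report an application-defined failure through MPI_Abort rather than
 * through exit()'s 8-bit status. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int result = -42;                  /* assumed result of some reduction */
    if (result < 0)
        MPI_Abort(MPI_COMM_WORLD, 1);  /* error code handed to the runtime */

    MPI_Finalize();
    return 0;
}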
George.
On W
If it is installed, libnuma should be in:
/usr/lib64/libnuma.so
as a symlink to the actual version-numbered library.
In general the loader is configured to search for shared libraries
in /usr/lib64 ("ldd " may shed some light here).
You can check if the numa packages are installed with:
yum lis
Hi,
I am trying to use static huge pages, not transparent huge pages.
UCX is allowed to allocate via hugetlbfs.
$ ./bin/ucx_info -c | grep -i huge
UCX_SELF_ALLOC=huge,thp,md,mmap,heap
UCX_TCP_ALLOC=huge,thp,md,mmap,heap
UCX_SYSV_HUGETLB_MODE=try --->It is trying this and failing
UCX_SYSV_FIFO_HU
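Independently of UCX, a quick way to check whether a static huge page can be allocated at all (a standalone check I am suggesting, not part of UCX; it assumes the default 2 MiB huge page size):

/* If this mmap fails, UCX's hugetlb "try" mode will fail for the same
 * reason (e.g. no pages reserved in /proc/sys/vm/nr_hugepages). */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 2 * 1024 * 1024;      /* assumes 2 MiB huge pages */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }
    printf("got a huge-page mapping at %p\n", p);
    munmap(p, len);
    return 0;
}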
Hey George,
thanks, of course this fully explains it; I simply assumed it was a problem
with the child process.
In this case there is also no issue with negative values when considering them
modulo 256.
BR Alex
From: George Bosilca
Sent: Wednesday, July 19, 2023 4:45 PM
To: Alexander Stadik
C
Hey Jeff,
George Bosilca already cleared it up in a previous answer. I tested everything
again; considering the values modulo 256, everything behaves as expected.
BR Alex
From: Jeff Squyres (jsquyres)
Sent: Wednesday, July 19, 2023 5:09 PM
To: George Bosilca ; Open MPI Users
Cc: Alexander