[OMPI users] How to use hugetlbfs with openmpi and ucx

2023-07-19 Thread Chandran, Arun via users
Hi All, I am trying to see whether hugetlbfs is improving the latency of communication with a small send/receive program. mpirun -np 2 --map-by core --bind-to core --mca pml ucx --mca opal_common_ucx_tls any --mca opal_common_ucx_devices any -mca pml_base_verbose 10 --mca mtl_base_verbose 10

[OMPI users] libnuma.so error

2023-07-19 Thread Luis Cebamanos via users
Hello, I was wondering if anyone has ever seen the following runtime error: mpirun -np 32 ./hello . [LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory [LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual. . The funny thing i

Re: [OMPI users] libnuma.so error

2023-07-19 Thread Gilles Gouaillardet via users
Luis, That can happen if a component is linked with libnuma.so: Open MPI will fail to open that component and try to fall back on another one. You can run ldd on the mca_*.so components in the /.../lib/openmpi directory to figure out which one is using libnuma.so and assess whether it is needed or not. Cheers, Gil
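Gilles's suggestion can be sketched as a short shell loop. This is a minimal sketch, not an official diagnostic: the install prefix below is a placeholder assumption, so substitute the lib/openmpi directory of your actual Open MPI installation.

```shell
# Hypothetical install prefix -- substitute your actual Open MPI lib directory.
OMPI_COMP_DIR=/usr/lib64/openmpi/lib/openmpi

# Print each MCA component that links against libnuma (no output if none do,
# or if the directory does not exist on this machine).
for comp in "$OMPI_COMP_DIR"/mca_*.so; do
  [ -e "$comp" ] || continue            # glob matched nothing; skip
  if ldd "$comp" 2>/dev/null | grep -q 'libnuma'; then
    echo "$comp links against libnuma"
  fi
done
```

Any component this prints is a candidate for the one triggering the dlopen warning; if none of them are needed on your system, installing libnuma (or rebuilding without that component) are the usual options.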

Re: [OMPI users] libnuma.so error

2023-07-19 Thread Jeff Squyres (jsquyres) via users
It's not clear if that message is being emitted by Open MPI. It does say it's falling back to a different behavior if libnuma.so is not found, so it appears it's treating it as a warning, not an error. From: users on behalf of Luis Cebamanos via users Sent:

Re: [OMPI users] [EXT] Re: Error handling

2023-07-19 Thread George Bosilca via users
Alex, exit(status) does not make status available to the parent process's wait(); instead it makes the low 8 bits available to the parent as an unsigned value. This explains why small positive values seem to work correctly while negative values do not (because of the 32-bit negative value representation in co
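George's point about the low 8 bits can be checked directly in a shell, since the shell reports a child's exit status the same way wait() does. A minimal sketch, with a subshell standing in for a process calling exit():

```shell
# exit() hands the parent only the low 8 bits of its argument:
# -1 masked to 8 bits is 255, and 256 wraps around to 0.
echo "low 8 bits of -1:  $(( -1 & 0xFF ))"
echo "low 8 bits of 256: $(( 256 & 0xFF ))"

# The same truncation is visible in a real child's exit status.
( exit 256 ); echo "exit(256) seen by parent as: $?"
```

This is exactly the modulo-256 behavior discussed later in the thread: any status outside 0-255 (including negative reduction results) is silently wrapped before the parent sees it.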

Re: [OMPI users] [EXT] Re: Error handling

2023-07-19 Thread Jeff Squyres (jsquyres) via users
MPI_Allreduce should work just fine, even with negative numbers. If you are seeing something different, can you provide a small reproducer program that shows the problem? We can dig deeper into it if we can reproduce the problem. mpirun's exit status can't distinguish between MPI processes who

Re: [OMPI users] [EXT] Re: Error handling

2023-07-19 Thread George Bosilca via users
I think the root cause was that he expected the negative integer resulting from the reduction to be the exit code of the application, and as I explained in my prior email that's not how exit() works. The exit() issue aside, MPI_Abort seems to be the right function for this usage. George. On W

Re: [OMPI users] libnuma.so error

2023-07-19 Thread Gus Correa via users
If it is installed, libnuma should be in: /usr/lib64/libnuma.so as a symlink to the actual version-numbered library. In general the loader is configured to search for shared libraries in /usr/lib64 ("ldd " may shed some light here). You can check if the numa packages are installed with: yum lis
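Gus's checks can be combined into a quick diagnostic. A sketch under two assumptions: `ldconfig -p` queries the loader cache on glibc-based systems, and /usr/lib64 is the conventional location on 64-bit RPM-based distributions.

```shell
# Ask the dynamic loader whether it knows about libnuma at all.
ldconfig -p 2>/dev/null | grep libnuma || echo "libnuma not in the loader cache"

# Confirm the conventional install location and the symlink target.
ls -l /usr/lib64/libnuma.so* 2>/dev/null || echo "no libnuma under /usr/lib64"
```

If both lines report nothing found, installing the distribution's numactl/libnuma development package is the usual fix; if the library exists but only with a version suffix (libnuma.so.1), the unversioned .so symlink from the -devel package may be what's missing.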

Re: [OMPI users] How to use hugetlbfs with openmpi and ucx

2023-07-19 Thread Chandran, Arun via users
Hi, I am trying to use static huge pages, not transparent huge pages. UCX is allowed to allocate via hugetlbfs. $ ./bin/ucx_info -c | grep -i huge UCX_SELF_ALLOC=huge,thp,md,mmap,heap UCX_TCP_ALLOC=huge,thp,md,mmap,heap UCX_SYSV_HUGETLB_MODE=try --->It is trying this and failing UCX_SYSV_FIFO_HU
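SysV hugetlb allocation can only succeed if the kernel actually has static huge pages reserved, so a first check when UCX's "try" mode falls back is the pool size in /proc/meminfo. A sketch (reserving pages via vm.nr_hugepages requires root, and the page count 128 is only an example):

```shell
# Static huge pages come from a preallocated pool; if HugePages_Total
# is 0, hugetlb-backed allocations (e.g. SHM_HUGETLB) will fail.
grep -E '^(HugePages_Total|HugePages_Free|Hugepagesize)' /proc/meminfo

# To reserve pages (as root), e.g.:
#   sysctl -w vm.nr_hugepages=128
```

With a zero pool, UCX_SYSV_HUGETLB_MODE=try behaves exactly as reported: it attempts the hugetlb path, fails, and falls back to regular pages.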

Re: [OMPI users] [EXT] Re: [EXT] Re: Error handling

2023-07-19 Thread Alexander Stadik via users
Hey George, thanks, of course this fully explains it; I had simply assumed it was a problem of the child process. In this case there is also no issue with negative values when considering the modulo 256. BR Alex From: George Bosilca Sent: Wednesday, July 19, 2023 4:45 PM To: Alexander Stadik C

Re: [OMPI users] [EXT] Re: [EXT] Re: Error handling

2023-07-19 Thread Alexander Stadik via users
Hey Jeff, George Bosilca already cleared it up in a previous answer. I tested everything again; when considering the modulo 256, everything behaves as expected. BR Alex From: Jeff Squyres (jsquyres) Sent: Wednesday, July 19, 2023 5:09 PM To: George Bosilca ; Open MPI Users Cc: Alexander