[OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently

2024-04-15 Thread Greg Samonds via users
Hello, We're running into issues with jobs failing in a non-deterministic way when running multiple jobs concurrently within a "make test" framework. Make test is launched from within a shell script running inside a Podman container, and we're typically running with "-j 20" and "-np 4" (20 jobs

Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently

2024-04-16 Thread Greg Samonds via users
your "make test"? Though the warning might be ignored, SIGILL is definitely an issue. I encourage you to have your app dump a core in order to figure out where this is coming from. Cheers, Gilles On Tue, Apr 16, 2024 at 5:20 AM Greg Samonds via users <users@lists.open-mpi.org>

Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently

2024-04-17 Thread Greg Samonds via users
runtime problems. 1. And also, please check hwloc and its dependencies, which are usually not present in default OS installations and container images. Regards, Mehmet From: users <users-boun...@lists.open-mpi.org> on behalf of Greg Samonds via