[OMPI users] Slot count parameter in hostfile ignored
Hello,

I'm trying to tell Open MPI how many processes per node I want to use, but
mpirun seems to ignore the configuration I provide.

I create the following hostfile:

$ cat hostfile.16
taurusi6344 slots=16
taurusi6348 slots=16

And then start the app as follows:

$ mpirun --display-map -machinefile hostfile.16 -np 2 hostname
 Data for JOB [42099,1] offset 0

 ========================   JOB MAP   ========================

 Data for node: taurusi6344   Num slots: 1   Max slots: 0   Num procs: 1
        Process OMPI jobid: [42099,1] App: 0 Process rank: 0
        Bound: socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]],
               socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]],
               socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]],
               socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]:
               [B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]

 Data for node: taurusi6348   Num slots: 1   Max slots: 0   Num procs: 1
        Process OMPI jobid: [42099,1] App: 0 Process rank: 1
        Bound: socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]],
               socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]],
               socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]],
               socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]:
               [B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]

 =============================================================
taurusi6344
taurusi6348

If I request anything more than 2 via "-np", I get the following error message:

$ mpirun --display-map -machinefile hostfile.16 -np 4 hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
  hostname

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

The Open MPI version is "mpirun (Open MPI) 2.1.0".

SLURM is also installed, version "slurm 16.05.7-Bull.1.1-20170512-1252".

Could you help me make Open MPI respect the slots parameter?

--
Regards,
Maksym Planeta
Re: [OMPI users] Slot count parameter in hostfile ignored
My best guess is that SLURM has only allocated 2 slots, and we respect the
resource manager regardless of what you say in the hostfile. You can check
this by adding --display-allocation to your command line. You probably need
to tell SLURM to allocate more CPUs per node.

> On Sep 7, 2017, at 3:33 AM, Maksym Planeta wrote:
>
> I'm trying to tell Open MPI how many processes per node I want to use, but
> mpirun seems to ignore the configuration I provide.
> [...]
> Could you help me make Open MPI respect the slots parameter?
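As a concrete sketch of that suggestion (the node count and the 16-cores-per-node
figure are taken from the hostfile above; the exact salloc options depend on the
site's SLURM configuration, so treat this as an illustration rather than a recipe):

$ salloc --nodes=2 --ntasks-per-node=16      # ask SLURM for 16 slots on each of the 2 nodes
$ mpirun --display-allocation --display-map -np 4 hostname    # run inside that allocation

If --display-allocation still reports only one slot per node, the limit is coming
from the SLURM allocation rather than from the hostfile.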
[OMPI users] Cygwin64 mpiexec freezes
Windows 10 64bit, Cygwin64, openmpi 1.10.7-1 (dev, c, c++, fortran),
GCC 6.3.0-2 (core, gcc, g++, fortran)

I am compiling the standard "hello_c.c" example with mpicc:

$ mpicc -g hello_c.c -o hello_c

The showme:

gcc -g hello_c.c -o hello_c -fexceptions -L/usr/lib -lmpi -lopen-rte -lopen-pal -lm -lgdi32

This successfully creates hello_c.exe. When I run it directly, it performs as
expected (the first run brings up a Windows Firewall dialog and I click Accept):

$ ./hello_c
Hello World! I am 0 of 1, (Open MPI v1.10.7, package: Open MPI
marco@GE-MATZERI-EU Distribution, ident: 1.10.7, repo rev: v1.10.6-48-g5e373bf,
May 16, 2017, 129)

However, when I run it using mpiexec:

$ mpiexec -n 4 ./hello_c
^C
$

Nothing is displayed and I have to ^C out.

If I insert a puts("Start") just before the call to MPI_Init(&argc, &argv), and
a puts("MPI_Init done.") just after, mpiexec prints "Start" for each process
(4 times in the above example) and then freezes. It never returns from the call
to MPI_Init(...).

This is a freshly installed Cygwin64, and other non-MPI programs work fine. Can
anyone give me an idea of what is going on?

hello_c.c
---
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size, len;
    char version[MPI_MAX_LIBRARY_VERSION_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_library_version(version, &len);
    printf("Hello World! I am %d of %d, (%s, %d)\n", rank, size, version, len);
    MPI_Finalize();

    return 0;
}
---
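One generic way to get more information out of a hang like this (an Open MPI
debugging sketch rather than anything Cygwin-specific; the verbosity levels are
arbitrary) is to raise the verbosity of the process launcher and the
byte-transfer layer:

$ mpiexec -n 4 --mca plm_base_verbose 10 --mca btl_base_verbose 100 ./hello_c

The extra output can show whether the child processes were launched at all and
how far component initialization gets before the freeze.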
Re: [OMPI users] Cygwin64 mpiexec freezes
On 07/09/2017 21:12, Llelan D. wrote:
> Windows 10 64bit, Cygwin64, openmpi 1.10.7-1 (dev, c, c++, fortran),
> GCC 6.3.0-2 (core, gcc, g++, fortran)
> [...]
> Can anyone give me an idea of what is going on?

Same here. I will investigate whether it is a side effect of the new 6.3.0-2
compiler or of the latest Cygwin.

Regards
Marco
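For anyone who wants to dig in while the processes are frozen, one possible
approach (assuming gdb is installed in the Cygwin environment; the exact PID
handling on Cygwin may differ) is to attach a debugger to one of the hung ranks
and look at its backtrace:

$ ps -W | grep hello_c        # Cygwin's ps; -W also shows the Windows PID column
$ gdb -p <pid>                # attach to one of the hung hello_c processes
(gdb) bt                      # the backtrace shows where MPI_Init is blocked

A backtrace from inside MPI_Init would at least narrow the freeze down to a
particular Open MPI component.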
[OMPI users] Errors when compiled with Cygwin MinGW gcc
Windows 10 64bit, Cygwin64, openmpi 1.10.7-1 (dev, c, c++, fortran),
x86_64-w64-mingw32-gcc 6.3.0-1 (core, gcc, g++, fortran)

I am compiling the standard "hello_c.c" example with mpicc configured to use
the Cygwin-installed MinGW gcc compiler:

$ export OMPI_CC=x86_64-w64-mingw32-gcc
$ mpicc -idirafter /cygdrive/c/cygwin64/usr/include hello_c.c -o hello_c

For some unknown reason, I have to manually include the "usr/include" directory
to pick up the "mpi.h" header, and it must be searched after the standard
header directories to avoid "time_t" typedef conflicts.

The showme:

x86_64-w64-mingw32-gcc -idirafter /cygdrive/c/cygwin64/usr/include hello_c.c -o hello_c -fexceptions -L/usr/lib -lmpi -lopen-rte -lopen-pal -lm -lgdi32

This successfully creates hello_c.exe. Running it either directly or with
mpiexec displays the following errors for each process:

$ ./hello_c
      1 [main] hello_c 18116 child_copy: cygheap read copy failed, 0x180307408..0x180319318, done 0, windows pid 18116, Win32 error 6
    112 [main] hello_c 18116 D:\mpi\examples\hello_c.exe: *** fatal error - ccalloc would have returned NULL

$ mpiexec -n 4 hello_c
      1 [main] hello_c 15660 child_copy: cygheap read copy failed, 0x180307408..0x1803216B0, done 0, windows pid 15660, Win32 error 6
    182 [main] hello_c 15660 D:\mpi\examples\hello_c.exe: *** fatal error - ccalloc would have returned NULL
      2 [main] hello_c 7852 child_copy: cygheap read copy failed, 0x180307408..0x18031F588, done 0, windows pid 7852, Win32 error 6
    223 [main] hello_c 7852 D:\mpi\examples\hello_c.exe: *** fatal error - ccalloc would have returned NULL
      1 [main] hello_c 16464 child_copy: cygheap read copy failed, 0x180307408..0x1803208E0, done 0, windows pid 16464, Win32 error 6
    215 [main] hello_c 16464 D:\mpi\examples\hello_c.exe: *** fatal error - ccalloc would have returned NULL
      2 [main] hello_c 17184 child_copy: cygheap read copy failed, 0x180307408..0x180322710, done 0, windows pid 17184, Win32 error 6
    281 [main] hello_c 17184 D:\mpi\examples\hello_c.exe: *** fatal error - ccalloc would have returned NULL

Does anyone have any ideas as to what is causing these errors? Can an Open MPI
application even be compiled with the Cygwin-installed MinGW gcc compiler?

hello_c.c
---
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size, len;
    char version[MPI_MAX_LIBRARY_VERSION_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_library_version(version, &len);
    printf("Hello World! I am %d of %d, (%s, %d)\n", rank, size, version, len);
    MPI_Finalize();

    return 0;
}
---
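One thing that may be worth checking here (a guess, not a confirmed diagnosis):
the Cygwin openmpi package is built against the Cygwin runtime, while
x86_64-w64-mingw32-gcc produces plain Windows (MinGW) binaries, so the resulting
executable may be mixing the two runtimes. cygcheck, which ships with Cygwin and
prints the DLL dependency tree of an executable, makes that visible:

$ cygcheck ./hello_c.exe    # lists every DLL hello_c.exe loads; cygwin1.dll
                            # showing up alongside the MinGW runtime DLLs would
                            # indicate the two runtimes are being mixed

If that is what is happening, the cygheap/child_copy errors would be coming from
the Cygwin side of libmpi running inside a process that was not started as a
Cygwin process.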