You are showing different cmd lines than last time :-) I’ll try to take a look as time permits.
> On May 15, 2016, at 7:47 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> Hi Jeff,
>
> today I upgraded to the latest version and I still have problems.
> I compiled with gcc-6.1.0 and I also tried to compile with Sun C 5.14 beta.
> Sun C still broke with "unrecognized option '-path'", which was reported
> before, so I use my gcc version. By the way, this problem is solved for
> openmpi-v2.x-dev-1425-ga558e90 and openmpi-dev-4050-g7f65c2b.
>
> loki hello_2 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
>       OPAL repo revision: v1.10.2-189-gfc05056
>      C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
> loki hello_2 125 mpiexec -np 1 --host loki hello_2_mpi : -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 1 slots
> that were requested by the application:
>   hello_2_slave_mpi
>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
>
>
> I get a result if I add "--slot-list" to the master process as well.
> I changed "-np 2" to "-np 1" for the slave processes to show the new problems.
>
> loki hello_2 126 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_mpi : -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> Process 0 of 2 running on loki
> Process 1 of 2 running on loki
>
> Now 1 slave tasks are sending greetings.
>
> Greetings from task 1:
>   message type:       3
>   msg length:         132 characters
>   message:
>     hostname:         loki
>     operating system: Linux
>     release:          3.12.55-52.42-default
>     processor:        x86_64
>
>
> Now let's increase the number of slave processes to 2. I still get only
> greetings from one slave process, and if I increase the number of slave
> processes to 3, I get a segmentation fault. It's nearly the same for
> openmpi-v2.x-dev-1425-ga558e90 (the only difference is that the program
> hangs forever with 3 slave processes, for both my cc and gcc versions).
> Everything works as expected for openmpi-dev-4050-g7f65c2b (although it
> takes a very long time until I get all messages). It even works if I put
> "--slot-list" only once on the command line, as you can see below.
>
> loki hello_2 127 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> Process 0 of 2 running on loki
> Process 1 of 2 running on loki
>
> Now 1 slave tasks are sending greetings.
>
> Greetings from task 1:
>   message type:       3
>   msg length:         132 characters
>   message:
>     hostname:         loki
>     operating system: Linux
>     release:          3.12.55-52.42-default
>     processor:        x86_64
>
>
> loki hello_2 128 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_mpi : -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> [loki:28536] *** Process received signal ***
> [loki:28536] Signal: Segmentation fault (11)
> [loki:28536] Signal code: Address not mapped (1)
> [loki:28536] Failing at address: 0x8
> [loki:28536] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7fd40eb75870]
> [loki:28536] [ 1] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7fd40edd85b0]
> [loki:28536] [ 2] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7fd40edb7b08]
> [loki:28536] [ 3] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7fd40eddde8a]
> [loki:28536] [ 4] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7fd40ee1a28e]
> [loki:28536] [ 5] hello_2_slave_mpi[0x400bee]
> [loki:28536] [ 6] *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [loki:28534] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [loki:28535] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd40e7dfb05]
> [loki:28536] [ 7] hello_2_slave_mpi[0x400fb0]
> [loki:28536] *** End of error message ***
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus causing
> the job to be terminated. The first process to do so was:
>
>   Process name: [[61640,1],0]
>   Exit code:    1
> --------------------------------------------------------------------------
> loki hello_2 129
>
>
>
> loki hello_2 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
>       OPAL repo revision: dev-4050-g7f65c2b
>      C compiler absolute: /opt/solstudio12.5b/bin/cc
> loki hello_2 115 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_mpi : -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> Process 0 of 4 running on loki
> Process 1 of 4 running on loki
> Process 2 of 4 running on loki
> Process 3 of 4 running on loki
> ...
>
>
> It even works if I put "--slot-list" only once on the command line.
>
> loki hello_2 116 mpiexec -np 1 --host loki hello_2_mpi : -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> Process 1 of 4 running on loki
> Process 2 of 4 running on loki
> Process 0 of 4 running on loki
> Process 3 of 4 running on loki
> ...
>
>
> Hopefully you know what happens and why it happens, so that you can fix
> the problem for openmpi-1.10.x and openmpi-2.x.
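The attached hello_2_mpi.c / hello_2_slave_mpi.c sources are not reproduced in this thread. As a rough illustration of the kind of master/slave greeting test being run above, a minimal sketch could look like the following; this is an assumption about the general shape of such a test, not Siegmar's actual code, and the tag value, buffer size, and message format are placeholders. In the reported runs the master and the slaves are two separate executables started with mpiexec's MPMD syntax; both roles are folded into one file here for brevity.

/*
 * Minimal sketch only -- NOT the attached hello_2_mpi.c /
 * hello_2_slave_mpi.c.  Rank 0 plays the master and collects a short
 * greeting (here just the host name) from every other rank.
 */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define GREETING_TAG 3      /* placeholder tag value */
#define GREETING_LEN 256

int main(int argc, char **argv)
{
    int rank, size;
    char msg[GREETING_LEN];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Process %d of %d\n", rank, size);

    if (rank == 0) {
        /* master: receive one greeting per slave rank */
        for (int i = 1; i < size; ++i) {
            MPI_Status status;
            MPI_Recv(msg, GREETING_LEN, MPI_CHAR, MPI_ANY_SOURCE,
                     GREETING_TAG, MPI_COMM_WORLD, &status);
            printf("Greetings from task %d: %s\n", status.MPI_SOURCE, msg);
        }
    } else {
        /* slave: report which host this rank runs on */
        char host[MPI_MAX_PROCESSOR_NAME];
        int  len;
        MPI_Get_processor_name(host, &len);
        snprintf(msg, GREETING_LEN, "hostname: %s", host);
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 0,
                 GREETING_TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

With a single binary the MPMD aspect disappears, but the mpiexec command lines quoted above (master app context, colon, slave app context) would apply unchanged to the real two-binary setup.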
> My three spawn programs work with openmpi-master as well, while "spawn_master"
> breaks on both openmpi-1.10.x and openmpi-2.x with the same failure as my
> hello master/slave program.
>
> Do you know when the Java problem will be solved?
>
>
> Kind regards
>
> Siegmar
>
>
>
> On 15.05.2016 at 01:27, Ralph Castain wrote:
>>
>>> On May 7, 2016, at 1:13 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>
>>> Hi,
>>>
>>> yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
>>> Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. The
>>> following programs don't run anymore.
>>>
>>>
>>> loki hello_2 112 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
>>>       OPAL repo revision: v1.10.2-176-g9d45e07
>>>      C compiler absolute: /opt/solstudio12.4/bin/cc
>>> loki hello_2 113 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi
>>> --------------------------------------------------------------------------
>>> There are not enough slots available in the system to satisfy the 2 slots
>>> that were requested by the application:
>>>   hello_2_slave_mpi
>>>
>>> Either request fewer slots for your application, or make more slots available
>>> for use.
>>> --------------------------------------------------------------------------
>>> loki hello_2 114
>>>
>>
>> The above worked fine for me with:
>>
>> OPAL repo revision: v1.10.2-182-g52c7573
>>
>> You might try updating.
>>
>>>
>>>
>>> Everything worked as expected with openmpi-v1.10.0-178-gb80f802.
>>>
>>> loki hello_2 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
>>>       OPAL repo revision: v1.10.0-178-gb80f802
>>>      C compiler absolute: /opt/solstudio12.4/bin/cc
>>> loki hello_2 115 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi
>>> Process 0 of 3 running on loki
>>> Process 1 of 3 running on loki
>>> Process 2 of 3 running on loki
>>>
>>> Now 2 slave tasks are sending greetings.
>>>
>>> Greetings from task 2:
>>>   message type:       3
>>>   ...
>>>
>>>
>>> I have the same problem with openmpi-v2.x-dev-1404-g74d8ea0, if I use
>>> the following commands.
>>>
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 hello_2_slave_mpi
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
>>>
>>>
>>> I also have the same problem with openmpi-dev-4010-g6c9d65c, if I use
>>> the following command.
>>>
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi
>>>
>>>
>>> openmpi-dev-4010-g6c9d65c works as expected with the following commands.
>>>
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 hello_2_slave_mpi
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
>>>
>>>
>>> Has the interface changed so that I'm not allowed to use some of my
>>> commands any longer? I would be grateful if somebody could fix the
>>> problem, if it is a problem. Thank you very much for any help in advance.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>> <hello_2_mpi.c><hello_2_slave_mpi.c>
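The "spawn_master" program mentioned in the May 15 message above is likewise not included in the thread. As a rough illustration of what a spawn-style test of that kind typically does, a minimal MPI_Comm_spawn parent might look like the sketch below; the worker name "spawn_slave", the process count, and the barrier-based handshake are assumptions, not the actual test.

/*
 * Hypothetical sketch only -- not Siegmar's spawn_master.  The parent
 * spawns a few copies of a worker executable via MPI_Comm_spawn and
 * synchronizes with them over the resulting intercommunicator.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    int nspawn = 3;                 /* placeholder worker count */
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Start the workers; they appear as the remote group of intercomm. */
    MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, nspawn, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

    /* Barrier across the intercommunicator; the workers obtain their
     * side of it with MPI_Comm_get_parent() and call MPI_Barrier too. */
    MPI_Barrier(intercomm);
    if (rank == 0)
        printf("parent: spawned %d worker processes\n", nspawn);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}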