You are showing different cmd lines than last time :-)

I’ll try to take a look as time permits

> On May 15, 2016, at 7:47 AM, Siegmar Gross 
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> Hi Jeff,
> 
> today I upgraded to the latest version and I still have
> problems. I compiled with gcc-6.1.0, and I also tried to
> compile with Sun C 5.14 beta. Sun C still breaks with the
> "unrecognized option '-path'" error that was reported before,
> so I use my gcc version. By the way, this problem is solved
> in openmpi-v2.x-dev-1425-ga558e90 and openmpi-dev-4050-g7f65c2b.
> 
> loki hello_2 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler 
> absolute"
>      OPAL repo revision: v1.10.2-189-gfc05056
>     C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
> loki hello_2 125 mpiexec -np 1 --host loki hello_2_mpi : -np 1 --host loki 
> --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 1 slots
> that were requested by the application:
>  hello_2_slave_mpi
> 
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
> 
> 
> 
> I get a result if I add "--slot-list" to the master process
> as well. I changed "-np 2" to "-np 1" for the slave processes
> to show the new problems.
> 
> loki hello_2 126 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 
> hello_2_mpi : -np 1 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> Process 0 of 2 running on loki
> Process 1 of 2 running on loki
> 
> Now 1 slave tasks are sending greetings.
> 
> Greetings from task 1:
>  message type:        3
>  msg length:          132 characters
>  message:
>    hostname:          loki
>    operating system:  Linux
>    release:           3.12.55-52.42-default
>    processor:         x86_64
> 
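> (The attached hello_2_mpi.c and hello_2_slave_mpi.c are not reproduced in
> this message. Below is only a minimal sketch of what such a master/slave
> pair might look like; the message tag, buffer sizes, and exact output
> format are assumptions inferred from the output above, not the original
> sources.)
> 
> /* hello_2_slave_mpi.c -- sketch only, not the original attachment: every
>  * slave reports its rank and sends a text message built from uname(2)
>  * to the master (rank 0). */
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/utsname.h>
> #include "mpi.h"
> 
> int main (int argc, char *argv[])
> {
>   int ntasks, mytask;
>   char host[256], msg[512];
>   struct utsname info;
> 
>   MPI_Init (&argc, &argv);
>   MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
>   MPI_Comm_rank (MPI_COMM_WORLD, &mytask);
>   gethostname (host, sizeof (host));
>   printf ("Process %d of %d running on %s\n", mytask, ntasks, host);
>   uname (&info);
>   snprintf (msg, sizeof (msg),
>             "  hostname:          %s\n"
>             "  operating system:  %s\n"
>             "  release:           %s\n"
>             "  processor:         %s\n",
>             info.nodename, info.sysname, info.release, info.machine);
>   /* tag 3 is a guess matching the "message type: 3" line in the output */
>   MPI_Send (msg, (int) strlen (msg) + 1, MPI_CHAR, 0, 3, MPI_COMM_WORLD);
>   MPI_Finalize ();
>   return 0;
> }
> 
> /* hello_2_mpi.c -- sketch only: the master prints its own rank, then
>  * receives and prints one greeting per slave in the MIMD job. */
> #include <stdio.h>
> #include <unistd.h>
> #include "mpi.h"
> 
> int main (int argc, char *argv[])
> {
>   int ntasks, mytask, i, count;
>   char host[256], msg[512];
>   MPI_Status status;
> 
>   MPI_Init (&argc, &argv);
>   MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
>   MPI_Comm_rank (MPI_COMM_WORLD, &mytask);
>   gethostname (host, sizeof (host));
>   printf ("Process %d of %d running on %s\n", mytask, ntasks, host);
>   printf ("\nNow %d slave tasks are sending greetings.\n\n", ntasks - 1);
>   for (i = 1; i < ntasks; ++i) {
>     MPI_Recv (msg, (int) sizeof (msg), MPI_CHAR, MPI_ANY_SOURCE,
>               MPI_ANY_TAG, MPI_COMM_WORLD, &status);
>     MPI_Get_count (&status, MPI_CHAR, &count);
>     printf ("Greetings from task %d:\n"
>             "  message type:        %d\n"
>             "  msg length:          %d characters\n"
>             "  message:\n%s\n",
>             status.MPI_SOURCE, status.MPI_TAG, count - 1, msg);
>   }
>   MPI_Finalize ();
>   return 0;
> }
> 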
> 
> Now let's increase the number of slave processes to 2.
> I still get greetings from only one slave process, and
> if I increase the number of slave processes to 3, I get
> a segmentation fault. It's nearly the same for
> openmpi-v2.x-dev-1425-ga558e90 (the only difference is
> that the program hangs forever with 3 slave processes,
> for both my cc and gcc versions). Everything works as
> expected for openmpi-dev-4050-g7f65c2b (although it takes
> very long until I get all messages). It even works if I
> put "--slot-list" only once on the command line, as you
> can see below.
> 
> loki hello_2 127 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 
> hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> Process 0 of 2 running on loki
> Process 1 of 2 running on loki
> 
> Now 1 slave tasks are sending greetings.
> 
> Greetings from task 1:
>  message type:        3
>  msg length:          132 characters
>  message:
>    hostname:          loki
>    operating system:  Linux
>    release:           3.12.55-52.42-default
>    processor:         x86_64
> 
> 
> loki hello_2 128 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 
> hello_2_mpi : -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> [loki:28536] *** Process received signal ***
> [loki:28536] Signal: Segmentation fault (11)
> [loki:28536] Signal code: Address not mapped (1)
> [loki:28536] Failing at address: 0x8
> [loki:28536] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7fd40eb75870]
> [loki:28536] [ 1] 
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7fd40edd85b0]
> [loki:28536] [ 2] 
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7fd40edb7b08]
> [loki:28536] [ 3] 
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7fd40eddde8a]
> [loki:28536] [ 4] 
> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7fd40ee1a28e]
> [loki:28536] [ 5] hello_2_slave_mpi[0x400bee]
> [loki:28536] [ 6] *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [loki:28534] Local abort before MPI_INIT completed successfully; not able to 
> aggregate error messages, and not able to guarantee that all other processes 
> were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [loki:28535] Local abort before MPI_INIT completed successfully; not able to 
> aggregate error messages, and not able to guarantee that all other processes 
> were killed!
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd40e7dfb05]
> [loki:28536] [ 7] hello_2_slave_mpi[0x400fb0]
> [loki:28536] *** End of error message ***
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[61640,1],0]
>  Exit code:    1
> --------------------------------------------------------------------------
> loki hello_2 129
> 
> 
> 
> loki hello_2 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler 
> absolute"
>      OPAL repo revision: dev-4050-g7f65c2b
>     C compiler absolute: /opt/solstudio12.5b/bin/cc
> loki hello_2 115 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 
> hello_2_mpi : -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> Process 0 of 4 running on loki
> Process 1 of 4 running on loki
> Process 2 of 4 running on loki
> Process 3 of 4 running on loki
> ...
> 
> 
> It even works if I put "--slot-list" only once on the command
> line.
> 
> loki hello_2 116 mpiexec -np 1 --host loki hello_2_mpi : -np 3 --host loki 
> --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
> Process 1 of 4 running on loki
> Process 2 of 4 running on loki
> Process 0 of 4 running on loki
> Process 3 of 4 running on loki
> ...
> 
> 
> Hopefully you know what happens and why, so that you can
> fix the problem for openmpi-1.10.x and openmpi-2.x. My three
> spawn programs also work with openmpi-master, while
> "spawn_master" breaks on both openmpi-1.10.x and openmpi-2.x
> with the same failure as my hello master/slave program.
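> 
> (The spawn programs are not attached either; for context, here is a
> minimal sketch of what a spawn_master along these lines could look like.
> The slave binary name "spawn_slave" and the number of spawned processes
> are assumptions, not taken from the original program.)
> 
> /* spawn_master.c -- sketch only: spawn a few copies of a slave binary
>  * with MPI_Comm_spawn and disconnect afterwards. */
> #include "mpi.h"
> 
> #define NUM_SLAVES 3               /* assumed value, not the original */
> 
> int main (int argc, char *argv[])
> {
>   MPI_Comm intercomm;
>   int errcodes[NUM_SLAVES];
> 
>   MPI_Init (&argc, &argv);
>   MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES, MPI_INFO_NULL,
>                   0, MPI_COMM_WORLD, &intercomm, errcodes);
>   /* ... exchange messages with the spawned slaves over "intercomm" ... */
>   MPI_Comm_disconnect (&intercomm);
>   MPI_Finalize ();
>   return 0;
> }
> 
> Started with e.g. "mpiexec -np 1 spawn_master", such a program adds the
> slave processes at run time, which is where the same MPI_Init failure
> shows up.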
> 
> Do you know when the Java problem will be solved?
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
> 
> Am 15.05.2016 um 01:27 schrieb Ralph Castain:
>> 
>>> On May 7, 2016, at 1:13 AM, Siegmar Gross 
>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>> 
>>> Hi,
>>> 
>>> yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
>>> Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. The
>>> following programs don't run anymore.
>>> 
>>> 
>>> loki hello_2 112 ompi_info | grep -e "OPAL repo revision" -e "C compiler 
>>> absolute"
>>>     OPAL repo revision: v1.10.2-176-g9d45e07
>>>    C compiler absolute: /opt/solstudio12.4/bin/cc
>>> loki hello_2 113 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host 
>>> loki,loki hello_2_slave_mpi
>>> --------------------------------------------------------------------------
>>> There are not enough slots available in the system to satisfy the 2 slots
>>> that were requested by the application:
>>> hello_2_slave_mpi
>>> 
>>> Either request fewer slots for your application, or make more slots 
>>> available
>>> for use.
>>> --------------------------------------------------------------------------
>>> loki hello_2 114
>>> 
>> 
>> The above worked fine for me with:
>> 
>> OPAL repo revision: v1.10.2-182-g52c7573
>> 
>> You might try updating.
>> 
>>> 
>>> 
>>> Everything worked as expected with openmpi-v1.10.0-178-gb80f802.
>>> 
>>> loki hello_2 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler 
>>> absolute"
>>>     OPAL repo revision: v1.10.0-178-gb80f802
>>>    C compiler absolute: /opt/solstudio12.4/bin/cc
>>> loki hello_2 115 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host 
>>> loki,loki hello_2_slave_mpi
>>> Process 0 of 3 running on loki
>>> Process 1 of 3 running on loki
>>> Process 2 of 3 running on loki
>>> 
>>> Now 2 slave tasks are sending greetings.
>>> 
>>> Greetings from task 2:
>>> message type:        3
>>> ...
>>> 
>>> 
>>> I have the same problem with openmpi-v2.x-dev-1404-g74d8ea0 if I use
>>> the following commands.
>>> 
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki 
>>> hello_2_slave_mpi
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 
>>> hello_2_slave_mpi
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 
>>> 0:0-5,1:0-5 hello_2_slave_mpi
>>> 
>>> 
>>> I also have the same problem with openmpi-dev-4010-g6c9d65c if I use
>>> the following command.
>>> 
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki 
>>> hello_2_slave_mpi
>>> 
>>> 
>>> openmpi-dev-4010-g6c9d65c works as expected with the following commands.
>>> 
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 
>>> hello_2_slave_mpi
>>> mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 
>>> 0:0-5,1:0-5 hello_2_slave_mpi
>>> 
>>> 
>>> Has the interface changed so that some of my commands are no
>>> longer allowed? I would be grateful if somebody could fix the
>>> problem, if it is indeed a problem. Thank you very much for any
>>> help in advance.
>>> 
>>> 
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> <hello_2_mpi.c><hello_2_slave_mpi.c>
>> 
> <hello_2_mpi.c><hello_2_slave_mpi.c>
