Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-14 Thread Debendra Das
I have installed OpenMPI-2.0.0 on 5 systems with IP addresses 172.16.5.29,
172.16.5.30, 172.16.5.31, 172.16.5.32, and 172.16.5.33. When I run the
hello_oshmem_c.c program (from the examples directory), I get correct output
only when the run uses 2 distributed machines. With 3 or more distributed
machines, the run fails with an error. The outputs and the host file are
attached. Can anybody please help me sort out this error?

Thanking You.
Debendranath Das

On Fri, Aug 12, 2016 at 7:06 PM, r...@open-mpi.org  wrote:

> Just as a suggestion: most of us are leery of opening Word attachments on
> mailing lists. I’d suggest sending this to us as plain text if you want us
> to read it.
>
>
> > On Aug 12, 2016, at 4:03 AM, Debendra Das wrote:
> >
> > I have installed OpenMPI-2.0.0 on 5 systems with IP addresses
> > 172.16.5.29, 172.16.5.30, 172.16.5.31, 172.16.5.32, and 172.16.5.33.
> > When I run the hello_oshmem_c.c program (from the examples directory),
> > I get correct output only when the run uses 2 distributed machines.
> > With 3 or more distributed machines, the run fails with an error. The
> > outputs and the host file are attached. Can anybody please help me
> > sort out this error?
> >
> > Thanking You.
> > Debendranath Das
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users

//hostfile.. 
cat my_host
172.16.5.33
172.16.5.32
172.16.5.31
172.16.5.30
172.16.5.29



//Correct Output with 2 Distributed machines..
oshrun -pernode  -np 2 --hostfile my_host hello_oshmem_c
Hello, world, I am 0 of 2: http://www.open-mpi.org/ (version: 1.2)
Hello, world, I am 1 of 2: http://www.open-mpi.org/ (version: 1.2)


//Error with more than 2 Distributed machines..
oshrun -pernode  -np 3 --hostfile my_host hello_oshmem_c
[localhost:04534] *** Process received signal ***
[localhost:04534] Signal: Segmentation fault (11)
[localhost:04534] Signal code: Address not mapped (1)
[localhost:04534] Failing at address: 0xb8
[localhost:04534] [ 0] /lib64/libpthread.so.0(+0x10c10)[0x7f34f9332c10]
[localhost:04534] [ 1] /home/rayan_ray/openmpi-2.0.0/lib/openmpi/mca_spml_yoda.so(mca_spml_yoda_add_procs+0x37e)[0x7f34eeb5d5de]
[localhost:04534] [ 2] /home/rayan_ray/openmpi-2.0.0/lib/liboshmem.so.20(oshmem_shmem_init+0x24d)[0x7f34f9846cad]
[localhost:04534] [ 3] /home/rayan_ray/openmpi-2.0.0/lib/liboshmem.so.20(pshmem_init+0x24)[0x7f34f98495b4]
[localhost:04534] [ 4] hello_oshmem_c[0x4008c3]
[localhost:04534] [ 5] /lib64/libc.so.6(__libc_start_main+0xf1)[0x7f34f8f7f731]
[localhost:04534] [ 6] hello_oshmem_c[0x4007d9]
[localhost:04534] *** End of error message ***
[localhost][[57029,1],0][btl_tcp_endpoint.c:599:mca_btl_tcp_endpoint_recv_blocking] recv(20, 0/8) failed: Connection reset by peer (104)
[localhost.localdomain:03671] pml_ob1_sendreq.c:189 FATAL





Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-14 Thread Gilles Gouaillardet

Thanks for both the report and posting the logs in a plain text file.


I opened https://github.com/open-mpi/ompi/issues/1966 to track this issue;
it contains a patch that fixes or works around the problem.


Cheers,


Gilles


On 8/14/2016 7:39 PM, Debendra Das wrote:
I have installed OpenMPI-2.0.0 on 5 systems with IP addresses
172.16.5.29, 172.16.5.30, 172.16.5.31, 172.16.5.32, and 172.16.5.33.
When I run the hello_oshmem_c.c program (from the examples directory),
I get correct output only when the run uses 2 distributed machines.
With 3 or more distributed machines, the run fails with an error. The
outputs and the host file are attached. Can anybody please help me
sort out this error?


Thanking You.
Debendranath Das
