Re: [OMPI users] Number of processes and spawn

2011-03-04 Thread Federico Golfrè Andreasi
Hi Ralph,

I'm getting stuck with the spawning functionality.

I've downloaded the snapshot of the trunk from March 1st
(openmpi-1.7a1r24472.tar.bz2),
and I'm testing with a small program that does the following:
 - the master program starts and each rank prints its hostname
 - the master program spawns a slave program with the same size
 - each rank of the slave (spawned) program prints its hostname
 - end
The run does not always complete; I see two different behaviors:
 1. not all the slaves print their hostname and the program ends abruptly
 2. both programs end correctly but the orted daemon is still alive and I need
to press Ctrl-C to exit
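The steps above can be sketched as a minimal master program (this is a hypothetical reconstruction of the test, not the original code; the slave executable name "./spawn_slave" is an assumption):

```c
/* spawn_master.c -- sketch of the test described above.
 * Build with: mpicc spawn_master.c -o spawn_master
 * Run with:   mpirun -np 4 ./spawn_master
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each master rank prints its hostname */
    MPI_Get_processor_name(host, &len);
    printf("master rank %d of %d on %s\n", rank, size, host);

    /* spawn a slave job of the same size as the master job */
    MPI_Comm_spawn("./spawn_slave", MPI_ARGV_NULL, size, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
```

The slave side would just call MPI_Comm_get_parent(), print its own hostname the same way, disconnect from the parent communicator, and finalize.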


I've tried to recompile my test program with a previous snapshot
(openmpi-1.7a1r22794.tar.bz2),
of which I only have a compiled version of OpenMPI (on another machine).
It gives me an error before starting (I've attached it).
Going through the FAQ I found some tips, and I verified that the program is
compiled with the correct OpenMPI version
and that LD_LIBRARY_PATH is consistent.
So I would like to recompile openmpi-1.7a1r22794.tar.bz2, but where can
I find it?
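For reference, the consistency checks I ran look roughly like this (the binary name ./spawn_test and the install prefix are placeholders from my setup, not exact paths):

```shell
# Confirm which Open MPI installation is picked up at run time
which mpirun
ompi_info --version            # should report the snapshot the program was built against
echo "$LD_LIBRARY_PATH"        # should contain <prefix>/lib of that same installation

# Confirm which libmpi the test binary will actually load
ldd ./spawn_test | grep -i mpi
```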


Thank you,
Federico

Il giorno 23 febbraio 2011 03:43, Ralph Castain  ha
scritto:

> Apparently not. I will investigate when I return from vacation next week.
>
>
> Sent from my iPad
>
> On Feb 22, 2011, at 12:42 AM, Federico Golfrè Andreasi <
> federico.gol...@gmail.com> wrote:
>
> Hi Ralph,
>
> I've tested spawning with the OpenMPI 1.5 release, but that fix is not
> there.
> Are you sure you added it?
>
> Thank you,
> Federico
>
>
>
> 2010/10/19 Ralph Castain < r...@open-mpi.org>
>
>> The fix should be there - just didn't get mentioned.
>>
>> Let me know if it isn't and I'll ensure it is in the next one...but I'd be
>> very surprised if it isn't already in there.
>>
>>
>> On Oct 19, 2010, at 3:03 AM, Federico Golfrè Andreasi wrote:
>>
>> Hi Ralph!
>>
>> I saw that the new release 1.5 is out.
>> I didn't find this fix in the "list of changes"; is it present but not
>> mentioned since it is a minor fix?
>>
>> Thank you,
>> Federico
>>
>>
>>
>> 2010/4/1 Ralph Castain < r...@open-mpi.org>
>>
>>> Hi there!
>>>
>>> It will be in the 1.5.0 release, but not 1.4.2 (couldn't backport the
>>> fix). I understand that will come out sometime soon, but no firm date has
>>> been set.
>>>
>>>
>>> On Apr 1, 2010, at 4:05 AM, Federico Golfrè Andreasi wrote:
>>>
>>> Hi Ralph,
>>>
>>>
>>> I've downloaded and tested the openmpi-1.7a1r22817 snapshot,
>>> and it works fine for (multiple) spawns of more than 128 processes.
>>>
>>> That fix will be included in the next release of OpenMPI, right?
>>> Do you know when it will be released? Or where can I find that info?
>>>
>>> Thank you,
>>>  Federico
>>>
>>>
>>>
>>> 2010/3/1 Ralph Castain < r...@open-mpi.org>
>>>
 
 http://www.open-mpi.org/nightly/trunk/

 I'm not sure this patch will solve your problem, but it is worth a try.




>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> 
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>


OpenMPI.error
Description: Binary data


[OMPI users] OpenMPI 1.2.x segfault as regular user

2011-03-04 Thread Youri LACAN-BARTLEY
Hi,

 

This is my first post to this mailing list, so I apologize if it is maybe
a little rough around the edges.

I've been digging into OpenMPI for a little while now and have come
across one issue that I just can't explain and I'm sincerely hoping
someone can put me on the right track here.

 

I'm using a fresh install of openmpi-1.2.7 and I systematically get a
segmentation fault at the end of my mpirun calls if I'm logged in as a
regular user.

However, as soon as I switch to the root account, the segfault does not
appear.

The jobs actually run to their term but I just can't find a good reason
for this to be happening and I haven't been able to reproduce the
problem on another machine.

 

Any help or tips would be greatly appreciated.

 

Thanks,

 

Youri LACAN-BARTLEY

 

Here's an example running osu_latency locally (I've "blacklisted" openib
to make sure it's not to blame):

 

[user@server ~]$ mpirun --mca btl ^openib -np 2 /opt/scripts/osu_latency-openmpi-1.2.7
# OSU MPI Latency Test v3.3
# Size        Latency (us)
0                     0.76
1                     0.89
2                     0.89
4                     0.89
8                     0.89
16                    0.91
32                    0.91
64                    0.92
128                   0.96
256                   1.13
512                   1.31
1024                  1.69
2048                  2.51
4096                  5.34
8192                  9.16
16384                17.47
32768                31.79
65536                51.10
131072               92.41
262144              181.74
524288              512.26
1048576            1238.21
2097152            2280.28
4194304            4616.67
[server:15586] *** Process received signal ***
[server:15586] Signal: Segmentation fault (11)
[server:15586] Signal code: Address not mapped (1)
[server:15586] Failing at address: (nil)
[server:15586] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
[server:15586] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
[server:15586] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
[server:15586] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
[server:15586] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
[server:15586] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
[server:15586] *** End of error message ***
[server:15587] *** Process received signal ***
[server:15587] Signal: Segmentation fault (11)
[server:15587] Signal code: Address not mapped (1)
[server:15587] Failing at address: (nil)
[server:15587] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
[server:15587] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
[server:15587] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
[server:15587] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
[server:15587] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
[server:15587] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
[server:15587] *** End of error message ***
mpirun noticed that job rank 0 with PID 15586 on node server exited on
signal 11 (Segmentation fault).
1 additional process aborted (not shown)
[server:15583] *** Process received signal ***
[server:15583] Signal: Segmentation fault (11)
[server:15583] Signal code: Address not mapped (1)
[server:15583] Failing at address: (nil)
[server:15583] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
[server:15583] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
[server:15583] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
[server:15583] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
[server:15583] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
[server:15583] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
[server:15583] *** End of error message ***
Segmentation fault



[OMPI users] Error in executing NAS Benchmarks

2011-03-04 Thread vaibhav dutt
Hi,


I am trying to execute a NAS benchmark across 2 nodes, each having 4 cores.
The execution works fine on a single node, but when I try to execute the
benchmark across the 2 nodes, I get an error like:

mpiexec -machinefile hostfile.txt -n 8 ./ep.A.8
bash: orted: command not found
--

A daemon (pid 22973) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpiexec: clean termination accomplished


Can anybody please suggest a reason for this?

Thanks,
Vaibhav


Re: [OMPI users] Error in executing NAS Benchmarks

2011-03-04 Thread Jeff Squyres
It looks like your PATH is not set properly on the remote nodes to find the 
Open MPI installation.  Check the FAQ for some details on this.
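A common way to address this (the install prefix /opt/openmpi below is illustrative; substitute wherever Open MPI is actually installed on the nodes) is to export the paths in a startup file that non-interactive remote shells read, or to pass --prefix so mpiexec sets them on the remote nodes for you:

```shell
# Option 1: in ~/.bashrc on every node (read by non-interactive ssh logins)
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH

# Option 2: let mpiexec forward the installation location itself
mpiexec --prefix /opt/openmpi -machinefile hostfile.txt -n 8 ./ep.A.8
```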


On Mar 4, 2011, at 2:26 PM, vaibhav dutt wrote:

> Hi,
> 
> 
> I am trying to execute a NAS benchmark across 2 nodes, each having 4 cores.
> The execution works fine on a single node, but when I try to execute the
> benchmark across the 2 nodes, I get an error like:
> 
> mpiexec -machinefile hostfile.txt -n 8 ./ep.A.8 
> bash: orted: command not found
> --
> 
> A daemon (pid 22973) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
> 
> There may be more information reported by the environment (see above).
> 
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> --
> mpiexec noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> mpiexec: clean termination accomplished
> 
> 
> Can anybody please suggest a reason for this?
> 
> Thanks,
> Vaibhav
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/