Re: [OMPI users] mpi functions are slow when first called and become normal afterwards

2009-11-12 Thread RightCFD
>
> Date: Thu, 29 Oct 2009 15:45:06 -0400
> From: Brock Palen 
> Subject: Re: [OMPI users] mpi functions are slow when first called and
>become normal afterwards
> To: Open MPI Users 
> Message-ID: <890cc430-68b0-4307-8260-24a6fadae...@umich.edu>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> > When MPI_Bcast and MPI_Reduce are called for the first time, they
> > are very slow. But after that, they run at normal and stable speed.
> > Is there anybody out there who has encountered such a problem? If you
> > need any other information, please let me know and I'll provide it.
> > Thanks in advance.
>
> This is expected, and I think you can dig through the message archive
> to find the answer.  OMPI does not wire up all the communication at
> startup, thus the first time you communicate with a host the
> connection is made, but after that it is fast because it is already
> open.  This behavior is expected, and is needed for very large systems
> where you could run out of sockets for some types of communication
> with so many hosts.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
> Thanks for your reply. I am surprised to learn this is an expected behavior
> of OMPI. I searched the archive but did not find many relevant messages. I
> am wondering why other users of OMPI do not complain about this. Is there a way to
> avoid this when timing an MPI program?


[OMPI users] OFED-1.5rc1 with OpenMPI and IB

2009-11-12 Thread Stefan Kuhne
Hello,

I am trying to set up a small HPC cluster for educational use.
InfiniBand is working, and I can ping over IB.
When I try to run an MPI program I get:

user@head:~/Cluster/hello$ mpirun --hostfile ../Cluster.hosts hello
--
WARNING: There was an error initializing an OpenFabrics device.

   Local host:   head
   Local device: mthca0
--
Hier ist Job  0 von  1 auf head
user@head:~/Cluster/hello$

How can I get more information about this error?

Regards,
Stefan Kuhne






Re: [OMPI users] mpi functions are slow when first called and become normal afterwards

2009-11-12 Thread Ralph Castain
You can have OMPI wire up -all- available connections at startup of the
processes with


-mca mpi_preconnect_all 1

Be aware of Brock's caution. Also, note that this occurs at MPI_Init  
so you can adjust your timing marks accordingly.
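
To make the timing point concrete, here is a minimal sketch of my own (illustrative only, not code from this thread; the file name is made up): one untimed warm-up call is issued after MPI_Init, so that lazy connection setup -- or the eager wire-up done inside MPI_Init when "-mca mpi_preconnect_all 1" is given -- stays out of the measured region.

/* timing_sketch.c -- illustrative sketch only; build with mpicc */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf = 42;
    double t0, t1;

    MPI_Init(&argc, &argv);               /* preconnect (if enabled) happens here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* untimed warm-up call */
    MPI_Barrier(MPI_COMM_WORLD);          /* line the ranks up before timing */

    t0 = MPI_Wtime();
    MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* timed, already-connected call */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("warmed-up MPI_Bcast took %g seconds\n", t1 - t0);

    MPI_Finalize();
    return 0;
}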



On Nov 11, 2009, at 10:04 PM, RightCFD wrote:


Date: Thu, 29 Oct 2009 15:45:06 -0400
From: Brock Palen 
Subject: Re: [OMPI users] mpi functions are slow when first called and
   become normal afterwards
To: Open MPI Users 
Message-ID: <890cc430-68b0-4307-8260-24a6fadae...@umich.edu>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

> When MPI_Bcast and MPI_Reduce are called for the first time, they
> are very slow. But after that, they run at normal and stable speed.
> Is there anybody out there who has encountered such a problem? If you
> need any other information, please let me know and I'll provide it.
> Thanks in advance.

This is expected, and I think you can dig through the message archive
to find the answer.  OMPI does not wire up all the communication at
startup, thus the first time you communicate with a host the
connection is made, but after that it is fast because it is already
open.  This behavior is expected, and is needed for very large systems
where you could run out of sockets for some types of communication
with so many hosts.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

Thanks for your reply. I am surprised to learn this is an expected
behavior of OMPI. I searched the archive but did not find many
relevant messages. I am wondering why other users of OMPI do not
complain about this. Is there a way to avoid this when timing an MPI
program?





Re: [OMPI users] OFED-1.5rc1 with OpenMPI and IB

2009-11-12 Thread Jeff Squyres

Can you submit all the information requested here:

http://www.open-mpi.org/community/help/


On Nov 12, 2009, at 1:28 AM, Stefan Kuhne wrote:


Hello,

I am trying to set up a small HPC cluster for educational use.
InfiniBand is working, and I can ping over IB.
When I try to run an MPI program I get:

user@head:~/Cluster/hello$ mpirun --hostfile ../Cluster.hosts hello
--
WARNING: There was an error initializing an OpenFabrics device.

   Local host:   head
   Local device: mthca0
--
Hier ist Job  0 von  1 auf head
user@head:~/Cluster/hello$

How can I get more information about this error?

Regards,
Stefan Kuhne






--
Jeff Squyres
jsquy...@cisco.com



[OMPI users] Release date for 1.3.4?

2009-11-12 Thread John R. Cary

From http://svn.open-mpi.org/svn/ompi/branches/v1.3/NEWS I see:

- Many updates and fixes to the (non-default) "sm" collective
 component (i.e., native shared memory MPI collective operations).

Will this fix the problem noted at

https://svn.open-mpi.org/trac/ompi/ticket/2043

??

Thanks..John Cary


Re: [OMPI users] Release date for 1.3.4?

2009-11-12 Thread Ralph Castain

Release should be soon after SC09 is over, I suspect.

On Nov 12, 2009, at 7:35 AM, John R. Cary wrote:


From http://svn.open-mpi.org/svn/ompi/branches/v1.3/NEWS I see:

- Many updates and fixes to the (non-default) "sm" collective
component (i.e., native shared memory MPI collective operations).

Will this fix the problem noted at

https://svn.open-mpi.org/trac/ompi/ticket/2043


I don't think so, as that ticket doesn't indicate a fix has been
developed - otherwise, the ticket would be closed :-)




??

Thanks..John Cary




Re: [OMPI users] mpi functions are slow when first called and become normal afterwards

2009-11-12 Thread Eugene Loh




RightCFD wrote:

  
Date: Thu, 29 Oct 2009 15:45:06 -0400
From: Brock Palen 
Subject: Re: [OMPI users] mpi functions are slow when first called and
       become normal afterwards
To: Open MPI Users 
Message-ID: <890cc430-68b0-4307-8260-24a6fadae...@umich.edu>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

> When MPI_Bcast and MPI_Reduce are called for the first time, they
> are very slow. But after that, they run at normal and stable speed.
> Is there anybody out there who has encountered such a problem? If you
> need any other information, please let me know and I'll provide it.
> Thanks in advance.

This is expected, and I think you can dig through the message archive
to find the answer.  OMPI does not wire up all the communication at
startup, thus the first time you communicate with a host the
connection is made, but after that it is fast because it is already
open.  This behavior is expected, and is needed for very large systems
where you could run out of sockets for some types of communication
with so many hosts.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

Thanks for your reply. I am surprised to learn this is an expected
behavior of OMPI. I searched the archive but did not find many
relevant messages. I am wondering why other users of OMPI do not
complain about this. Is there a way to avoid this when timing an MPI program?

An example of this is the NAS Parallel Benchmarks, which have been
around nearly 20 years.  They:

*) turn timers on after MPI_Init and off before MPI_Finalize
*) execute at least one iteration before starting timers

Even so, with at least one of the NPB tests and with at least one MPI
implementation, I've seen more than one iteration needed to warm things
up.  That is, if you timed each iteration, you could see that multiple
iterations were needed to warm everything up.  In performance analysis,
it is reasonably common to expect to have to run multiple iterations
and use an appropriate data set size to get representative behavior.
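
As a concrete illustration of that pattern, here is a minimal sketch of my own (it is not NPB code; the file name, the WARMUP_ITERS value, and the trivial per-iteration kernel are all stand-ins): a few untimed warm-up iterations run after MPI_Init, then the timers bracket the remaining iterations and are read before MPI_Finalize.

/* npb_style_timing.c -- illustrative sketch only; build with mpicc */
#include <mpi.h>
#include <stdio.h>

#define WARMUP_ITERS 2     /* assumed value; the point is simply >= 1 */
#define TIMED_ITERS  100

static void one_iteration(MPI_Comm comm)
{
    int v = 1, sum = 0;
    /* stand-in for the real per-iteration computation and communication */
    MPI_Bcast(&v, 1, MPI_INT, 0, comm);
    MPI_Reduce(&v, &sum, 1, MPI_INT, MPI_SUM, 0, comm);
}

int main(int argc, char **argv)
{
    int rank, i;
    double t0, elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < WARMUP_ITERS; i++)     /* untimed warm-up iterations */
        one_iteration(MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();                      /* timers on: after MPI_Init and warm-up */
    for (i = 0; i < TIMED_ITERS; i++)
        one_iteration(MPI_COMM_WORLD);
    elapsed = MPI_Wtime() - t0;            /* timers off: before MPI_Finalize */

    if (rank == 0)
        printf("%d timed iterations: %g s (%g s per iteration)\n",
               TIMED_ITERS, elapsed, elapsed / TIMED_ITERS);

    MPI_Finalize();
    return 0;
}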




Re: [OMPI users] Release date for 1.3.4?

2009-11-12 Thread Jeff Squyres

I think Eugene will have to answer this one -- Eugene?

On Nov 12, 2009, at 6:35 AM, John R. Cary wrote:



 From http://svn.open-mpi.org/svn/ompi/branches/v1.3/NEWS I see:

- Many updates and fixes to the (non-default) "sm" collective
  component (i.e., native shared memory MPI collective operations).

Will this fix the problem noted at

https://svn.open-mpi.org/trac/ompi/ticket/2043

??

Thanks..John Cary




--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Release date for 1.3.4?

2009-11-12 Thread Eugene Loh

Jeff Squyres wrote:


I think Eugene will have to answer this one -- Eugene?

On Nov 12, 2009, at 6:35 AM, John R. Cary wrote:


From http://svn.open-mpi.org/svn/ompi/branches/v1.3/NEWS I see:

- Many updates and fixes to the (non-default) "sm" collective
  component (i.e., native shared memory MPI collective operations).

Will this fix the problem noted at

https://svn.open-mpi.org/trac/ompi/ticket/2043


I've been devoting a lot of time to this one.  There seems (to me) to be 
something very goofy going on here, but I've whittled it down a lot.  I 
hope soon to have a simple demonstration of the problem so that it'll be 
possible to decide if there is a problem with OMPI, GCC 4.4.0, both, or 
something else.  So, I've made a lot of progress, but you're asking how 
much longer until it's solved.  Different question.  I don't know.  This 
is not a straightforward problem.


Re: [OMPI users] checkpoint opempi-1.3.3+sge62

2009-11-12 Thread Sergio Díaz

Hi Josh,

You were right. The main problem was /tmp. SGE uses a scratch 
directory in which the jobs keep their temporary files. With TMPDIR set to 
/tmp, checkpointing works!
However, when I try to restart it, I get the following error (see 
ERROR1). The -v option gives the lines shown (see ERROR2).


I was trying to use ssh instead of rsh, but it was impossible. By default 
it should use ssh and, if it finds a problem, fall back to rsh. It seems 
that ssh doesn't work, because it always uses rsh.

If I change this MCA parameter, it still uses rsh.
If I set the OMPI_MCA_plm_rsh_disable_qrsh variable to 1, it tries to use ssh 
and doesn't work. I get "bash: orted: command not found" and the 
MPI process dies.
The command it tries to execute is the following, and I haven't yet found 
the reason why it cannot find orted, because I set up /etc/bashrc 
so that the right path is always available, and I have the right 
path in my application (see ERROR4).


Many thanks!,
Sergio

P.S. Sorry about these long emails. I am just trying to show you useful 
information to identify my problems.



ERROR 1
>
> [sdiaz@compute-3-18 ~]$ ompi-restart ompi_global_snapshot_28454.ckpt
> 
--

> Error: Unable to obtain the proper restart command to restart from the
>checkpoint file (opal_snapshot_0.ckpt). Returned -1.
>
> 
--
> 
--

> Error: Unable to obtain the proper restart command to restart from the
>checkpoint file (opal_snapshot_1.ckpt). Returned -1.
>
> 
--

> [compute-3-18:28792] *** Process received signal ***
> [compute-3-18:28792] Signal: Segmentation fault (11)
> [compute-3-18:28792] Signal code:  (128)
> [compute-3-18:28792] Failing at address: (nil)
> [compute-3-18:28792] [ 0] /lib64/tls/libpthread.so.0 [0x33bbf0c430]
> [compute-3-18:28792] [ 1] /lib64/tls/libc.so.6(__libc_free+0x25) 
[0x33bb669135]
> [compute-3-18:28792] [ 2] 
/opt/cesga/openmpi-1.3.3/lib/libopen-pal.so.0(opal_argv_free+0x2e) 
[0x2a95586658]
> [compute-3-18:28792] [ 3] 
/opt/cesga/openmpi-1.3.3/lib/libopen-pal.so.0(opal_event_fini+0x1e) 
[0x2a9557906e]
> [compute-3-18:28792] [ 4] 
/opt/cesga/openmpi-1.3.3/lib/libopen-pal.so.0(opal_finalize+0x36) 
[0x2a9556bcfa]

> [compute-3-18:28792] [ 5] opal-restart [0x40312a]
> [compute-3-18:28792] [ 6] 
/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x33bb61c3fb]

> [compute-3-18:28792] [ 7] opal-restart [0x40272a]
> [compute-3-18:28792] *** End of error message ***
> [compute-3-18:28793] *** Process received signal ***
> [compute-3-18:28793] Signal: Segmentation fault (11)
> [compute-3-18:28793] Signal code:  (128)
> [compute-3-18:28793] Failing at address: (nil)
> [compute-3-18:28793] [ 0] /lib64/tls/libpthread.so.0 [0x33bbf0c430]
> [compute-3-18:28793] [ 1] /lib64/tls/libc.so.6(__libc_free+0x25) 
[0x33bb669135]
> [compute-3-18:28793] [ 2] 
/opt/cesga/openmpi-1.3.3/lib/libopen-pal.so.0(opal_argv_free+0x2e) 
[0x2a95586658]
> [compute-3-18:28793] [ 3] 
/opt/cesga/openmpi-1.3.3/lib/libopen-pal.so.0(opal_event_fini+0x1e) 
[0x2a9557906e]
> [compute-3-18:28793] [ 4] 
/opt/cesga/openmpi-1.3.3/lib/libopen-pal.so.0(opal_finalize+0x36) 
[0x2a9556bcfa]

> [compute-3-18:28793] [ 5] opal-restart [0x40312a]
> [compute-3-18:28793] [ 6] 
/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x33bb61c3fb]

> [compute-3-18:28793] [ 7] opal-restart [0x40272a]
> [compute-3-18:28793] *** End of error message ***
> 
--
> mpirun noticed that process rank 0 with PID 28792 on node 
compute-3-18.local exited on signal 11 (Segmentation fault).
> 
--

>


ERROR 2

> [sdiaz@compute-3-18 ~]$ ompi-restart -v ompi_global_snapshot_28454.ckpt
>[compute-3-18.local:28941] Checking for the existence of 
(/home/cesga/sdiaz/ompi_global_snapshot_28454.ckpt)  
> [compute-3-18.local:28941] Restarting from file 
(ompi_global_snapshot_28454.ckpt)   

> [compute-3-18.local:28941]   Exec in self 
> ...   





ERROR3

>[sdiaz@compute-3-18 ~]$ ompi_info  --all|grep "plm_rsh_agent"
> How many plm_rsh_agent instances to invoke concurrently (must 
be > 0)
> MCA plm: parameter "plm_rsh_agent" (current value: "s

Re: [OMPI users] users Digest, Vol 1401, Issue 2

2009-11-12 Thread Jeff Squyres
It looks like your executable is explicitly calling MPI_ABORT in the  
CmiAbort function -- perhaps in response to something happening in the  
namd or CmiHandleMessage functions.  The next logical step would  
likely be to look in those routines and see why MPI_ABORT/CmiAbort  
would be invoked.
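
For readers unfamiliar with the call chain being described, here is a minimal sketch of my own (app_abort is a hypothetical stand-in for CmiAbort; this is not NAMD/Charm++ source): an application-level abort helper that ends in MPI_Abort is what produces the "MPI_ABORT was invoked on rank N" banner quoted below.

/* abort_sketch.c -- illustrative sketch only; build with mpicc */
#include <mpi.h>
#include <stdio.h>

static void app_abort(const char *reason)
{
    fprintf(stderr, "Fatal: %s\n", reason);
    MPI_Abort(MPI_COMM_WORLD, 1);   /* asks Open MPI to kill every rank in the job */
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 2)                  /* pretend rank 2 hit an unrecoverable error */
        app_abort("Internal Error: Unknown-msg-type.");

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}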



On Nov 11, 2009, at 4:49 AM, Yogesh Aher wrote:

Yes - the executables run initially and then give the error mentioned
in the first message!

i.e.

./mpirun -hostfile machines executable
--
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
--
mpirun has exited due to process rank 2 with PID 15617 on
node sibar.pch.univie.ac.at exiting without calling "finalize". This  
may

have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[2] Stack Traceback:
  [0] CmiAbort+0x25  [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22  [0x8367c20]
  [3] CsdScheduleForever+0x67  [0x8367dd2]
  [4] CsdScheduler+0x12  [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21  [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53  [0x80fa0f5]
  [7] main+0x2e  [0x80f65b6]
  [8] __libc_start_main+0xd3  [0x31cde3]
  [9] __gxx_personality_v0+0x101  [0x80f3405]
[3] Stack Traceback:
  [0] CmiAbort+0x25  [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22  [0x8367c20]
  [3] CsdScheduleForever+0x67  [0x8367dd2]
  [4] CsdScheduler+0x12  [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21  [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53  [0x80fa0f5]
  [7] main+0x2e  [0x80f65b6]
  [8] __libc_start_main+0xd3  [0x137de3]
  [9] __gxx_personality_v0+0x101  [0x80f3405]
Running on MPI version: 2.1 multi-thread support: MPI_THREAD_SINGLE  
(max supported: MPI_THREAD_SINGLE)

cpu topology info is being gathered.
2 unique compute nodes detected.

- Processor 2 Exiting: Called CmiAbort 
Reason: Internal Error: Unknown-msg-type. Contact Developers.

- Processor 3 Exiting: Called CmiAbort 
Reason: Internal Error: Unknown-msg-type. Contact Developers.

[studpc01.xxx.xxx.xx:15615] 1 more process has sent help message  
help-mpi-api.txt / mpi-abort
[studpc01.xxx.xxx.xx:15615] Set MCA parameter  
"orte_base_help_aggregate" to 0 to see all help / error messages
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c: 
124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed:  
Connection reset by peer (104)
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c: 
124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed:  
Connection reset by peer (104)


Yes, I put the 64-bit executable on one machine (studpc21) and the 32-bit
executable on another machine (studpc01), with the same name! But I
don't know whether they are being used separately or not. How can I
check it?
Can we use the "./mpirun -hetero" option for specifying the
machines? The jobs run individually on each machine, but when used
together, they don't!


I hope this gives some hint toward the solution.


Message: 2
Date: Tue, 10 Nov 2009 07:56:47 -0500
From: Jeff Squyres 
Subject: Re: [OMPI users] Openmpi on Heterogeneous environment
To: "Open MPI Users" 
Message-ID: <8f008aab-358b-4e6a-83a0-9ece60fd5...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Do you see any output from your executables?  I.e., are you sure that
it's running the "correct" executables?  If so, do you know how far
it's getting in its run before aborting?


On Nov 10, 2009, at 7:36 AM, Yogesh Aher wrote:

> Thanks for the reply, Pallab. A firewall is not an issue, as I can
> SSH passwordlessly to/from both machines.
> My problem is dealing with 32-bit and 64-bit architectures
> simultaneously (not with different operating systems). Is this
> possible with Open MPI?
>
> Look forward to the solution!
>
> Thanks,
> Yogesh
>
>
> From: Pallab Datta (datta_at_[hidden])
>
> I have had issues running across platforms, i.e. Mac OS X and Linux
> (Ubuntu), and haven't got them resolved. Check whether a firewall is
> blocking any communication.
>
> On Thu, Nov 5, 2009 at 7:47 PM, Yogesh Aher 
> wrote:
> Dear Open-mpi users,
>
> I have installed openmpi on 2 different machines with different
> architectures (INTEL and x86_64) separately (command: ./configure --
> enable-heterogeneous). Compiled executables of the same code for
> these 2 architectures. Kept these executables on the individual machines.
> Prepared a hostfile containing the names of those 2 machines.
> Now, when I want to execute the code (giving command - ./mpirun -
> hostfile machin

Re: [OMPI users] mpi functions are slow when first called and become normal afterwards

2009-11-12 Thread Gus Correa

Eugene Loh wrote:

RightCFD wrote:


Date: Thu, 29 Oct 2009 15:45:06 -0400
From: Brock Palen <bro...@umich.edu>
Subject: Re: [OMPI users] mpi functions are slow when first called and
   become normal afterwards
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <890cc430-68b0-4307-8260-24a6fadae...@umich.edu>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

> When MPI_Bcast and MPI_Reduce are called for the first time, they
> are very slow. But after that, they run at normal and stable speed.
> Is there anybody out there who has encountered such a problem? If you
> need any other information, please let me know and I'll provide it.
> Thanks in advance.

This is expected, and I think you can dig through the message archive
to find the answer.  OMPI does not wire up all the communication at
startup, thus the first time you communicate with a host the
connection is made, but after that it is fast because it is already
open.  This behavior is expected, and is needed for very large systems
where you could run out of sockets for some types of communication
with so many hosts.

Brock Palen
www.umich.edu/~brockp 
Center for Advanced Computing
bro...@umich.edu 
(734)936-1985

Thanks for your reply. I am surprised to learn this is an expected
behavior of OMPI. I searched the archive but did not find many
relevant messages. I am wondering why other users of OMPI do not
complain about this. Is there a way to avoid this when timing an MPI
program?

An example of this is the NAS Parallel Benchmarks, which have been 
around nearly 20 years.  They:


*) turn timers on after MPI_Init and off before MPI_Finalize
*) execute at least one iteration before starting timers

Even so, with at least one of the NPB tests and with at least one MPI 
implementation, I've seen more than one iteration needed to warm things 
up.  That is, if you timed each iteration, you could see that multiple 
iterations were needed to warm everything up.  In performance analysis, 
it is reasonably common to expect to have to run multiple iterations and 
use an appropriate data set size to get representative behavior.





And I would guess that in Open MPI, and maybe in other implementations too,
the time you spend warming up, probing the best way to do things,
is largely compensated for during steady-state execution,
if the number of iterations is not very small.
This seems to be required to accommodate the large variety
of hardware and software platforms, and to be efficient on all of them.
Right?

AFAIK, other high-quality software (e.g. FFTW)
follows a similar rationale.

Gus Correa








Re: [OMPI users] users Digest, Vol 1403, Issue 4

2009-11-12 Thread RightCFD
Thanks for all your inputs.

It is good to know that this initial latency is expected behavior, and that the
workaround is to run one dummy iteration before timing is started.

I did not notice this before because my older parallel CFD code runs a large
number of time steps, and the initial latency was compensated for.

But recently I have been teaching MPI using small parallel codes and noticed
this behavior.

This relieves my concern about our system performance.

Thanks again.





Re: [OMPI users] Problem with mpirun -preload-binary option

2009-11-12 Thread Qing Pang
Now that I have passwordless ssh set up in both directions, and verified 
it working, I still have the same problem.
I'm able to run ssh/scp on both master and client nodes (at this 
point, they are pretty much the same) without being asked for a password. 
And mpirun works fine if I put the executable in the same directory 
on both nodes.


But when I try the preload-binary option, I still have the same 
problem: it asks me for the password of the node running mpirun, and 
then reports that scp failed.


---


Josh Wrote:

Though the --preload-binary option was created while building the 
checkpoint/restart functionality, it does not depend on 
checkpoint/restart functionality in any way (that is just a side effect of the 
initial development).


The problem you are seeing is a result of how password-less ssh is set 
up in your computing environment. The --preload-binary option uses 'scp' (at 
the moment) to copy the files from the node running mpirun to the 
compute nodes. The compute nodes are the ones that call 'scp', so you 
will need to set up password-less ssh in both directions.


-- Josh

On Nov 11, 2009, at 8:38 AM, Ralph Castain wrote:

I'm no expert on the preload-binary option - but I would suspect that 
is the case given your observations.


 That option was created to support checkpoint/restart, not for what 
you are attempting to do. Like I said, you -should- be able to use it 
for that purpose, but I expect you may hit a few quirks like this along 
the way.


 On Nov 11, 2009, at 9:16 AM, Qing Pang wrote:

> Thank you very much for your help! I believe I do have password-less 
ssh set up, at least from master node to client node (desktop -> laptop 
in my case). If I type >ssh node1 on my desktop terminal, I am able to 
get to the laptop node without being asked for password. And as I 
mentioned, if I copy the example executable from desktop to the laptop 
node using scp, then I am able to run it from desktop using both nodes.
> Back to the preload-binary problem - I am asked for the password of 
my master node - the node I am working on - not the remote client node. 
Do you mean that I should set up password-less ssh in both directions? 
Does the client node need to access the master node through password-less 
ssh to make the preload-binary option work?

>
>
> Ralph Castain Wrote:
>
> It -should- work, but you need password-less ssh setup. See our FAQ
> for how to do that, if you are unfamiliar with it.
>
> On Nov 10, 2009, at 2:02 PM, Qing Pang wrote:
>
> I'm having a problem getting the mpirun "preload-binary" option to work.
>>
>> I'm using ubuntu 8.10 with openmpi 1.3.3, nodes connected with Ethernet cable.
>> If I copy the executable to client nodes using scp, then do mpirun, everything works.
>>
>> But I really want to avoid the copying, so I tried the -preload-binary option.
>>
>> When I typed the command on my master node as below (gordon-desktop is my master node, and gordon-laptop is the client node):
>>
>> --
>> gordon_at_gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun
>> -machinefile machine.linux -np 2 --preload-binary $(pwd)/hello_c.out
>> --
>>
>> I got the following:
>>
>> gordon_at_gordon-desktop's password: (I entered my password here; why am I asked for the password? I am working under this account anyway)
>>
>> WARNING: Remote peer ([[18118,0],1]) failed to preload a file.
>>
>> Exit Status: 256
>> Local File: /tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out
>> Remote File: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>> Command:
>> scp gordon-desktop:/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>> /tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out
>>
>> Will continue attempting to launch the process(es).
>> --
>> --
>> mpirun was unable to launch the specified application as it could not access
>> or execute an executable:
>>
>> Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>> Node: node1
>>
>> while attempting to start process rank 1.
>> --
>>
>> Has anyone succeeded with the 'preload-binary' option with similar settings? I assume this mpirun option should work when compiling openmpi with default options? Anything I need to set?
>>
>> --qing




Re: [OMPI users] Release date for 1.3.4?

2009-11-12 Thread Douglas Guptill
Hello Eugene:

On Thu, Nov 12, 2009 at 07:20:08AM -0800, Eugene Loh wrote:
> Jeff Squyres wrote:
>
>> I think Eugene will have to answer this one -- Eugene?
>>
>> On Nov 12, 2009, at 6:35 AM, John R. Cary wrote:
>>
>>> From http://svn.open-mpi.org/svn/ompi/branches/v1.3/NEWS I see:
>>>
>>> - Many updates and fixes to the (non-default) "sm" collective
>>>   component (i.e., native shared memory MPI collective operations).
>>>
>>> Will this fix the problem noted at
>>>
>>> https://svn.open-mpi.org/trac/ompi/ticket/2043
>>
> I've been devoting a lot of time to this one.  There seems (to me) to be  
> something very goofy going on here, but I've whittled it down a lot.  I  
> hope soon to have a simple demonstration of the problem so that it'll be  
> possible to decide if there is a problem with OMPI, GCC 4.4.0, both, or  
> something else.  So, I've made a lot of progress, but you're asking how  
> much longer until it's solved.  Different question.  I don't know.  This  
> is not a straightforward problem.

I love that answer.  Sincerely.  It should be taught in schools.  It
should be part of every programmer's toolkit.

Douglas.


[OMPI users] Come see us at SC09!

2009-11-12 Thread Jeff Squyres
Several of us from the Open MPI crew will be at SC09; if you're  
coming, be sure to stop by and say hello!  ...and use the SC09 Fist  
Bump(tm), of course (http://www.linux-mag.com/id/7608).


- I'll be hanging in and around the Cisco booth (#1847), but also  
giving various other booth talks around the floor.  If you miss me,  
leave your card with the people at the front of the Cisco booth.


- Dr. Edgar Gabriel from U. Houston, Josh Hursey from Indiana U., and  
Jens Doleschal from TU Dresden will be presenting Open MPI-related 15- 
minute talks in the Cisco booth.  Drop by the Cisco booth to see the  
exact schedule.


- George Bosilca and I will be hosting the Open MPI BOF  
Wednesday, Nov 18th, at 12:15pm, in E145/E146.


- The Indiana University booth (#2573) will be holding a 2-hour  
collection of mini-seminars on Open MPI Thursday, Nov 19th, from 10am- 
noon; stop by for any 15-minute interval to learn fun and interesting  
things about Open MPI (see http://sc09.supercomputing.iu.edu/mini-conferences/lumsdaine).


- Don Kerr from the Sun Open MPI team will be in the Sun booth (#435).

- The MPI Forum will be holding a BOF Wednesday, Nov 18th, at 5:30pm,  
in D135/D136 to review MPI-3 efforts and request comments from the  
user community.  Many Forum members will be there for questions and  
comments as well.


- ...and likely other Open MPI members / developers whom I have  
forgotten to mention (forgive me!)


--
Jeff Squyres
jsquy...@cisco.com



[OMPI users] oob mca question

2009-11-12 Thread Aaron Knister

Dear List,

I'm having a really weird issue with openmpi - version 1.3.3 (version  
1.2.8 doesn't seem to exhibit this behavior). Essentially when I start  
jobs from the cluster front-end node using mpirun, mpirun sits idle  
for up to a minute and a half (for 30 nodes) before running the  
command I've given it. Running the same command on any other node in  
the cluster returns in a fraction of a second. Upon further research,  
it appears it's an issue with the way the orted processes on the compute nodes are  
attempting to talk back to the front-end node. When I launch mpirun  
from the front-end node this is the process it spawns on the compute  
node (public ip scrambled for security purposes)-


orted --daemonize -mca ess env -mca orte_ess_jobid 1816657920 -mca  
orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri 1816657920.0;tcp:// 
130.X.X.X:56866;tcp://172.40.10.1:56866;tcp://172.20.10.1:56866


Throwing in some firewall debugging rules indicates that the compute  
nodes were trying to talk back to mpirun on the front-end node over  
the front-end node's public IP. Based on this, and looking at the  
arguments passed above, it seemed as though the public IP of the front-end  
node was being tried before any of its private IPs, and the delay I  
was seeing was orted waiting for the connection to the front-end  
node's public IP to time out before it tried the cluster-facing IP and  
the connection succeeded.


I was able to work around this by specifying "--mca oob_tcp_if_include  
bond0,eth0" to mpirun (the front-end node has 2 bonded NICs as its  
cluster-facing interface). When I provided that argument, the  
previously experienced delay disappeared. I could easily put that into  
openmpi-mca-params.conf and be done with the problem, but I would like  
to know why openmpi chose to use the public IP of the node before its  
internal IP, and whether this is expected behavior. I suspect that it may  
not be.


-Aaron


Re: [OMPI users] oob mca question

2009-11-12 Thread Ralph Castain
That is indeed the expected behavior, and your solution is the correct  
one.


The orted has no way of knowing which interface mpirun can be reached  
on, so it has no choice but to work its way through the available  
ones. Because of the ordering in the way the OS reports the  
interfaces, it is picking up the public one first - so that is the  
first one used.


Telling it the right one to use is the only solution.
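
For reference, here is a sketch of the MCA parameter file entry mentioned in the original message (the file location is an assumption on my part; the per-user $HOME/.openmpi/mca-params.conf or the system-wide <prefix>/etc/openmpi-mca-params.conf are the usual places):

# $HOME/.openmpi/mca-params.conf (assumed location)
oob_tcp_if_include = bond0,eth0

With an entry like that, every mpirun picks up the interface restriction without needing the command-line flag.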

On Nov 12, 2009, at 7:35 PM, Aaron Knister wrote:


Dear List,

I'm having a really weird issue with openmpi - version 1.3.3  
(version 1.2.8 doesn't seem to exhibit this behavior). Essentially  
when I start jobs from the cluster front-end node using mpirun,  
mpirun sits idle for up to a minute and a half (for 30 nodes) before  
running the command I've given it. Running the same command on any  
other node in the cluster returns in a fraction of a second. Upon  
further research, it appears it's an issue with the way the orted processes on the  
compute nodes are attempting to talk back to the front-end node.  
When I launch mpirun from the front-end node this is the process it  
spawns on the compute node (public ip scrambled for security  
purposes)-


orted --daemonize -mca ess env -mca orte_ess_jobid 1816657920 -mca  
orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri  
1816657920.0;tcp://130.X.X.X:56866;tcp://172.40.10.1:56866;tcp:// 
172.20.10.1:56866


Throwing in some firewall debugging rules indicates that the compute  
nodes were trying to talk back to mpirun on the front-end node over  
the front-end node's public IP. Based on this, and looking at the  
arguments passed above, it seemed as though the public IP of the  
front-end node was being tried before any of its private IPs, and the  
delay I was seeing was orted waiting for the connection to the front- 
end node's public IP to time out before it tried the cluster-facing  
IP and the connection succeeded.


I was able to work around this by specifying "--mca  
oob_tcp_if_include bond0,eth0" to mpirun (the front-end node has 2  
bonded NICs as its cluster-facing interface). When I provided that  
argument, the previously experienced delay disappeared. I could  
easily put that into openmpi-mca-params.conf and be done with the  
problem, but I would like to know why openmpi chose to use the public  
IP of the node before its internal IP, and whether this is expected  
behavior. I suspect that it may not be.


-Aaron




Re: [OMPI users] oob mca question

2009-11-12 Thread Aaron Knister

Thanks! I appreciate the response.


