Re: [OMPI users] Oversubscription performance problem

2008-04-13 Thread Jeff Squyres

Sorry for the delays in replying.

The central problem is that Open MPI is much more aggressive about its
message passing progress than LAM is -- it simply wasn't designed to
share the processor well; that aggressiveness is how it gets as high
performance as possible.


mpi_yield_when_idle is most helpful only for certain transports that  
actively use our event engine, such as the TCP device.  Since you're  
using the LAM sysv RPI, I assume you're using the TCP and shared  
memory devices in OMPI, right?  If you're using InfiniBand, for
example, the event engine is not called much because IB has its own  
progression engine that is unrelated to OMPI's (and therefore we don't  
invoke OMPI's much).


mpi_yield_when_idle is also only helpful if you're going into the MPI  
layer often and making message passing progress (i.e., OMPI's event  
engine is actively being invoked).  Is this true for your application?
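
To make that concrete, here is a rough sketch of the two situations (not
your code -- do_local_work() and the "stop" tag are invented).  While a
rank is blocked inside an MPI call, OMPI's progress engine is spinning
and mpi_yield_when_idle has something to act on; while it is off doing
pure computation, no MCA setting can make the process yield:

#include <mpi.h>

/* stand-in for the application's real computation */
static void do_local_work(double *buf, int n) { (void)buf; (void)n; }

static void worker_loop(int master)
{
    double buf[1024];
    MPI_Status st;

    for (;;) {
        /* Blocked here, the process sits in OMPI's progress loop, which
           is where mpi_yield_when_idle gets a chance to yield the CPU. */
        MPI_Recv(buf, 1024, MPI_DOUBLE, master, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == 0)          /* invented "stop" tag */
            break;

        /* In here the MPI layer is never entered, so nothing Open MPI
           does can make this rank give up the processor. */
        do_local_work(buf, 1024);
    }
}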


If mpi_yield_when_idle really doesn't help much, you may want to consider
sprinkling calls to sched_yield() throughout your code to force the process
to yield the processor.
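
For example, on ranks that spend a lot of their time waiting, something
along these lines (just a sketch, not taken from anyone's actual code):

#include <sched.h>
#include <mpi.h>

/* Poll politely for the next incoming message: check with MPI_Iprobe
   and give the CPU away between polls instead of spinning at 100%. */
static void wait_for_message(MPI_Status *status)
{
    int flag = 0;
    while (!flag) {
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                   &flag, status);
        if (!flag)
            sched_yield();    /* let the other job's processes run */
    }
}

The trade-off is latency: you give up a little responsiveness on an idle
machine in exchange for much better behavior when the nodes are
oversubscribed.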




On Apr 4, 2008, at 2:30 AM, Lars Andersson wrote:

Hi,

I'm just in the process of moving our application from LAM/MPI to
OpenMPI, mainly because OpenMPI makes it easier for a user to run
multiple jobs (MPI universes) simultaneously. This is useful if a user
wants to run smaller experiments without disturbing a large experiment
running in the background. I've been evaluating the performance using
a simple test, running on a heterogeneous cluster of 2 x dual core
Opteron machines, a couple of dual core P4 Xeon machines and an 8 core
Core2 machine. The main structure of the application is a master rank
distributing job packages to the rest of the ranks and collecting the
results. We don't use any fancy MPI features but rather see it as an
efficient low-level tool for broadcasting and transferring data.

When a single user runs a job (fully subscribed nodes, but not
oversubscribed, i.e. one process per CPU core) on an otherwise unloaded
cluster, both LAM/MPI and OpenMPI give average runtimes of about 1m33s
(OpenMPI has a slightly lower average).

When I start the same job simultaneously as two different users (thus
oversubscribing the nodes 2x) under LAM/MPI, the two jobs finish in an
average time of about 3m, thus scaling very well (we use the -ssi rpi
sysv option to mpirun under LAM/MPI to avoid busy waiting).

When running the same second experiment under OpenMPI, the average
runtime jumps up to about 3m30s, with runs occasionally taking more
than 4 minutes to complete. I do use the "--mca mpi_yield_when_idle 1"
option to mpirun, but it doesn't seem to make any difference. I've
also tried setting the environment variable
OMPI_MCA_mpi_yield_when_idle=1, but still no change. ompi_info says:

ompi_info --param all all | grep yield
        MCA mpi: parameter "mpi_yield_when_idle" (current value: "1")


The cluster is used for various tasks, running MPI applications as
well as non-MPI applications, so we would like to avoid spending too
many cycles on busy-waiting. Any ideas on how to tweak OpenMPI to get
better performance and more cooperative behavior in this case would be
greatly appreciated.

Cheers,

Lars



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Problems using Intel MKL with OpenMPI and Pathscale

2008-04-13 Thread Jeff Squyres
Do you get the same error if you disable the memory handling in Open  
MPI?  You can configure OMPI with:


--disable-memory-manager
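
(i.e., re-run configure with that flag added to whatever options you
normally use, then rebuild and reinstall Open MPI)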


On Apr 9, 2008, at 3:01 PM, Åke Sandgren wrote:

Hi!

I have an annoying problem that I hope someone here has some info on.

I'm trying to build a code with OpenMPI + Intel MKL + Pathscale.
When using the sequential (non-threaded) MKL everything is OK, but when
using the threaded MKL I get a segfault.

This doesn't happen when using MVAPICH, so I suspect the memory handling
inside OpenMPI.

Versions used are:
OpenMPI 1.2.6
Pathscale 3.2beta
MKL 10.0.2.018

Has anyone seen anything like this?

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se




--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Troubles with MPI-IO Test and Torque/PVFS

2008-04-13 Thread Jeff Squyres
It looks like you're seg faulting when calling some flavor of printf
(perhaps vsnprintf?) in the make_error_messages() function.


You might want to double check the read_write_file() function to see  
exactly what kind of error it is encountering such that it is calling  
report_errs().
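
The "Expected (null), received (null)" text in the warnings below suggests
that NULL (or otherwise bogus) string pointers are being pushed through a
%s conversion in that error-reporting path, which is exactly the kind of
thing that can blow up inside vsnprintf().  A tiny illustration of the
failure mode (a guess at the pattern, not the test's actual code):

#include <stdio.h>

int main(void)
{
    char msg[128];
    const char *expected = NULL;   /* e.g. an error string that never got set */

    /* glibc usually prints "(null)" for a NULL %s argument, but passing an
       uninitialized or garbage pointer (note "Failing at address: 0x1" in
       the backtraces) is undefined behavior and can segfault inside
       vsnprintf/snprintf. */
    snprintf(msg, sizeof(msg), "Expected %s, received %s",
             expected ? expected : "(unset)",
             expected ? expected : "(unset)");
    puts(msg);
    return 0;
}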



On Apr 10, 2008, at 3:29 PM, Davi Vercillo C. Garcia wrote:

Hi all,

I have a cluster with Torque and PVFS. I'm trying to test my
environment with the MPI-IO Test, but some segfaults are occurring.
Does anyone know what is happening? The error output is below:

Rank 1 Host campogrande03.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad
bytes at file offset 0.  Expected (null), received (null)
Rank 2 Host campogrande02.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad
bytes at file offset 0.  Expected (null), received (null)
[campogrande01:10646] *** Process received signal ***
Rank 0 Host campogrande04.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad
bytes at file offset 0.  Expected (null), received (null)
Rank 0 Host campogrande04.dcc.ufrj.br WARNING ERROR 1207853304: 65537
bad bytes at file offset 0.  Expected (null), received (null)
[campogrande04:05192] *** Process received signal ***
[campogrande04:05192] Signal: Segmentation fault (11)
[campogrande04:05192] Signal code: Address not mapped (1)
[campogrande04:05192] Failing at address: 0x1
Rank 1 Host campogrande03.dcc.ufrj.br WARNING ERROR 1207853304: 65537
bad bytes at file offset 0.  Expected (null), received (null)
[campogrande03:05377] *** Process received signal ***
[campogrande03:05377] Signal: Segmentation fault (11)
[campogrande03:05377] Signal code: Address not mapped (1)
[campogrande03:05377] Failing at address: 0x1
[campogrande03:05377] [ 0] [0xe440]
[campogrande03:05377] [ 1]
/lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande03:05377] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande03:05377] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande03:05377] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande03:05377] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande03:05377] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande03:05377] [ 7]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande03:05377] [ 8] mpiIO_test [0x804a7e1]
[campogrande03:05377] *** End of error message ***
Rank 2 Host campogrande02.dcc.ufrj.br WARNING ERROR 1207853304: 65537
bad bytes at file offset 0.  Expected (null), received (null)
[campogrande02:05187] *** Process received signal ***
[campogrande02:05187] Signal: Segmentation fault (11)
[campogrande02:05187] Signal code: Address not mapped (1)
[campogrande02:05187] Failing at address: 0x1
[campogrande01:10646] Signal: Segmentation fault (11)
[campogrande01:10646] Signal code: Address not mapped (1)
[campogrande01:10646] Failing at address: 0x1a
[campogrande02:05187] [ 0] [0xe440]
[campogrande02:05187] [ 1]
/lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande02:05187] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande02:05187] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande02:05187] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande02:05187] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande02:05187] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande02:05187] [ 7]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande02:05187] [ 8] mpiIO_test [0x804a7e1]
[campogrande02:05187] *** End of error message ***
[campogrande04:05192] [ 0] [0xe440]
[campogrande04:05192] [ 1]
/lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande04:05192] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande04:05192] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande04:05192] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande04:05192] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande04:05192] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande04:05192] [ 7]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande04:05192] [ 8] mpiIO_test [0x804a7e1]
[campogrande04:05192] *** End of error message ***
[campogrande01:10646] [ 0] [0xe440]
[campogrande01:10646] [ 1]
/lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande01:10646] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande01:10646] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande01:10646] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande01:10646] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande01:10646] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande01:10646] [ 7]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande01:10646] [ 8] mpiIO_test [0x804a7e1]
[campogrande01:10646] *** End of error message ***
mpiexec noticed that job rank 0 with PID 5192 on node campogrande04
exited on signal 11 (Segmentation fault).

--
Dav

Re: [OMPI users] Problems using Intel MKL with OpenMPI and Pathscale

2008-04-13 Thread Åke Sandgren
On Sun, 2008-04-13 at 08:00 -0400, Jeff Squyres wrote:
> Do you get the same error if you disable the memory handling in Open  
> MPI?  You can configure OMPI with:
> 
>  --disable-memory-manager

Ah, I have apparently missed that config flag; I'll try it on Monday.

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se



Re: [OMPI users] problems with hostfile when doing MPMD

2008-04-13 Thread Ralph Castain
Hi Jody

Simple answer - the 1.2.x series does not support multiple hostfiles. I
believe you will find that documented in the FAQ section.

What you have to do here is have -one- hostfile that includes all the hosts,
and then use -host with each app-context to indicate which of those hosts are
to be used for that specific app-context.

Or you can just use -host for each app_context, with no hostfile specified.

If you specify both a hostfile and -host, then -all- of the nodes listed in
your -host arguments must be in the hostfile, or we will error out.
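
For example (host names borrowed from Jody's mail; I'm guessing at the
slot counts), one hostfile "allhosts" containing

  aim-plankton slots=3
  aim-fanta4 slots=3

and then something like:

  mpirun --hostfile allhosts -np 3 --host aim-plankton ./MPITest : \
         -np 3 --host aim-fanta4 ./MPITest64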

This will change in 1.3 where we will support a separate hostfile for each
app_context, as well as one for the entire job, in combination with -host
args as well. You can see that documented on the open-mpi wiki.

Hope that helps
Ralph


On 4/10/08 4:40 AM, "jody"  wrote:

> Hi
> In my network I have some 32 bit machines and some 64 bit machines.
> With --host I successfully call my application:
>   mpirun -np 3 --host aim-plankton -x DISPLAY ./run_gdb.sh ./MPITest :
> -np 3 --host aim-fanta4 -x DISPLAY ./run_gdb.sh ./MPITest64
> (MPITest64 has the same code as MPITest, but was compiled on the 64
> bit machine)
> 
> But when I use hostfiles:
>   mpirun -np 3 --hostfile hosts32 -x DISPLAY ./run_gdb.sh ./MPITest :
> -np 3 --hostfile hosts64 -x DISPLAY ./run_gdb.sh ./MPITest64
> all 6 processes are started on the 64 bit machine aim-fanta4.
> 
> hosts32:
>aim-plankton slots=3
> hosts64:
>   aim-fanta4 slots
> 
> Is this a bug or a feature?  ;)
> 
> Jody




Re: [OMPI users] problems with hostfile when doing MPMD

2008-04-13 Thread Ralph Castain
I believe this -should- work, but can't verify it myself. The most important
thing is to be sure you built with --enable-heterogeneous or else it will
definitely fail.

Ralph



On 4/10/08 7:17 AM, "Rolf Vandevaart"  wrote:

> 
> On a CentOS Linux box, I see the following:
> 
>> grep 113 /usr/include/asm-i386/errno.h
> #define EHOSTUNREACH 113 /* No route to host */
> 
> I have also seen folks do this to figure out the errno.
> 
>> perl -e 'die$!=113'
> No route to host at -e line 1.
> 
> I am not sure why this is happening, but you could also check the Open
> MPI User's Mailing List Archives where there are other examples of
> people running into this error.  A search of "113" had a few hits.
> 
> http://www.open-mpi.org/community/lists/users
> 
> Also, I assume you would see this problem with or without the
> MPI_Barrier if you add this parameter to your mpirun line:
> 
>  --mca mpi_preconnect_all 1
> 
> The MPI_Barrier is causing the bad behavior because by default
> connections are set up lazily. Therefore only when the MPI_Barrier
> call is made and we start communicating and establishing connections do
> we start seeing the communication problems.
> 
> Rolf
> 
> jody wrote:
>> Rolf,
>> I was able to run hostname on the two nodes that way,
>> and also a simplified version of my testprogram (without a barrier)
>> works. Only MPI_Barrier shows bad behaviour.
>> 
>> Do you know what this message means?
>> [aim-plankton][0,1,2][btl_tcp_endpoint.c:
>> 572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=113
>> Does it give an idea what could be the problem?
>> 
>> Jody
>> 
>> On Thu, Apr 10, 2008 at 2:20 PM, Rolf Vandevaart
>>  wrote:
>>> This worked for me although I am not sure how extensive our 32/64
>>> interoperability support is.  I tested on Solaris using the TCP
>>> interconnect and a 1.2.5 version of Open MPI.  Also, we configure
>>> with
>>> the --enable-heterogeneous flag which may make a difference here.
>>> Also
>>> this did not work for me over the sm btl.
>>> 
>>> By the way, can you run a simple /bin/hostname across the two nodes?
>>> 
>>> 
>>>  burl-ct-v20z-4 61 =>/opt/SUNWhpc/HPC7.1/bin/mpicc -m32 simple.c -o
>>> simple.32
>>>  burl-ct-v20z-4 62 =>/opt/SUNWhpc/HPC7.1/bin/mpicc -m64 simple.c -o
>>> simple.64
>>>  burl-ct-v20z-4 63 =>/opt/SUNWhpc/HPC7.1/bin/mpirun -gmca
>>> btl_tcp_if_include bge1 -gmca btl sm,self,tcp -host burl-ct-v20z-4 -
>>> np 3
>>> simple.32 : -host burl-ct-v20z-5 -np 3 simple.64
>>> [burl-ct-v20z-4]I am #0/6 before the barrier
>>> [burl-ct-v20z-5]I am #3/6 before the barrier
>>> [burl-ct-v20z-5]I am #4/6 before the barrier
>>> [burl-ct-v20z-4]I am #1/6 before the barrier
>>> [burl-ct-v20z-4]I am #2/6 before the barrier
>>> [burl-ct-v20z-5]I am #5/6 before the barrier
>>> [burl-ct-v20z-5]I am #3/6 after the barrier
>>> [burl-ct-v20z-4]I am #1/6 after the barrier
>>> [burl-ct-v20z-5]I am #5/6 after the barrier
>>> [burl-ct-v20z-5]I am #4/6 after the barrier
>>> [burl-ct-v20z-4]I am #2/6 after the barrier
>>> [burl-ct-v20z-4]I am #0/6 after the barrier
>>>  burl-ct-v20z-4 64 =>/opt/SUNWhpc/HPC7.1/bin/mpirun -V
>>> mpirun (Open MPI) 1.2.5r16572
>>> 
>>> Report bugs to http://www.open-mpi.org/community/help/
>>>  burl-ct-v20z-4 65 =>
>>> 
>>> 
>>> 
>>> 
>>> jody wrote:
 i narrowed it down:
 The majority of processes get stuck in MPI_Barrier.
 My Test application looks like this:
 
 #include <stdio.h>
 #include <unistd.h>
 #include "mpi.h"
 
 int main(int iArgC, char *apArgV[]) {
int iResult = 0;
int iRank1;
int iNum1;
 
char sName[256];
gethostname(sName, 255);
 
MPI_Init(&iArgC, &apArgV);
 
MPI_Comm_rank(MPI_COMM_WORLD, &iRank1);
MPI_Comm_size(MPI_COMM_WORLD, &iNum1);
 
printf("[%s]I am #%d/%d before the barrier\n", sName, iRank1,
 iNum1);
MPI_Barrier(MPI_COMM_WORLD);
printf("[%s]I am #%d/%d after the barrier\n", sName, iRank1,
 iNum1);
 
MPI_Finalize();
 
return iResult;
 }
 
 
 If i make this call:
 mpirun -np 3 --debug-daemons --host aim-plankton -x DISPLAY
 ./run_gdb.sh ./MPITest32 : -np 3 --host aim-fanta4 -x DISPLAY
 ./run_gdb.sh ./MPITest64
 
 (run_gdb.sh is a script which starts gdb in a xterm for each
 process)
 Process 0 (on aim-plankton) passes the barrier and gets stuck in
 PMPI_Finalize,
 all other processes get stuck in PMPI_Barrier,
 Process 1 (on aim-plankton) displays the message
   [aim-plankton][0,1,1][btl_tcp_endpoint.c:
 572:mca_btl_tcp_endpoint_complete_connect]
 connect() failed with errno=113
 Process 2 on (aim-plankton) displays the same message twice.
 
 Any ideas?
 
  Thanks Jody
 
 On Thu, Apr 10, 2008 at 1:05 PM, jody  wrote:
> Hi
> Using a more realistic application than a simple "Hello, world"
> even the --host version doesn't work correctly
> Called this