Re: [OMPI users] Is it not possible to run a program with MPI code without mpirun/mpiexec?

2011-11-14 Thread Ralph Castain
Hmmm...it -should- work, but I've never tried it on Windows. I will verify it 
under Linux, but will have to defer to Shiqing to see if there is something 
particular about the Windows environment.


On Nov 13, 2011, at 8:13 PM, Naor Movshovitz wrote:

> I have open-mpi v1.5.4, installed from the binary installer for
> Windows. The following program test.c
> 
> #include <stdio.h>
> #include <mpi.h>
> int main(int argc, char *argv[])
> {
>  int rank, size;
>  MPI_Init(&argc,&argv);
>  MPI_Comm_size(MPI_COMM_WORLD,&size);
>  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>  printf("hello world from rank %d of %d.\n",rank,size);
>  MPI_Finalize();
>  return 0;
> }
> 
> is compiled and linked without issue with
> 
> c:\temp\mpicc test.c
> 
> It also runs without issue with
> 
> c:\temp\mpirun test.exe
> 
> and prints the expected output. However, running the executable directly, as 
> in
> 
> c:\temp\test
> 
> prints the following and then hangs:
> 
> [COMPUTERNAME:03060] [[34061,0],0] ORTE_ERROR_LOG: Value out of
> bounds in file ../../../openmpi-1.5.4\orte\mca\oob\tcp\oob_tcp.c at
> line 1193
> 
> Is this a bug? I normally expect MPI programs to run without problems
> as standalone executables. I should add that the MPI installation
> does not have any of the directories/files named in the error log, only
> pre-built binaries.
> 
> Thanks muchly,
> -nuun
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
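As a quick check of the expected singleton behavior (a sketch; the report above uses Windows-style paths, shown here generically): a program that calls MPI_Init should be runnable both under mpirun and directly, in which case it sees a single-process world.

```shell
# Launch under mpirun: two ranks report
mpirun -np 2 test.exe

# Singleton launch: the same binary run directly should report
# "rank 0 of 1" instead of hanging
test.exe
```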




Re: [OMPI users] Is it not possible to run a program with MPI code without mpirun/mpiexec?

2011-11-14 Thread Shiqing Fan


I just found out that there were missing updates for Windows in the 
singleton module (they are in trunk but not in the 1.5 branch). I'll make a CMR for this.



--
---
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234  Nobelstrasse 19
Fax: ++49(0)711-685-65832  70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de



[OMPI users] OpenMPI 1.4.3 and PGI 11.8 segfault at run-time

2011-11-14 Thread Francesco Salvadore
Hello, 

I'm having a problem using OpenMPI 1.4.3 with PGI 11.8. A simple hello-world test 
program segfaults, and ompi_info sometimes segfaults too. Under a 
debugger the problem seems to arise from libnuma:

http://imageshack.us/photo/my-images/822/stacktracesegfaultpgi11.png/


I tried to avoid building the maffinity component by specifying 
--enable-mca-no-build=maffinity,btl-portals, but the maffinity component seems to be 
installed anyway: the segfault still occurs, and

ompi_info |grep maffinity 

gives the result 

MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.3).
I also tried to specify --without-libnuma, but had no success. How can I 
force the build to avoid libnuma completely? Or is there a better 
solution, judging from the stack trace above?
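If the goal is to exclude only the libnuma-backed component rather than the whole framework, a configure line along these lines might work (a sketch, untested here; the `<framework>-<component>` selector is Open MPI's documented form for --enable-mca-no-build, and the prefix is a placeholder):

```shell
# Exclude just the maffinity/libnuma component and avoid linking libnuma
./configure --prefix=$HOME/openmpi-1.4.3 \
            --enable-mca-no-build=maffinity-libnuma \
            --without-libnuma
make all install
```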

thanks,
Francesco


[OMPI users] Printing information on computing nodes.

2011-11-14 Thread Radomir Szewczyk
Hi,
The problem I'm facing now is how to print information on the computing nodes.
E.g. I've got 10 real computers wired into one cluster with PelicanHPC.
I need each one of them to print results independently on its
own screen. How can I do that?
It may be an easy task, but I'm new to this and didn't find proper info.
Cheers
Radomir Szewczyk


Re: [OMPI users] Printing information on computing nodes.

2011-11-14 Thread Reuti
Hi,

On 14.11.2011, at 19:54, Radomir Szewczyk wrote:

> The problem I'm facing now is how to print information on computing nodes.
> E.g. I've got 10 real computers wired into one cluster with pelicanhpc.
> I need each one of them to print results independently on their
> screens. How To?

The stdout of all ranks will be collected by the MPI library and goes to the 
terminal where you started mpiexec.

First you have to decide what you mean by "their screens". As the MPI processes are 
started by an SSH connection or the like, there is no screen for them to output to 
in the first place. The nodes may even be operated headless.

Otherwise: is X11 running on all the nodes, or would it help to write 
something to the local virtual console, like /dev/vcs7 or /dev/console in a text-based 
session?

-- Reuti






Re: [OMPI users] Printing information on computing nodes.

2011-11-14 Thread Radomir Szewczyk
So there is no solution? E.g., my two computers are computing nodes
placed in different rooms on different floors, and the target
user wants to monitor the progress of the computation independently,
printed on each machine's LCD monitor.




Re: [OMPI users] Printing information on computing nodes.

2011-11-14 Thread Ralph Castain

On Nov 14, 2011, at 12:18 PM, Radomir Szewczyk wrote:

> So there is no solution? e.g. my 2 computers that are computing nodes
> and are placed in different room on different floors. And the target
> user wants to monitor the progress of computation independently which
> have to be printed on their lcd monitors.

So...you want stdout/err to be repeated to multiple places? If so, then no - we 
don't support that, and I don't know anyone who does.






Re: [OMPI users] Printing information on computing nodes.

2011-11-14 Thread Radomir Szewczyk
Let's say computing node no. 2 is dual-core and runs 2 processes; it
prints out only the solution for, let's say, processes no. 2 and 3, kinda like
if(id == 2 || id == 3) cout << "HW"; while the rest ignore this
information. That's what I'm talking about. Thanks for your response.




Re: [OMPI users] Printing information on computing nodes.

2011-11-14 Thread Ralph Castain

On Nov 14, 2011, at 12:28 PM, Radomir Szewczyk wrote:

> lets say computing node no. 2 is dual core and uses 2 processes, it
> prints out only the solution for lets say no 2 and 3 processes. kinda
> if(id == 2 || id == 3) cout << "HW"; the rest ignores this
> information. That's what I'm talking about. Thanks for your response.

I'm sorry - I honestly cannot understand what you are asking. Simply put, the 
output of ALL ranks is forwarded to mpirun, which prints the strings to its 
stdout/err. So whatever screen is running mpirun, that's where ALL the output 
from ALL ranks will appear.

If you look at "mpirun -h", you will see options for splitting the output by 
rank into files, tagging the output to make it readily apparent which rank it 
came from, etc. There is also an option for having each rank open an xterm 
window on the screen where mpirun resides and putting the output from that rank 
there.

However, there is NO option for redirecting the output from your MPI processes 
to anywhere other than the screen where mpirun is executing.
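As a concrete sketch of those options (flag names as printed by mpirun -h in this era of Open MPI; check your own version's help output):

```shell
# Prefix every output line with the rank it came from
mpirun -np 4 -tag-output ./program.exe

# Write each rank's output to its own file under /tmp
mpirun -np 4 -output-filename /tmp/program.out ./program.exe

# Open an xterm on mpirun's display for each listed rank
mpirun -np 4 -xterm 0,1,2,3 ./program.exe
```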






Re: [OMPI users] Printing information on computing nodes.

2011-11-14 Thread Reuti


What about writing to a local file (maybe a pipe), which the user can then tail 
on that particular machine?

-- Reuti


> 
>> 
>> 2011/11/14 Ralph Castain :
>>> 
>>> On Nov 14, 2011, at 12:18 PM, Radomir Szewczyk wrote:
>>> 
 So there is no solution? e.g. my 2 computers that are computing nodes
 and are placed in different room on different floors. And the target
 user wants to monitor the progress of computation independently which
 have to be printed on their lcd monitors.
>>> 
>>> So...you want stdout/err to be repeated to multiple places? If so, then no 
>>> - we don't support that, and I don't know anyone who does.
>>> 
>>> 
 
 2011/11/14 Reuti :
> Hi,
> 
> Am 14.11.2011 um 19:54 schrieb Radomir Szewczyk:
> 
>> The problem I'm facing now is how to print information on computing 
>> nodes.
>> E.g. I've got 10 real computers wired into one cluster with pelicanhpc.
>> I need each one of them to print results independently on their
>> screens. How To?
> 
> the stdout will be collected by the MPI library and all goes to the 
> terminal where you started the mpiexec.
> 
> First you have to decide, what do you mean by "their screens". As MPI is 
> started  by an SSH connection or alike, there is nothing where it can be 
> output at the first place. They even maybe operated headless.
> 
> Otherwise: is there X11 running on all the nodes, or would it help to 
> write something to the local virtual console like /dev/vcs7 or 
> /dev/console in a text based session?
> 
> -- Reuti
> 
> 
>> It may be an easy task, but I'm new to this and didn't find proper info.
>> Cheers
>> Radomir Szewczyk
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
 
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] Program hangs in mpi_bcast

2011-11-14 Thread Tom Rosmond
Hello:

A colleague and I have been running a large F90 application that does an
enormous number of mpi_bcast calls during execution.  I deny any
responsibility for the design of the code and why it needs these calls,
but it is what we have inherited and have to work with.

Recently we ported the code to an 8-node, 6-processor/node NUMA system
(lstopo output attached) running Debian Linux 6.0.3 with Open MPI 1.5.3,
and began having trouble with mysterious 'hangs' in the program inside
the mpi_bcast calls.  The hangs were always in the same calls, but not
necessarily at the same time during integration.  We originally didn't
have NUMA support, so reinstalled with libnuma support added, but the
problem persisted.  Finally, just as a wild guess, we inserted
'mpi_barrier' calls just before the 'mpi_bcast' calls, and the program
now runs without problems.

I believe conventional wisdom is that properly formulated MPI programs
should run correctly without barriers, so do you have any thoughts on
why we found it necessary to add them?  The code has run correctly on
other architectures, e.g. Cray XE6, so I don't think there is a bug
anywhere.  My only explanation is that some internal resource gets
exhausted because of the large number of 'mpi_bcast' calls in rapid
succession, and the barrier calls force synchronization which allows the
resource to be restored.  Does this make sense?  I'd appreciate any
comments and advice you can provide.


I have attached compressed copies of config.log and ompi_info for the
system.  The program is built with ifort 12.0 and typically runs with 

  mpirun -np 36 -bycore -bind-to-core program.exe

We have run both interactively and with PBS, but that doesn't seem to
make any difference in program behavior.

T. Rosmond


Machine (128GB)
  Socket L#0 (32GB)
NUMANode L#0 (P#0 16GB) + L3 L#0 (5118KB)
  L2 L#0 (512KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
  L2 L#1 (512KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
  L2 L#2 (512KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
  L2 L#3 (512KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
  L2 L#4 (512KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
  L2 L#5 (512KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
NUMANode L#1 (P#1 16GB) + L3 L#1 (5118KB)
  L2 L#6 (512KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
  L2 L#7 (512KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)
  L2 L#8 (512KB) + L1 L#8 (64KB) + Core L#8 + PU L#8 (P#8)
  L2 L#9 (512KB) + L1 L#9 (64KB) + Core L#9 + PU L#9 (P#9)
  L2 L#10 (512KB) + L1 L#10 (64KB) + Core L#10 + PU L#10 (P#10)
  L2 L#11 (512KB) + L1 L#11 (64KB) + Core L#11 + PU L#11 (P#11)
  Socket L#1 (32GB)
NUMANode L#2 (P#2 16GB) + L3 L#2 (5118KB)
  L2 L#12 (512KB) + L1 L#12 (64KB) + Core L#12 + PU L#12 (P#12)
  L2 L#13 (512KB) + L1 L#13 (64KB) + Core L#13 + PU L#13 (P#13)
  L2 L#14 (512KB) + L1 L#14 (64KB) + Core L#14 + PU L#14 (P#14)
  L2 L#15 (512KB) + L1 L#15 (64KB) + Core L#15 + PU L#15 (P#15)
  L2 L#16 (512KB) + L1 L#16 (64KB) + Core L#16 + PU L#16 (P#16)
  L2 L#17 (512KB) + L1 L#17 (64KB) + Core L#17 + PU L#17 (P#17)
NUMANode L#3 (P#3 16GB) + L3 L#3 (5118KB)
  L2 L#18 (512KB) + L1 L#18 (64KB) + Core L#18 + PU L#18 (P#18)
  L2 L#19 (512KB) + L1 L#19 (64KB) + Core L#19 + PU L#19 (P#19)
  L2 L#20 (512KB) + L1 L#20 (64KB) + Core L#20 + PU L#20 (P#20)
  L2 L#21 (512KB) + L1 L#21 (64KB) + Core L#21 + PU L#21 (P#21)
  L2 L#22 (512KB) + L1 L#22 (64KB) + Core L#22 + PU L#22 (P#22)
  L2 L#23 (512KB) + L1 L#23 (64KB) + Core L#23 + PU L#23 (P#23)
  Socket L#2 (32GB)
NUMANode L#4 (P#4 16GB) + L3 L#4 (5118KB)
  L2 L#24 (512KB) + L1 L#24 (64KB) + Core L#24 + PU L#24 (P#24)
  L2 L#25 (512KB) + L1 L#25 (64KB) + Core L#25 + PU L#25 (P#25)
  L2 L#26 (512KB) + L1 L#26 (64KB) + Core L#26 + PU L#26 (P#26)
  L2 L#27 (512KB) + L1 L#27 (64KB) + Core L#27 + PU L#27 (P#27)
  L2 L#28 (512KB) + L1 L#28 (64KB) + Core L#28 + PU L#28 (P#28)
  L2 L#29 (512KB) + L1 L#29 (64KB) + Core L#29 + PU L#29 (P#29)
NUMANode L#5 (P#5 16GB) + L3 L#5 (5118KB)
  L2 L#30 (512KB) + L1 L#30 (64KB) + Core L#30 + PU L#30 (P#30)
  L2 L#31 (512KB) + L1 L#31 (64KB) + Core L#31 + PU L#31 (P#31)
  L2 L#32 (512KB) + L1 L#32 (64KB) + Core L#32 + PU L#32 (P#32)
  L2 L#33 (512KB) + L1 L#33 (64KB) + Core L#33 + PU L#33 (P#33)
  L2 L#34 (512KB) + L1 L#34 (64KB) + Core L#34 + PU L#34 (P#34)
  L2 L#35 (512KB) + L1 L#35 (64KB) + Core L#35 + PU L#35 (P#35)
  Socket L#3 (32GB)
NUMANode L#6 (P#6 16GB) + L3 L#6 (5118KB)
  L2 L#36 (512KB) + L1 L#36 (64KB) + Core L#36 + PU L#36 (P#36)
  L2 L#37 (512KB) + L1 L#37 (64KB) + Core L#37 + PU L#37 (P#37)
  L2 L#38 (512KB) + L1 L#38 (64KB) + Core L#38 + PU L#38 (P#38)
  L2 L#39 (512KB) + L1 L#39 (64KB) + Core L#39 + PU L#39 (P#39)
  L2 L#40 (512KB) + L1 L#40 (64KB) + Core L#40 + PU L#40 (P#40)
  L2 L#41 (512KB) + L1 L#41 (64KB) + Core L#41 + PU L#41 (P#41)

[OMPI users] MPI_MAX_PORT_NAME different in C and Fortran headers

2011-11-14 Thread Enzo Dari
I'm trying to establish communications between two mpi processes using
MPI_Open_port / MPI_Publish_name / MPI_Comm_accept
in a server and
MPI_Lookup_name / MPI_Comm_connect
in a client.
The source code is in fortran, and the client fails with some sort of
"malloc error".
It seems that the difference between the MPI_MAX_PORT_NAME constants
in C (1024) and Fortran (255) is the reason for the problem.
Declaring the port_name variable in Fortran with size 1023 solves this
problem, but I'm not sure if this is the proper way to handle the issue,
and I'm not aware of the possible side-effects of changing
MPI_MAX_PORT_NAME in .../include/mpi/mpif-common.h.
I'm using Open MPI 1.4.2 (included in Debian stable 6.0.3) with gfortran
4.4.5 (also the version included in Debian stable). I also tried Open MPI
1.4.4 and ifort 11.1.
-- 
Enzo A. Dari
Instituto Balseiro / Centro Atomico Bariloche


Re: [OMPI users] Program hangs in mpi_bcast

2011-11-14 Thread Ralph Castain
Yes, this is well documented. It may be on the FAQ, and it has certainly come up on the 
user list multiple times.

The problem is that one process falls behind, which causes it to begin 
accumulating "unexpected messages" in its queue. This causes the matching logic 
to run a little slower, making the process fall further and further 
behind. Eventually things hang: everyone is sitting in bcast waiting 
for the slow proc to catch up, but its queue is saturated and it can't.

The solution is to do exactly what you describe - add some barriers to force 
the slow process to catch up. This happened enough that we even added support 
for it in OMPI itself so you don't have to modify your code. Look at the 
following from "ompi_info --param coll sync"

MCA coll: parameter "coll_base_verbose" (current value: <0>, data source: default value)
          Verbosity level for the coll framework (0 = no verbosity)
MCA coll: parameter "coll_sync_priority" (current value: <50>, data source: default value)
          Priority of the sync coll component; only relevant if barrier_before or barrier_after is > 0
MCA coll: parameter "coll_sync_barrier_before" (current value: <1000>, data source: default value)
          Do a synchronization before each Nth collective
MCA coll: parameter "coll_sync_barrier_after" (current value: <0>, data source: default value)
          Do a synchronization after each Nth collective

Take your pick - inserting a barrier before or after doesn't seem to make a lot 
of difference, but most people use "before". Try different values until you get 
something that works for you.
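For example, to insert a barrier before every 100th collective without modifying the code (parameter names as reported by ompi_info; the value 100 is just a starting point to tune):

```shell
# On the command line
mpirun -np 36 -mca coll_sync_barrier_before 100 -bycore -bind-to-core program.exe

# Or via the environment, using Open MPI's OMPI_MCA_ prefix convention
export OMPI_MCA_coll_sync_barrier_before=100
mpirun -np 36 program.exe
```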

