Re: [OMPI users] Is it not possible to run a program with MPI code without mpirun/mpiexec?
Hmmm... it -should- work, but I've never tried it on Windows. I will verify it under Linux, but will have to defer to Shiqing to see if there is something particular about the Windows environment.

On Nov 13, 2011, at 8:13 PM, Naor Movshovitz wrote:

> I have open-mpi v1.5.4, installed from the binary installer for Windows.
> The following program, test.c,
>
>     #include <stdio.h>
>     #include <mpi.h>
>
>     int main(int argc, char *argv[])
>     {
>         int rank, size;
>         MPI_Init(&argc, &argv);
>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         printf("hello world from rank %d of %d.\n", rank, size);
>         MPI_Finalize();
>         return 0;
>     }
>
> is compiled and linked without issue with
>
>     c:\temp\mpicc test.c
>
> It also runs without issue with
>
>     c:\temp\mpirun test.exe
>
> and prints the expected output. However, running the executable directly, as in
>
>     c:\temp\test
>
> prints the following and then hangs:
>
>     [COMPUTERNAME:03060] [[34061,0],0] ORTE_ERROR_LOG: Value out of
>     bounds in file ../../../openmpi-1.5.4\orte\mca\oob\tcp\oob_tcp.c at
>     line 1193
>
> Is this a bug? I normally expect MPI programs to run without problems as
> standalone executables. I should add that the MPI installation does not
> have any directories/files named in the error log, only pre-built binaries.
>
> Thanks muchly,
> -nuun
Re: [OMPI users] Is it not possible to run a program with MPI code without mpirun/mpiexec?
I just found out that there were missing updates for Windows in the singleton module (in trunk, but not in the 1.5 branch). I'll make a CMR for this.

On 2011-11-14 1:45 PM, Ralph Castain wrote:

> Hmmm... it -should- work, but I've never tried it on Windows. I will verify
> it under Linux, but will have to defer to Shiqing to see if there is
> something particular about the Windows environment.

--
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19, 70569 Stuttgart
Tel: ++49(0)711-685-87234    Fax: ++49(0)711-685-65832
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de
[OMPI users] OpenMPI 1.4.3 and PGI 11.8 segfault at run-time
Hello,

I have a problem using OpenMPI 1.4.3 with PGI 11.8. A simple hello-world test program segfaults, and ompi_info sometimes segfaults too. Under a debugger the problem seems to arise from libnuma:

    http://imageshack.us/photo/my-images/822/stacktracesegfaultpgi11.png/

I tried to avoid building the maffinity component by specifying

    --enable-mca-no-build=maffinity,btl-portals

but the maffinity component seems to get installed anyway: the segfault still occurs, and "ompi_info | grep maffinity" gives

    MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.3)

I also tried to specify --without-libnuma, but with no success. How can I force the build to completely avoid libnuma? Is there a better solution, looking at the stack trace above?

thanks,
Francesco
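One workaround that may be worth trying, though it has not been verified against this exact combination (OpenMPI 1.4.3 built with PGI 11.8): Open MPI's framework-selection MCA parameters generally accept the "^" negation syntax, so the libnuma maffinity component can usually be deselected at run time without rebuilding, along the lines of

    mpirun --mca maffinity ^libnuma -np 2 ./hello

(./hello stands in for the test program). This only affects which component is selected; if the segfault happens while the component is merely being opened, a build that genuinely excludes libnuma is still required.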
[OMPI users] Printing information on computing nodes.
Hi,

The problem I'm facing now is how to print information on the computing nodes. E.g. I've got 10 real computers wired into one cluster with pelicanhpc, and I need each one of them to print results independently on its own screen. How do I do that?

It may be an easy task, but I'm new to this and didn't find the proper info.

Cheers,
Radomir Szewczyk
Re: [OMPI users] Printing information on computing nodes.
Hi,

On 14.11.2011 at 19:54, Radomir Szewczyk wrote:

> The problem I'm facing now is how to print information on the computing
> nodes. E.g. I've got 10 real computers wired into one cluster with
> pelicanhpc, and I need each one of them to print results independently on
> its own screen. How do I do that?

The stdout will be collected by the MPI library, and all of it goes to the terminal where you started mpiexec.

First you have to decide what you mean by "their screens". As MPI is started via an SSH connection or the like, there is no place for the output to appear in the first place; the nodes may even be operated headless.

Otherwise: is X11 running on all the nodes, or would it help to write something to a local virtual console like /dev/vcs7 or /dev/console in a text-based session?

-- Reuti
Re: [OMPI users] Printing information on computing nodes.
So there is no solution? E.g. my two computers are computing nodes placed in different rooms on different floors, and the target user wants to monitor the progress of the computation independently on each machine's own LCD monitor.

2011/11/14 Reuti:

> The stdout will be collected by the MPI library, and all of it goes to the
> terminal where you started mpiexec.
>
> First you have to decide what you mean by "their screens". As MPI is started
> via an SSH connection or the like, there is no place for the output to
> appear in the first place; the nodes may even be operated headless.
>
> Otherwise: is X11 running on all the nodes, or would it help to write
> something to a local virtual console like /dev/vcs7 or /dev/console in a
> text-based session?
Re: [OMPI users] Printing information on computing nodes.
On Nov 14, 2011, at 12:18 PM, Radomir Szewczyk wrote:

> So there is no solution? E.g. my two computers are computing nodes placed in
> different rooms on different floors, and the target user wants to monitor
> the progress of the computation independently on each machine's own LCD
> monitor.

So... you want stdout/err to be repeated to multiple places? If so, then no - we don't support that, and I don't know anyone who does.
Re: [OMPI users] Printing information on computing nodes.
Let's say computing node no. 2 is dual core and runs two processes; it should print only the output for, say, processes 2 and 3 - roughly

    if (id == 2 || id == 3) cout << "HW";

- and the rest ignore that information. That's what I'm talking about. Thanks for your response.

2011/11/14 Ralph Castain:

> So... you want stdout/err to be repeated to multiple places? If so, then no
> - we don't support that, and I don't know anyone who does.
Re: [OMPI users] Printing information on computing nodes.
On Nov 14, 2011, at 12:28 PM, Radomir Szewczyk wrote:

> Let's say computing node no. 2 is dual core and runs two processes; it
> should print only the output for, say, processes 2 and 3 - roughly
> "if (id == 2 || id == 3) cout << "HW";" - and the rest ignore that
> information. That's what I'm talking about. Thanks for your response.

I'm sorry - I honestly cannot understand what you are asking. Simply put, the output of ALL ranks is forwarded to mpirun, which prints the strings to its stdout/err. So whatever screen is running mpirun, that's where ALL the output from ALL ranks will appear.

If you look at "mpirun -h", you will see options for splitting the output by rank into files, tagging the output to make it readily apparent which rank it came from, etc. There is also an option for having each rank open an xterm window on the screen where mpirun resides and putting the output from that rank there.

However, there is NO option for redirecting the output from your MPI processes to anywhere other than the screen where mpirun is executing.
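For reference, the options mentioned above look roughly like this in the 1.5-series mpirun (spellings and exact behavior should be double-checked against "mpirun -h" for the installed version; ./a.out and the prefix "ranklog" are placeholders):

    # tag each line of output with the rank it came from
    mpirun -np 4 --tag-output ./a.out

    # write each rank's stdout/stderr to its own file instead of the screen
    mpirun -np 4 --output-filename ranklog ./a.out

    # open an xterm, on the display of the machine running mpirun, for ranks 2 and 3
    mpirun -np 4 --xterm 2,3 ./a.out

Even the xterm option puts the windows on the mpirun machine, not on the individual compute nodes.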
Re: [OMPI users] Printing information on computing nodes.
On 14.11.2011 at 20:37, Ralph Castain wrote:

> However, there is NO option for redirecting the output from your MPI
> processes to anywhere other than the screen where mpirun is executing.

What about writing to a local file (maybe a pipe) that the user then tails on that particular machine?

-- Reuti
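That suggestion could look like the following minimal C sketch (the /tmp path, file naming, and "progress" messages are made up for illustration): each rank appends progress lines to a file on the node where it runs, and anyone sitting at that node can follow it with tail -f.

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int rank, step;
        char hostname[256], fname[512];
        FILE *log;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        gethostname(hostname, sizeof(hostname));

        /* one progress file per rank, on the local disk of the node
           where the rank happens to run (the path is just an example) */
        snprintf(fname, sizeof(fname), "/tmp/progress.rank%d.log", rank);
        log = fopen(fname, "w");
        if (log == NULL)
            MPI_Abort(MPI_COMM_WORLD, 1);

        for (step = 0; step < 10; step++) {
            /* ... the real computation would go here ... */
            fprintf(log, "host %s, rank %d: finished step %d\n",
                    hostname, rank, step);
            fflush(log);   /* make each line visible to 'tail -f' right away */
            sleep(1);
        }

        fclose(log);
        MPI_Finalize();
        return 0;
    }

On a given node, "tail -f /tmp/progress.rank*.log" then shows the progress of whichever ranks were placed there.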
[OMPI users] Program hangs in mpi_bcast
Hello:

A colleague and I have been running a large F90 application that does an enormous number of mpi_bcast calls during execution. I deny any responsibility for the design of the code and why it needs these calls, but it is what we have inherited and have to work with.

Recently we ported the code to an 8 node, 6 processor/node NUMA system (lstopo output attached) running Debian Linux 6.0.3 with Open MPI 1.5.3, and began having trouble with mysterious 'hangs' in the program inside the mpi_bcast calls. The hangs were always in the same calls, but not necessarily at the same time during integration. We originally didn't have NUMA support, so we reinstalled with libnuma support added, but the problem persisted. Finally, just as a wild guess, we inserted 'mpi_barrier' calls just before the 'mpi_bcast' calls, and the program now runs without problems.

I believe conventional wisdom is that properly formulated MPI programs should run correctly without barriers, so do you have any thoughts on why we found it necessary to add them? The code has run correctly on other architectures, e.g. the Cray XE6, so I don't think there is a bug anywhere. My only explanation is that some internal resource gets exhausted because of the large number of 'mpi_bcast' calls in rapid succession, and the barrier calls force synchronization which allows the resource to be restored. Does this make sense? I'd appreciate any comments and advice you can provide.

I have attached compressed copies of config.log and ompi_info for the system. The program is built with ifort 12.0 and typically runs with

    mpirun -np 36 -bycore -bind-to-core program.exe

We have run both interactively and with PBS, but that doesn't seem to make any difference in program behavior.

T. Rosmond

    Machine (128GB)
      Socket L#0 (32GB)
        NUMANode L#0 (P#0 16GB) + L3 L#0 (5118KB)
          L2 L#0 (512KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
          L2 L#1 (512KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
          L2 L#2 (512KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
          L2 L#3 (512KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
          L2 L#4 (512KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
          L2 L#5 (512KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
        NUMANode L#1 (P#1 16GB) + L3 L#1 (5118KB)
          L2 L#6 (512KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
          L2 L#7 (512KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)
          L2 L#8 (512KB) + L1 L#8 (64KB) + Core L#8 + PU L#8 (P#8)
          L2 L#9 (512KB) + L1 L#9 (64KB) + Core L#9 + PU L#9 (P#9)
          L2 L#10 (512KB) + L1 L#10 (64KB) + Core L#10 + PU L#10 (P#10)
          L2 L#11 (512KB) + L1 L#11 (64KB) + Core L#11 + PU L#11 (P#11)
      Socket L#1 (32GB)
        NUMANode L#2 (P#2 16GB) + L3 L#2 (5118KB)
          L2 L#12 (512KB) + L1 L#12 (64KB) + Core L#12 + PU L#12 (P#12)
          L2 L#13 (512KB) + L1 L#13 (64KB) + Core L#13 + PU L#13 (P#13)
          L2 L#14 (512KB) + L1 L#14 (64KB) + Core L#14 + PU L#14 (P#14)
          L2 L#15 (512KB) + L1 L#15 (64KB) + Core L#15 + PU L#15 (P#15)
          L2 L#16 (512KB) + L1 L#16 (64KB) + Core L#16 + PU L#16 (P#16)
          L2 L#17 (512KB) + L1 L#17 (64KB) + Core L#17 + PU L#17 (P#17)
        NUMANode L#3 (P#3 16GB) + L3 L#3 (5118KB)
          L2 L#18 (512KB) + L1 L#18 (64KB) + Core L#18 + PU L#18 (P#18)
          L2 L#19 (512KB) + L1 L#19 (64KB) + Core L#19 + PU L#19 (P#19)
          L2 L#20 (512KB) + L1 L#20 (64KB) + Core L#20 + PU L#20 (P#20)
          L2 L#21 (512KB) + L1 L#21 (64KB) + Core L#21 + PU L#21 (P#21)
          L2 L#22 (512KB) + L1 L#22 (64KB) + Core L#22 + PU L#22 (P#22)
          L2 L#23 (512KB) + L1 L#23 (64KB) + Core L#23 + PU L#23 (P#23)
      Socket L#2 (32GB)
        NUMANode L#4 (P#4 16GB) + L3 L#4 (5118KB)
          L2 L#24 (512KB) + L1 L#24 (64KB) + Core L#24 + PU L#24 (P#24)
          L2 L#25 (512KB) + L1 L#25 (64KB) + Core L#25 + PU L#25 (P#25)
          L2 L#26 (512KB) + L1 L#26 (64KB) + Core L#26 + PU L#26 (P#26)
          L2 L#27 (512KB) + L1 L#27 (64KB) + Core L#27 + PU L#27 (P#27)
          L2 L#28 (512KB) + L1 L#28 (64KB) + Core L#28 + PU L#28 (P#28)
          L2 L#29 (512KB) + L1 L#29 (64KB) + Core L#29 + PU L#29 (P#29)
        NUMANode L#5 (P#5 16GB) + L3 L#5 (5118KB)
          L2 L#30 (512KB) + L1 L#30 (64KB) + Core L#30 + PU L#30 (P#30)
          L2 L#31 (512KB) + L1 L#31 (64KB) + Core L#31 + PU L#31 (P#31)
          L2 L#32 (512KB) + L1 L#32 (64KB) + Core L#32 + PU L#32 (P#32)
          L2 L#33 (512KB) + L1 L#33 (64KB) + Core L#33 + PU L#33 (P#33)
          L2 L#34 (512KB) + L1 L#34 (64KB) + Core L#34 + PU L#34 (P#34)
          L2 L#35 (512KB) + L1 L#35 (64KB) + Core L#35 + PU L#35 (P#35)
      Socket L#3 (32GB)
        NUMANode L#6 (P#6 16GB) + L3 L#6 (5118KB)
          L2 L#36 (512KB) + L1 L#36 (64KB) + Core L#36 + PU L#36 (P#36)
          L2 L#37 (512KB) + L1 L#37 (64KB) + Core L#37 + PU L#37 (P#37)
          L2 L#38 (512KB) + L1 L#38 (64KB) + Core L#38 + PU L#38 (P#38)
          L2 L#39 (512KB) + L1 L#39 (64KB) + Core L#39 + PU L#39 (P#39)
          L2 L#40 (512KB) + L1 L#40 (64KB) + Core L#40 + PU L#40 (P#40)
          L2 L#41 (512KB) + L1 L#41 (64KB) + Core L#41 + PU L#41 (P#41)
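For concreteness, the workaround described above - an explicit barrier immediately before each broadcast - looks like this in a self-contained C sketch (the real application is Fortran 90, and its buffer sizes, roots, and loop structure are of course different):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, i;
        double buf[1024] = {0};
        long nbcast = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < 100000; i++) {
            /* The workaround: a barrier before the broadcast keeps a slow
               rank from falling ever further behind and piling up
               unexpected messages.                                        */
            MPI_Barrier(MPI_COMM_WORLD);
            MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);
            nbcast++;
        }

        if (rank == 0)
            printf("completed %ld broadcasts\n", nbcast);

        MPI_Finalize();
        return 0;
    }

As the reply further down in this digest explains, the same effect can also be had without touching the source, via Open MPI's coll sync component.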
[OMPI users] MPI_MAX_PORT_NAME different in C and Fortran headers
I'm trying to establish communications between two MPI processes using MPI_Open_port / MPI_Publish_name / MPI_Comm_accept in a server and MPI_Lookup_name / MPI_Comm_connect in a client. The source code is in Fortran, and the client fails with some sort of "malloc error". It seems that the different values of the MPI_MAX_PORT_NAME constant in C (1024) and Fortran (255) are the reason for the problem.

Declaring the port_name variable in Fortran with size 1023 solves the problem, but I'm not sure whether this is the proper way to handle the issue, and I'm not aware of the possible side effects of changing MPI_MAX_PORT_NAME in .../include/mpi/mpif-common.h.

I'm using openmpi 1.4.2 (included in Debian stable: 6.0.3) with gfortran 4.4.5 (also the version included in Debian stable). I also tried openmpi 1.4.4 and ifort 11.1.

--
Enzo A. Dari
Instituto Balseiro / Centro Atomico Bariloche
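For reference, the server side of this pattern in C sizes the buffer with the header's constant rather than a literal; a minimal sketch is below (the service name "my-service" is only illustrative, and publishing/looking up names generally requires a common name server - for example, both sides running under the same mpirun or an ompi-server - which is outside the scope of the sketch). The Fortran analogue would declare CHARACTER(LEN=MPI_MAX_PORT_NAME) using the constant from mpif.h, which is exactly where the mismatch described above shows up.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char port_name[MPI_MAX_PORT_NAME];   /* sized by the constant from mpi.h */
        MPI_Comm client;

        MPI_Init(&argc, &argv);

        MPI_Open_port(MPI_INFO_NULL, port_name);
        MPI_Publish_name("my-service", MPI_INFO_NULL, port_name);
        printf("server listening on port '%s'\n", port_name);

        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

        /* ... exchange data with the client here ... */

        MPI_Comm_disconnect(&client);
        MPI_Unpublish_name("my-service", MPI_INFO_NULL, port_name);
        MPI_Close_port(port_name);
        MPI_Finalize();
        return 0;
    }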
Re: [OMPI users] Program hangs in mpi_bcast
Yes, this is well documented - it may be on the FAQ, and it has certainly come up on the user list multiple times.

The problem is that one process falls behind, which causes it to begin accumulating "unexpected messages" in its queue. This causes the matching logic to run a little slower, thus making the process fall further and further behind. Eventually, things hang because everyone is sitting in bcast waiting for the slow proc to catch up, but its queue is saturated and it can't.

The solution is to do exactly what you describe - add some barriers to force the slow process to catch up. This has happened often enough that we even added support for it in OMPI itself so you don't have to modify your code. Look at the following from "ompi_info --param coll sync":

    MCA coll: parameter "coll_base_verbose" (current value: <0>, data source: default value)
              Verbosity level for the coll framework (0 = no verbosity)
    MCA coll: parameter "coll_sync_priority" (current value: <50>, data source: default value)
              Priority of the sync coll component; only relevant if barrier_before or barrier_after is > 0
    MCA coll: parameter "coll_sync_barrier_before" (current value: <1000>, data source: default value)
              Do a synchronization before each Nth collective
    MCA coll: parameter "coll_sync_barrier_after" (current value: <0>, data source: default value)
              Do a synchronization after each Nth collective

Take your pick - inserting a barrier before or after doesn't seem to make a lot of difference, but most people use "before". Try different values until you get something that works for you.

On Nov 14, 2011, at 3:10 PM, Tom Rosmond wrote:

> Recently we ported the code to an 8 node, 6 processor/node NUMA system
> (lstopo output attached) running Debian Linux 6.0.3 with Open MPI 1.5.3, and
> began having trouble with mysterious 'hangs' in the program inside the
> mpi_bcast calls. The hangs were always in the same calls, but not
> necessarily at the same time during integration. We originally didn't have
> NUMA support, so we reinstalled with libnuma support added, but the problem
> persisted. Finally, just as a wild guess, we inserted 'mpi_barrier' calls
> just before the 'mpi_bcast' calls, and the program now runs without
> problems.
>
> I believe conventional wisdom is that properly formulated MPI programs
> should run correctly without barriers, so do you have any thoughts on why
> we found it necessary to add them? The code has run correctly on other
> architectures, e.g. the Cray XE6, so I don't think there is a bug anywhere.
> My only explanation is that some internal resource gets exhausted because
> of the large number of 'mpi_bcast' calls in rapid succession, and the
> barrier calls force synchronization which allows the resource to be
> restored. Does this make sense? I'd appreciate any comments and advice you
> can provide.
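On the command line from the original post, that would look something like this (the threshold of 100 is only an example to experiment with):

    mpirun -np 36 -bycore -bind-to-core \
           --mca coll_sync_barrier_before 100 program.exe

The same parameter can also be set through the environment, e.g. by exporting OMPI_MCA_coll_sync_barrier_before=100 before the run.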