Lenny Verkhovsky wrote:
Have you tried the IMB benchmark with Bcast? I think the problem is in
the app.
Presumably not, since increasing btl_sm_num_fifos cures the problem.
This appears to be trac 2043 (again)! Note that all processes *do*
enter the broadcasts. The first broadcast call is exactly the same on
each and every process. The second broadcast differs only in that the
root uses one buffer and the non-root processes use a different buffer.
All ranks in the communicator should enter Bcast, but since you have
an if (rank == 0) / else branch, not all of them go through the same
flow.
if (iRank == 0)
{
    iLength = sizeof (acMessage);
    MPI_Bcast (&iLength, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast (acMessage, iLength, MPI_CHAR, 0, MPI_COMM_WORLD);
    printf ("Process 0: Message sent\n");
}
else
{
    MPI_Bcast (&iLength, 1, MPI_INT, 0, MPI_COMM_WORLD);
    pMessage = (char *) malloc (iLength);
    MPI_Bcast (pMessage, iLength, MPI_CHAR, 0, MPI_COMM_WORLD);
    printf ("Process %d: %s\n", iRank, pMessage);
}
Lenny.
On Mon, Jan 4, 2010 at 8:23 AM, Eugene Loh <eugene....@sun.com>
wrote:
If you're willing to try some stuff:
1) What about "-mca coll_sync_barrier_before 100"? (The default may be
1000, so you can try various values less than 1000; I'm suggesting
100.) Note that broadcast has a somewhat one-way traffic flow, which
can lead to undesirable flow-control issues.
2) What about "-mca btl_sm_num_fifos 16"? The default is 1. If the
problem is trac ticket 2043, this setting can help.
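For example, using the same hostfile and binary name as in your mail
(just a sketch; adjust the values as you like):

mpiexec -machinefile hostfile -mca coll_sync_barrier_before 100 -np 5 bcast_example
mpiexec -machinefile hostfile -mca btl_sm_num_fifos 16 -np 5 bcast_example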
P.S. There's a memory leak, right? The receive buffer is being
allocated over and over again. Might not be that closely related to
the problem you see here, but at a minimum it's bad style.
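For instance, keeping the variable names from your snippet and assuming
the two broadcasts sit inside your loop of 2000 iterations, the non-root
branch could free the buffer at the end of each pass (or allocate it
once before the loop):

else
{
    MPI_Bcast (&iLength, 1, MPI_INT, 0, MPI_COMM_WORLD);
    pMessage = (char *) malloc (iLength);
    MPI_Bcast (pMessage, iLength, MPI_CHAR, 0, MPI_COMM_WORLD);
    printf ("Process %d: %s\n", iRank, pMessage);
    free (pMessage);   /* release the receive buffer each iteration */
}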
Louis Rossi wrote:
I am
having a problem with MPI_Bcast hanging on a dual quad-core Opteron
(2382, 2.6 GHz, 4 x 512 KB L2, 6 MB L3 cache) system running FC11 with
openmpi-1.4. The LD_LIBRARY_PATH and PATH variables are correctly
set. I have used the FC11 rpm distribution of openmpi and also built
openmpi-1.4 locally, with the same results. The problem was first
observed in a larger, reliable CFD code, but I can reproduce it
with a simple demo code (attached). The code attempts to execute 2000
pairs of broadcasts.
The hostfile contains a single line
<machinename> slots=8
If I run it on 4 cores or fewer, the code runs fine.
If I run it on 5 cores or more, it hangs some of the time after
successfully executing several hundred broadcasts; the number varies
from run to run. The code usually finishes with 5 cores, and the
probability of hanging seems to increase with the number of processes.
The syntax I use is simple:
mpiexec -machinefile hostfile -np 5 bcast_example
There was some discussion of a similar problem on the user list, but I
could not find a resolution. I have tried setting processor
affinity (--mca mpi_paffinity_alone 1) and varying the
broadcast algorithm (--mca coll_tuned_bcast_algorithm 1-6). I have
also tried excluding my eth1 interface (-mca oob_tcp_if_exclude), which
is not connected to anything (see the attached ifconfig.txt). None of
these changed the outcome.
Any thoughts or suggestions would be appreciated.