There's been a lot of turnover in this exact portion of the code base on the SVN trunk in the last week or three.

Ralph -- can you comment on where we are?


On Apr 26, 2008, at 2:07 PM, Alberto Giannetti wrote:

Doesn't seem to work. This is the appfile I'm using:

# Application context files specify each sub-application in the
# parallel job, one per line.
# Server
-np 2 server
# Client
-np 1 client 0.1.0:2001

And the output:

mpirun --app ./appfile
Processor 0 (3659, Receiver) initialized
Processor 1 (3661, Receiver) initialized
Processor 0 opened port 0.1.0:2001
Processor 0 waiting for connections on 0.1.0:2001...
Processor 1 opened port 0.1.1:2000
Processor 1 waiting for connections on 0.1.1:2000...
Processor 2 (3663, Sender) initialized
Processor 2 connecting to '0.1.0:2001'


The client hangs during the connect.


On Apr 26, 2008, at 11:29 AM, Aurélien Bouteiller wrote:
This scenario is known to be buggy in some versions of Open MPI. It is
now fixed in the SVN trunk and will be part of the 1.3 release.

As a quick fix for your application, you'll need to launch both applications
with the same mpirun, using MPMD syntax. However, this will have the
adverse effect of a larger-than-expected MPI_COMM_WORLD.
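
A minimal sketch of that MPMD launch, reusing the binary names and port
argument that appear elsewhere in this thread (the port string still has to
match whatever the server's MPI_Open_port actually prints at run time):

mpirun -np 2 server : -np 1 client 0.1.0:2001

The same launch can equivalently be written as an appfile and passed with
mpirun --app, as shown earlier in the thread.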

Aurelien


On Apr 26, 2008, at 12:31 AM, Alberto Giannetti wrote:

I want to connect two MPI programs through the
MPI_Comm_connect/MPI_Comm_accept API.
This is my server app:

#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
  int rank, count;
  float data[100];
  char myport[MPI_MAX_PORT_NAME];
  MPI_Status status;
  MPI_Comm intercomm;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Processor %d (%d, Receiver) initialized\n", rank, (int)getpid());

  /* Each rank opens its own port and prints the port name. */
  MPI_Open_port(MPI_INFO_NULL, myport);
  printf("Opened port %s\n", myport);

  /* Accept on MPI_COMM_SELF, so every server rank waits for its own client. */
  printf("Waiting for connections on %s...\n", myport);
  MPI_Comm_accept(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
  printf("New connection on port %s\n", myport);

  printf("Processor %d waiting for data from new intercomm...\n", rank);
  MPI_Recv(data, 100, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG, intercomm, &status);
  MPI_Get_count(&status, MPI_FLOAT, &count);
  printf("Processor %d got %d elements: %f, %f, %f...\n", rank,
         count, data[0], data[1], data[2]);

  MPI_Finalize();
  return 0;
}
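
As an aside, a minimal sketch of how the server could additionally publish its
port under a service name with the standard MPI_Publish_name call, so the port
string does not have to be copied by hand. The service name "receiver-service"
is made up for this sketch, and name lookup across separately launched mpirun
jobs is subject to the same runtime limitations discussed above:

  /* Hypothetical addition right after MPI_Open_port() above.
     "receiver-service" is an arbitrary service name chosen for this sketch. */
  MPI_Publish_name("receiver-service", MPI_INFO_NULL, myport);
  /* ... MPI_Comm_accept(), MPI_Recv(), etc. as before ... */
  MPI_Unpublish_name("receiver-service", MPI_INFO_NULL, myport);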


And my client program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
  int rank, i;
  float data[100];
  char myport[MPI_MAX_PORT_NAME];
  MPI_Comm intercomm;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Processor %d (%d, Sender) initialized\n", rank, (int)getpid());

  if( argc < 2 ) {
    fprintf(stderr, "Require server port name\n");
    MPI_Finalize();
    exit(-1);
  }

  for( i = 0; i < 100; i++ )
    data[i] = (float)i;

  /* The server's port name (as printed by MPI_Open_port) is passed on the command line. */
  strcpy(myport, argv[1]);
  printf("Processor %d connecting to '%s'\n", rank, myport);
  MPI_Comm_connect(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);

  printf("Processor %d sending data through intercomm...\n", rank);
  MPI_Send(data, 100, MPI_FLOAT, 0, 55, intercomm);
  printf("Processor %d data sent!\n", rank);

  MPI_Finalize();
  return 0;
}
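
And the matching sketch on the client side, looking up the published name with
MPI_Lookup_name instead of taking the raw port string from the command line
(again assuming the made-up service name "receiver-service" and a runtime
where cross-job lookup works):

  char myport[MPI_MAX_PORT_NAME];
  /* Resolve the port string the server published under the agreed service name. */
  MPI_Lookup_name("receiver-service", MPI_INFO_NULL, myport);
  MPI_Comm_connect(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);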


I run the server program:

mpirun -np 2 app2
Processor 0 (7916, Receiver) initialized
Processor 1 (7918, Receiver) initialized
Opened port 0.1.0:2000
Waiting for connections on 0.1.0:2000...
Opened port 0.1.1:2001
Waiting for connections on 0.1.1:2001...


Then the client:

mpirun -np 1 app1 0.1.0:2000
Processor 0 (7933, Sender) initialized
Processor 0 connecting to '0.1.0:2000'
[alberto-giannettis-computer.local:07933] [0,1,0] ORTE_ERROR_LOG: Not found in file /tmp/buildpackage-3432/openmpi-1.2.4/orte/dss/dss_unpack.c at line 209
[alberto-giannettis-computer.local:07933] [0,1,0] ORTE_ERROR_LOG: Not found in file /tmp/buildpackage-3432/openmpi-1.2.4/ompi/communicator/comm_dyn.c at line 186
[alberto-giannettis-computer.local:07933] *** An error occurred in MPI_Comm_connect
[alberto-giannettis-computer.local:07933] *** on communicator MPI_COMM_SELF
[alberto-giannettis-computer.local:07933] *** MPI_ERR_INTERN: internal error
[alberto-giannettis-computer.local:07933] *** MPI_ERRORS_ARE_FATAL (goodbye)


Why do I have an internal error? If I try to connect to 0.1.1:2001
from the client, the program hangs.



--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321







--
Jeff Squyres
Cisco Systems

