After I replaced ";" with "\;" in the server name I got past the ABORT problem. Now the client and server deadlock until I finally get (on the client side):

mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
[jski:02429] [[59675,0],0] -> [[59187,0],0] (node: jski) oob-tcp: Number of attempts to create TCP connection has been exceeded. Cannot communicate with peer.

On Sat, Apr 13, 2013 at 7:24 PM, John Chludzinski <john.chludzin...@gmail.com> wrote:
> Sorry: The previous post was intended for another group, ignore it.
>
> With regards to the client-server problem:
>
> $ mpirun -n 1 client
> 3878879232.0;tcp://192.168.1.4:37625+3878879233.0;tcp://192.168.1.4:38945:300
>
> [jski:01882] [[59199,1],0] ORTE_ERROR_LOG: Not found in file
> dpm_orte.c at line 158
> [jski:1882] *** An error occurred in MPI_Comm_connect
> [jski:1882] *** on communicator MPI_COMM_WORLD
> [jski:1882] *** MPI_ERR_INTERN: internal error
> [jski:1882] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 1882 on
> node jski exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> On Sat, Apr 13, 2013 at 7:16 PM, John Chludzinski
> <john.chludzin...@gmail.com> wrote:
>> After I "source mpi.ksk", PATH is unchanged but LD_LIBRARY_PATH is there:
>>
>> $ print $LD_LIBRARY_PATH
>> /usr/lib64/openmpi/lib/
>>
>> Why does PATH lose its change?
>>
>> ---John
>>
>>
>> On Sat, Apr 13, 2013 at 12:55 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> You need to pass in the port info that the server printed - just copy/paste
>>> the line below "server available at".
>>>
>>> On Apr 12, 2013, at 10:58 PM, John Chludzinski <john.chludzin...@gmail.com>
>>> wrote:
>>>
>>>> Found the following client-server example (code) on
>>>> http://www.mpi-forum.org and I'm trying to get it to work. Not sure
>>>> what argv[1] should be for the client? The output from the server
>>>> side is:
>>>>
>>>> server available at
>>>> 4094230528.0;tcp://192.168.1.4:55803+4094230529.0;tcp://192.168.1.4:51618:300
>>>>
>>>>
>>>> // SERVER
>>>> #include <stdio.h>
>>>> #include <error.h>
>>>> #include <errno.h>
>>>> #include "mpi.h"
>>>>
>>>> #define MAX_DATA 100
>>>> #define FATAL 1
>>>>
>>>> int main( int argc, char **argv )
>>>> {
>>>>   MPI_Comm client;
>>>>   MPI_Status status;
>>>>   char port_name[MPI_MAX_PORT_NAME];
>>>>   double buf[MAX_DATA];
>>>>   int size, again;
>>>>
>>>>   MPI_Init( &argc, &argv );
>>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>   if (size != 1) error(FATAL, errno, "Server too big");
>>>>   MPI_Open_port(MPI_INFO_NULL, port_name);
>>>>   printf("server available at %s\n",port_name);
>>>>
>>>>   while (1)
>>>>   {
>>>>     MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
>>>>     again = 1;
>>>>
>>>>     while (again)
>>>>     {
>>>>       MPI_Recv( buf, MAX_DATA, MPI_DOUBLE, MPI_ANY_SOURCE,
>>>>                 MPI_ANY_TAG, client, &status );
>>>>
>>>>       switch (status.MPI_TAG)
>>>>       {
>>>>         case 0: MPI_Comm_free( &client );
>>>>                 MPI_Close_port(port_name);
>>>>                 MPI_Finalize();
>>>>                 return 0;
>>>>         case 1: MPI_Comm_disconnect( &client );
>>>>                 again = 0;
>>>>                 break;
>>>>         case 2: /* do something */
>>>>                 fprintf( stderr, "Do something ...\n" );
>>>>         default:
>>>>                 /* Unexpected message type */
>>>>                 MPI_Abort( MPI_COMM_WORLD, 1 );
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> //CLIENT
>>>> #include <string.h>
>>>> #include "mpi.h"
>>>>
>>>> #define MAX_DATA 100
>>>>
>>>> int main( int argc, char **argv )
>>>> {
>>>>   MPI_Comm server;
>>>>   double buf[MAX_DATA];
>>>>   char port_name[MPI_MAX_PORT_NAME];
>>>>   int done = 0, tag, n, CNT=0;
>>>>
>>>>   MPI_Init( &argc, &argv );
>>>>   strcpy(port_name, argv[1] ); /* assume server's name is cmd-line arg */
>>>>
>>>>   MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );
>>>>
>>>>   n = MAX_DATA;
>>>>
>>>>   while (!done)
>>>>   {
>>>>     tag = 2; /* Action to perform */
>>>>     if ( CNT == 5 ) { tag = 0; done = 1; }
>>>>     MPI_Send( buf, n, MPI_DOUBLE, 0, tag, server );
>>>>     CNT++;
>>>>     /* etc */
>>>>   }
>>>>
>>>>   MPI_Send( buf, 0, MPI_DOUBLE, 0, 1, server );
>>>>   MPI_Comm_disconnect( &server );
>>>>   MPI_Finalize();
>>>>
>>>>   return 0;
>>>> }
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
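
P.S. On the ";" issue from the top of this message: an unquoted ";" ends the command in the shell, so the client only ever sees the text before the first ";" as argv[1]. A rough sketch of a check the client could run before MPI_Comm_connect is below. The helper name get_port_arg is made up for illustration and is not part of MPI; cap stands in for MPI_MAX_PORT_NAME; and the "must contain ';'" test is only a heuristic based on the port strings printed in this thread, not on any documented format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an MPI call): copy the port string from argv[1]
 * into out, which holds cap bytes (MPI_MAX_PORT_NAME in the real client).
 * Returns 0 on success, -1 if the argument is missing, too long, or
 * contains no ';'. The ';' test is a heuristic taken from the port strings
 * seen in this thread, not from a documented format. */
int get_port_arg(int argc, char **argv, char *out, size_t cap)
{
    size_t len;

    if (argc < 2 || argv[1] == NULL)
        return -1;                  /* no port string on the command line */

    len = strlen(argv[1]);
    if (len + 1 > cap)
        return -1;                  /* would not fit in out with its NUL */

    if (strchr(argv[1], ';') == NULL)
        return -1;                  /* probably cut short by the shell at ';' */

    memcpy(out, argv[1], len + 1);  /* copy including the terminating NUL */
    return 0;
}
```

In the client this would replace the bare strcpy, with the program exiting cleanly through MPI_Finalize when it returns -1. On the command line, putting the whole port string in single quotes has the same effect as escaping each ";" with a backslash.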