I'm encountering an error using qsub that none of us can figure out. MPI C++ programs seem to run fine when executed from the command line, but for some reason when I submit them through the queue I get a strange error message:


[compute-3-12.local][[58672,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 2002:8170:6c2f:b:21d:9ff:fefd:7d94 failed: Permission denied (13)


The compute node (3-12) doesn't matter: the error can come from any of the nodes, and I'm guessing that 3-12 is the parent node here.
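The address in the error starts with 2002:, the 6to4 prefix (2002::/16, RFC 3056), which carries an IPv4 address in the 32 bits that follow it. Knowing that IPv4 address may help identify which network interface the connection attempt used. A minimal sketch for recovering it is below; the function name embedded_6to4_ipv4 is hypothetical, and it assumes a POSIX system with inet_pton/inet_ntop.

```cpp
#include <arpa/inet.h>   // inet_pton, inet_ntop
#include <netinet/in.h>  // in_addr, in6_addr, INET_ADDRSTRLEN
#include <sys/socket.h>  // AF_INET, AF_INET6
#include <cassert>
#include <cstring>       // std::memcpy
#include <string>

// Hypothetical helper (not part of Open MPI): if `addr` is a 6to4 address
// (prefix 2002::/16), return the IPv4 address stored in bytes 2..5 as a
// dotted quad; otherwise return an empty string.
std::string embedded_6to4_ipv4(const char* addr) {
    assert(addr != nullptr);
    in6_addr a6{};
    if (inet_pton(AF_INET6, addr, &a6) != 1) {
        return "";                      // not a valid IPv6 literal
    }
    const unsigned char* b = a6.s6_addr;
    if (b[0] != 0x20 || b[1] != 0x02) {
        return "";                      // not in 2002::/16
    }
    in_addr a4{};
    std::memcpy(&a4.s_addr, b + 2, 4);  // both are in network byte order
    char out[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &a4, out, sizeof out) == nullptr) {
        return "";
    }
    return std::string(out);
}
```

For the address in the error, embedded_6to4_ipv4("2002:8170:6c2f:b:21d:9ff:fefd:7d94") returns "129.112.108.47". Whether that address belongs to the network the compute nodes are meant to talk over is something to check against the local setup.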

To check whether the problem was in my own code, I wrote a simple 'hello world' program (see attached files).

Again, the program runs fine from the command line but fails under qsub with the same kind of error message.

I have included (i) the code, (ii) the qsub job script, and (iii) the ".o" output file from qsub for the "hello world" program.

These don't look like MPI errors, but rather like some conflict with, maybe, secure communication across nodes.
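One way to test that idea outside MPI is a bare TCP connect() issued from inside a qsub job. The "(13)" at the end of the error looks like an errno value; on Linux, 13 is EACCES ("Permission denied"), and the Linux connect(2) man page lists a local firewall rule as one cause of EACCES from connect(). The sketch below is hypothetical (probe_tcp6_connect and describe_errno are illustrative names, not Open MPI functions) and assumes a Linux/POSIX node and a port known to be listening on the target host; it only reports what the kernel returns.

```cpp
#include <arpa/inet.h>   // inet_pton, htons
#include <netinet/in.h>  // sockaddr_in6
#include <sys/socket.h>  // socket, connect
#include <unistd.h>      // close
#include <cassert>
#include <cerrno>
#include <cstring>       // std::strerror
#include <string>

// Hypothetical probe: attempt one blocking TCP connect() to an IPv6 literal.
// Returns 0 on success, the errno value on failure, or -1 if `addr` does not
// parse as an IPv6 address.
int probe_tcp6_connect(const char* addr, unsigned short port) {
    assert(addr != nullptr);
    sockaddr_in6 sa{};
    sa.sin6_family = AF_INET6;
    sa.sin6_port = htons(port);
    if (inet_pton(AF_INET6, addr, &sa.sin6_addr) != 1) {
        return -1;
    }
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0) {
        return errno;
    }
    int rc = connect(fd, reinterpret_cast<sockaddr*>(&sa), sizeof sa);
    int err = (rc == 0) ? 0 : errno;   // save errno before close() can change it
    close(fd);
    return err;
}

// Format an errno value the way the error message appears to:
// "<strerror text> (<number>)".
std::string describe_errno(int err) {
    return std::string(std::strerror(err)) + " (" + std::to_string(err) + ")";
}
```

From a job, one would call probe_tcp6_connect with the address from the error and print describe_errno(...) for a nonzero result. Getting EACCES under qsub but not when the same probe runs from the command line would point away from MPI and toward node or network configuration.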

Is there something simple I can do to fix this?

Thanks,

Erik Nelson

Howard Hughes Medical Institute
6001 Forest Park Blvd., Room ND10.124
Dallas, Texas 75235-9050

p : 214 645 5981
f : 214 645 5948

#include <stdio.h>
#include <string.h>   // strlen, strncat
#include <mpi.h>      // compile with mpicxx so the MPI include path is set

#define bufdim        128

int main(int argc, char *argv[])
{
    char buffer[bufdim];
    char id_str[32];

//  mpi :
    MPI::Init(argc, argv);
    MPI::Status status;

    int size = MPI::COMM_WORLD.Get_size();
    int rank = MPI::COMM_WORLD.Get_rank();
    int tag = 0;

    if (rank == 0) {
        printf("%d: we have %d processors\n", rank, size);
        // send a greeting to every other rank ...
        for (int i = 1; i < size; ++i) {
            snprintf(buffer, bufdim, "hello  %d! ", i);
            MPI::COMM_WORLD.Send(buffer, bufdim, MPI::CHAR, i, tag);
        }
        // ... then collect the replies in rank order
        for (int i = 1; i < size; ++i) {
            MPI::COMM_WORLD.Recv(buffer, bufdim, MPI::CHAR, i, tag, status);
            printf("%d: %s\n", rank, buffer);
        }
    }
    else {
        MPI::COMM_WORLD.Recv(buffer, bufdim, MPI::CHAR, 0, tag, status);

        // strncat's third argument is the number of characters to append,
        // not the buffer size, so bound it by the space that is left
        snprintf(id_str, sizeof id_str, "processor %d ", rank);
        strncat(buffer, id_str, bufdim - strlen(buffer) - 1);
        strncat(buffer, "reporting for duty\n", bufdim - strlen(buffer) - 1);

        MPI::COMM_WORLD.Send(buffer, bufdim, MPI::CHAR, 0, tag);
    }
    MPI::Finalize();
    return 0;
}


Attachment: hello.job

Attachment: hello.job.o5822590
