Dear MPI Users and Maintainers,

I am using Open MPI 1.10.4 with multithreading support and the Java
bindings enabled. I use MPI from Java, with one process per machine and
multiple threads per process.

I am trying to build a broadcast listener thread that calls MPI_Ibcast,
followed by MPI_Wait.

To shut the listener down, I call MPI_Cancel from the main thread on the
request object returned by MPI_Ibcast. This results in:

[fe-402-1:2972] *** An error occurred in MPI_Cancel
[fe-402-1:2972] *** reported by process [1275002881,17179869185]
[fe-402-1:2972] *** on communicator MPI_COMM_WORLD
[fe-402-1:2972] *** MPI_ERR_REQUEST: invalid request
[fe-402-1:2972] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[fe-402-1:2972] ***    and potentially your MPI job)


This indicates that the request is somehow invalid. I have already checked
that it is not MPI_REQUEST_NULL. I have also set up a simple testbed in
which nothing happens except that one broadcast; the request object is
always reported as invalid, no matter where I call cancel() from.

As far as I understand the MPI specification, MPI_Cancel is also supposed
to work for nonblocking collective communication (which includes my
broadcasts). I have not found any advice on this yet, so I hope to find
some help on this mailing list.

Kind regards,
Markus Jeromin

PS: Testbed for calling MPI_Cancel, written in Java.
_______

package distributed.mpi;

import java.nio.ByteBuffer;

import mpi.MPI;
import mpi.MPIException;
import mpi.Request;

/**
 * Testing MPI_Cancel on MPI_Ibcast.<br>
 * The program does not terminate because the listeners are still running,
 * waiting for the native call MPI_Wait to return. MPI_Cancel is called,
 * but the listener never unblocks (i.e. MPI_Wait never returns).
 *
 * @author mjeromin
 */
public class BroadcastTestCancel {

  static int myrank;

  /**
   * Listener that waits for incoming broadcasts from the specified root.
   * Uses the nonblocking MPI_Ibcast followed by MPI_Wait.
   */
  static class Listener extends Thread {

    ByteBuffer b = ByteBuffer.allocateDirect(100);
    public Request req = null;

    @Override
    public void run() {
      super.run();
      try {
        req = MPI.COMM_WORLD.iBcast(b, b.limit(), MPI.BYTE, 0);
        System.out.println(myrank + ": waiting for bcast (that will never come)");
        req.waitFor();
      } catch (MPIException e) {
        e.printStackTrace();
      }
      System.out.println(myrank + ": listener unblocked");
    }
  }

  public static void main(String[] args) throws MPIException, InterruptedException {

    // we need full thread support
    int threadSupport = MPI.InitThread(args, MPI.THREAD_MULTIPLE);
    if (threadSupport != MPI.THREAD_MULTIPLE) {
      System.out.println(myrank + ": no multithread support. Aborting.");
      MPI.Finalize();
      return;
    }

    // disabling or enabling exceptions makes no difference at all
    MPI.COMM_WORLD.setErrhandler(MPI.ERRORS_RETURN);

    myrank = MPI.COMM_WORLD.getRank();

    // start receiving listeners, but no sender (which would be rank 0)
    if (myrank > 0) {
      Listener l = new Listener();
      l.start();

      // let the listener reach waitFor()
      Thread.sleep(5000);

      // call MPI_Cancel (the matching send will never arrive)
      try {
        l.req.cancel();
      } catch (MPIException e) {
        // depends on the error handler
        System.out.println(myrank + ": MPI Exception \n" + e.toString());
      }
    }

    // don't call MPI_Finalize too early (waiting here is not strictly
    // necessary, but just to be sure)
    Thread.sleep(15000);

    System.out.println(myrank + ": calling finalize");
    MPI.Finalize();
    System.out.println(myrank + ": finished");
  }

}
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users