Dear MPI users and maintainers,

I am using Open MPI 1.10.4 with multithread support and Java bindings enabled. I use MPI from Java, with one process per machine and multiple threads per process.
I was trying to build a broadcast listener thread that calls MPI_Ibcast, followed by MPI_Wait. I use the request object returned by MPI_Ibcast to shut the listener down, calling MPI_Cancel on that request from the main thread. This results in:

[fe-402-1:2972] *** An error occurred in MPI_Cancel
[fe-402-1:2972] *** reported by process [1275002881,17179869185]
[fe-402-1:2972] *** on communicator MPI_COMM_WORLD
[fe-402-1:2972] *** MPI_ERR_REQUEST: invalid request
[fe-402-1:2972] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[fe-402-1:2972] *** and potentially your MPI job)

which indicates that the request is invalid in some fashion. I have already checked that it is not MPI_REQUEST_NULL. I have also set up a simple testbed in which nothing else happens except that one broadcast. The request object is always reported as invalid, no matter where I call cancel() from. As far as I understand the MPI specification, cancel is also supposed to work for nonblocking collective communication (which includes my broadcasts). I haven't found any advice yet, so I hope to find some help on this mailing list.

Kind regards,
Markus Jeromin

PS: Testbed for calling MPI cancel, written in Java.
_______

package distributed.mpi;

import java.nio.ByteBuffer;

import mpi.MPI;
import mpi.MPIException;
import mpi.Request;

/**
 * Testing MPI_Cancel on MPI_Ibcast.<br>
 * The program does not terminate because the listeners are still running and
 * waiting for the native call MPI_Wait to return. MPI_Cancel is called, but
 * the listener never unblocks (i.e. MPI_Wait never returns).
 *
 * @author mjeromin
 */
public class BroadcastTestCancel {

  static int myrank;

  /**
   * Listener that waits for incoming broadcasts from the specified root.
   * Uses asynchronous MPI_Ibcast followed by MPI_Wait.
   */
  static class Listener extends Thread {
    ByteBuffer b = ByteBuffer.allocateDirect(100);
    public Request req = null;

    @Override
    public void run() {
      super.run();
      try {
        req = MPI.COMM_WORLD.iBcast(b, b.limit(), MPI.BYTE, 0);
        System.out.println(myrank + ": waiting for bcast (that will never come)");
        req.waitFor();
      } catch (MPIException e) {
        e.printStackTrace();
      }
      System.out.println(myrank + ": listener unblocked");
    }
  }

  public static void main(String[] args) throws MPIException, InterruptedException {
    // we need full thread support
    int threadSupport = MPI.InitThread(args, MPI.THREAD_MULTIPLE);
    if (threadSupport != MPI.THREAD_MULTIPLE) {
      System.out.println(myrank + ": no multithread support. Aborting.");
      MPI.Finalize();
      return;
    }

    // enabling or disabling exceptions makes no difference at all
    MPI.COMM_WORLD.setErrhandler(MPI.ERRORS_RETURN);

    myrank = MPI.COMM_WORLD.getRank();

    // start receiving listeners, but no sender (which would be rank 0)
    if (myrank > 0) {
      Listener l = new Listener();
      l.start();

      // let the listener reach waitFor()
      Thread.sleep(5000);

      // call MPI_Cancel (the matching send will never arrive)
      try {
        l.req.cancel();
      } catch (MPIException e) {
        // depends on the error handler
        System.out.println(myrank + ": MPI Exception\n" + e.toString());
      }
    }

    // don't call MPI_Finalize too early (waiting here is not strictly
    // necessary, but just to be sure)
    Thread.sleep(15000);
    System.out.println(myrank + ": calling finish");
    MPI.Finalize();
    System.out.println(myrank + ": finished");
  }
}
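PPS: To make clear what behaviour I expect from MPI_Cancel, here is the same shutdown pattern reproduced in plain Java, with no MPI involved. The CompletableFuture stands in for the MPI request, get() for MPI_Wait, and cancel() for MPI_Cancel; all names here are my own, not from the MPI bindings. The blocked listener unblocks as soon as the main thread cancels the pending "request":

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

public class CancelSketch {

  /**
   * Starts a listener thread blocked on a pending "request" and cancels
   * that request from the calling thread. Returns true if the listener
   * unblocked within the timeout.
   */
  static boolean runListenerAndCancel() throws InterruptedException {
    // Stand-in for the MPI request: completed by an incoming
    // broadcast, which never arrives in this testbed.
    CompletableFuture<byte[]> request = new CompletableFuture<>();

    Thread listener = new Thread(() -> {
      try {
        request.get(); // analogous to MPI_Wait
        System.out.println("listener: got broadcast");
      } catch (CancellationException e) {
        System.out.println("listener unblocked by cancel");
      } catch (Exception e) {
        System.out.println("listener failed: " + e);
      }
    });
    listener.start();

    Thread.sleep(100);    // let the listener reach get()
    request.cancel(true); // analogous to MPI_Cancel
    listener.join(2000);
    return !listener.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("listener unblocked: " + runListenerAndCancel());
  }
}
```

This is exactly what I would like the MPI listener to do: block until either a broadcast arrives or the main thread cancels the outstanding request.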
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users