Re: [OMPI users] mpirun gives error when option '--hostfiles' or '--hosts' is used
Actually all machines use iptables as a firewall. I compared the rules triops and kraken use and found that triops had the line

    REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

which kraken did not have (otherwise they were identical). I removed that line from triops' rules, restarted iptables, and now communication works in all directions!

Thank You
Jody

On Tue, May 3, 2016 at 7:00 PM, Jeff Squyres (jsquyres) wrote:
> Have you disabled firewalls between these machines?
>
> > On May 3, 2016, at 11:26 AM, jody wrote:
> >
> > ...my bad!
> >
> > I had set up things so that PATH and LD_LIBRARY_PATH were correct in interactive mode,
> > but they were wrong when ssh was called non-interactively.
> >
> > Now I have a new problem:
> > When I do
> >     mpirun -np 6 --hostfile krakenhosts hostname
> > from triops, sometimes it seems to hang (i.e. no output, doesn't end)
> > and at other times I get the output
> >
> > [aim-kraken:24527] [[45056,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
> > --------------------------------------------------------------------------
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> > ...
> > --------------------------------------------------------------------------
> >
> > Again, I can call mpirun on triops from kraken and all squid_XX without a problem...
> >
> > What could cause this problem?
> >
> > Thank You
> > Jody
> >
> > On Tue, May 3, 2016 at 2:54 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > Have you verified that you are running the same version of Open MPI on both servers when launched from non-interactive logins?
> >
> > This kind of error is somewhat typical if you accidentally mixed, for example, Open MPI v1.6.x and v1.10.2 (i.e., v1.10.2 understands the --hnp-topo-sig back end option, but v1.6.x does not).
> >
> > > On May 3, 2016, at 6:35 AM, jody wrote:
> > >
> > > Hi
> > > I have installed Open MPI v1.10.2 on two machines today, using only the prefix option for configure and then doing 'make all install'.
> > >
> > > On both machines I changed .bashrc to set PATH and LD_LIBRARY_PATH correctly.
> > > (I checked by running 'mpirun --version' and verifying that the output does indeed say 1.10.2.)
> > >
> > > Password-less ssh is enabled on both machines in both directions.
> > >
> > > When I start mpirun from one machine (kraken) with a hostfile specifying the other machine ("triops slots=8 max-slots=8"),
> > > it works:
> > > -----
> > > jody@kraken ~ $ mpirun -np 3 --hostfile triopshosts uptime
> > > 12:24:04 up 7 days, 43 min, 17 users, load average: 0.06, 0.68, 0.65
> > > 12:24:04 up 7 days, 43 min, 17 users, load average: 0.06, 0.68, 0.65
> > > 12:24:04 up 7 days, 43 min, 17 users, load average: 0.06, 0.68, 0.65
> > > -----
> > >
> > > But when I start mpirun from triops with a hostfile specifying kraken ("kraken slots=8 max-slots=8"),
> > > it fails:
> > > -----
> > > jody@triops ~ $ mpirun -np 3 --hostfile krakenhosts hostname
> > > [aim-kraken:21973] Error: unknown option "--hnp-topo-sig"
> > > input in flex scanner failed
> > > --------------------------------------------------------------------------
> > > ORTE was unable to reliably start one or more daemons.
> > > This usually is caused by:
> > >
> > > * not finding the required libraries and/or binaries on
> > >   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> > >   settings, or configure OMPI with --enable-orterun-prefix-by-default
> > >
> > > * lack of authority to execute on one or more specified nodes.
> > >   Please verify your allocation and authorities.
> > >
> > > * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
> > >   Please check with your sys admin to determine the correct location to use.
> > >
> > > * compilation of the orted with dynamic libraries when static are required
> > >   (e.g., on Cray). Please check your configure cmd line and consider using
> > >   one of the contrib/platform definitions for your system type.
> > >
> > > * an inability to create a connection back to mpirun due to a
> > >   lack of common network interfaces and/or no route found between
> > >   them. Please check network connectivity (including firewalls
> > >   and network routing requirements).
> > > --------------------------------------------------------------------------
> > >
> > > The same error happens when I use '--host kraken'.
> > >
> > > I verified that PATH and LD_LIBRARY_PATH are correctly set on both machines.
> > > And on both machines /tmp is readable, writable and executable for all.
> > > The connection should be okay (I can do an ssh from kraken to triops and vice versa).
> > >
> > > Any idea what the problem is?
> > >
> > > Thank You
> > > Jody
[OMPI users] barrier algorithm 5
With OMPI 1.10.2 and earlier on Infiniband, IMB generally spins with no output for the barrier benchmark if you run it with algorithm 5, i.e.

    mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_barrier_algorithm 5 IMB-MPI1 barrier

This is "two proc only". Does that mean it will only work for two processes (which seems true experimentally)? If so, should it report an error if used with more?
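For readers without IMB handy, a minimal barrier loop such as the sketch below (not from the thread; the file name, rank count and iteration count are arbitrary) can be launched with the same MCA parameters and should show the same behaviour:

/* barrier_test.c -- minimal stand-in for the IMB barrier benchmark.
 * Build: mpicc barrier_test.c -o barrier_test
 * Run:   mpirun --mca coll_tuned_use_dynamic_rules 1 \
 *               --mca coll_tuned_barrier_algorithm 5 -np 4 ./barrier_test
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    const int iterations = 1000;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();
    for (i = 0; i < iterations; i++)
        MPI_Barrier(MPI_COMM_WORLD);   /* with algorithm 5 and more than two ranks,
                                          this is where it appears to hang */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d barriers in %f s (%.2f us each)\n",
               iterations, t1 - t0, 1e6 * (t1 - t0) / iterations);

    MPI_Finalize();
    return 0;
}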
Re: [OMPI users] barrier algorithm 5
Dave,

yes, this is for two MPI tasks only.

The MPI subroutine could/should return with an error if the communicator is made of more than 3 tasks. Another option would be to abort at initialization time if no collective modules provide a barrier implementation. Or maybe the tuned module should not have used the two_procs algorithm, but what should it do instead? Use a default one? Not implement barrier? Warn/error the end user? Note the error message might be a bit obscure.

I write "could" because you explicitly forced something that cannot work, and I am not convinced OpenMPI should protect end users from themselves, even when they make an honest mistake.

George, any thoughts?

Cheers,

Gilles

On Wednesday, May 4, 2016, Dave Love wrote:
> With OMPI 1.10.2 and earlier on Infiniband, IMB generally spins with no
> output for the barrier benchmark if you run it with algorithm 5, i.e.
>
> mpirun --mca coll_tuned_use_dynamic_rules 1 --mca
> coll_tuned_barrier_algorithm 5 IMB-MPI1 barrier
>
> This is "two proc only". Does that mean it will only work for two
> processes (which seems true experimentally)? If so, should it report an
> error if used with more?
Re: [OMPI users] barrier algorithm 5
Gilles Gouaillardet writes:

> Dave,
>
> yes, this is for two MPI tasks only.
>
> the MPI subroutine could/should return with an error if the communicator is
> made of more than 3 tasks.
> an other option would be to abort at initialization time if no collective
> modules provide a barrier implementation.
> or maybe the tuned module should have not used the two_procs algorithm, but
> what should it do instead ? use a default one ? do not implement barrier ?
> warn/error the end user ?
>
> note the error message might be a bit obscure.
>
> I write "could" because you explicitly forced something that cannot work,
> and I am not convinced OpenMPI should protect end users from themselves,
> even when they make an honest mistake.

I just looped over the available algorithms, not expecting any not to work. One question is how I'd know it can't work; I can't find documentation on the algorithms, just the more-or-less suggestive names that I might be able to find in the literature, or not. Is there a good place to look?

In the absence of a good reason why not (I haven't looked at the code), I'd expect it to abort at some stage with a message about the algorithm being limited to two processes. Of course, this isn't a common case, and people probably have more important things to do.
[OMPI users] Multiple Non-blocking Send/Recv calls with MPI_Waitall fails when CUDA IPC is in use
Hi there,

I am using multiple MPI non-blocking send/receives on a GPU buffer, followed by a waitall at the end; I also repeat this process multiple times. The MPI version that I am using is 1.10.2.

When multiple processes are assigned to a single GPU (or when CUDA IPC is used), I get the following error at the beginning:

The call to cuIpcGetEventHandle failed. This is a unrecoverable error and will cause the program to abort.
  cuIpcGetEventHandle return value: 1

and this at the end of my benchmark:

The call to cuEventDestory failed. This is a unrecoverable error and will cause the program to abort.
  cuEventDestory return value: 400
Check the cuda.h file for what the return value means.

Note 1: This error doesn't appear if only one iteration of the non-blocking send/receive calls is used (i.e., MPI_Waitall is called only once). This error doesn't appear if multiple iterations are used but MPI_Waitall is not included.

Note 2: This error doesn't exist if the buffer is allocated on the host.

Note 3: This error doesn't exist if cuda_ipc is disabled or OMPI version 1.8.8 is used.

I'd appreciate it if you let me know what causes this issue and how it can be resolved.

Regards,
Iman
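For reference, the failing pattern as described above looks roughly like the sketch below (a reconstruction, not the actual benchmark; the message size, ring exchange pattern, tag and iteration count are assumptions), built against a CUDA-aware Open MPI:

/* Sketch of the reported pattern: repeated non-blocking exchanges on GPU
 * buffers, completed with MPI_Waitall. Not the original benchmark; sizes,
 * peers and iteration count are placeholders.
 * Build (CUDA-aware Open MPI): mpicc cuda_waitall.c -lcudart -o cuda_waitall
 */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char *argv[])
{
    const int count = 1 << 20;   /* ints per message (assumed)          */
    const int iters = 10;        /* repeated Isend/Irecv+Waitall rounds */
    int rank, size, right, left, it;
    int *d_send, *d_recv;
    MPI_Request req[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Device buffers; with several ranks sharing a GPU on the same node,
     * the smcuda BTL can use CUDA IPC for these transfers. */
    cudaMalloc((void **)&d_send, count * sizeof(int));
    cudaMalloc((void **)&d_recv, count * sizeof(int));
    cudaMemset(d_send, 0, count * sizeof(int));

    right = (rank + 1) % size;          /* send to the right neighbour */
    left  = (rank - 1 + size) % size;   /* receive from the left one   */

    for (it = 0; it < iters; it++) {
        /* Device pointers are passed straight to MPI (CUDA-aware build). */
        MPI_Irecv(d_recv, count, MPI_INT, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(d_send, count, MPI_INT, right, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}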
[OMPI users] Isend, Recv and Test
Hi,

I'm having a problem with Isend, Recv and Test in Linux Mint 16 Petra. The source is attached.

Open MPI 1.10.2 is configured with
./configure --enable-debug --prefix=/home//Tool/openmpi-1.10.2-debug

The source is built with
~/Tool/openmpi-1.10.2-debug/bin/mpiCC a5.cpp

and run on one node with
~/Tool/openmpi-1.10.2-debug/bin/mpirun -n 2 ./a.out

The output is at the end. What puzzles me is why MPI_Test is called so many times, and why it takes so long to send a message. Am I doing something wrong? I'm simulating a more complicated program: MPI 0 Isends data to MPI 1, computes (usleep here), and calls Test to check if the data are sent. MPI 1 Recvs data, and computes.

Thanks in advance.

Best regards,
Zhen

MPI 0: Isend of 0 started at 20:32:35.
MPI 1: Recv of 0 started at 20:32:35.
MPI 0: MPI_Test of 0 at 20:32:35.
MPI 0: MPI_Test of 0 at 20:32:35.
MPI 0: MPI_Test of 0 at 20:32:35.
MPI 0: MPI_Test of 0 at 20:32:35.
MPI 0: MPI_Test of 0 at 20:32:35.
MPI 0: MPI_Test of 0 at 20:32:35.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:36.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:37.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:38.
MPI 0: MPI_Test of 0 at 20:32:39.
MPI 0: MPI_Test of 0 at 20:32:39.
MPI 0: MPI_Test of 0 at 20:32:39.
MPI 0: MPI_Test of 0 at 20:32:39.
MPI 0: MPI_Test of 0 at 20:32:39.
MPI 0: MPI_Test of 0 at 20:32:39.
MPI 1: Recv of 0 finished at 20:32:39.
MPI 0: MPI_Test of 0 at 20:32:39.
MPI 0: Isend of 0 finished at 20:32:39.
#include "mpi.h" #include #include #include #include int main(int argc, char* argv[]) { MPI_Init(&argc, &argv); int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank); int n = 99; const int m = 1; std::vector > vec(m); for (int i = 0; i < m; i++) { vec[i].resize(n); } MPI_Request mpiRequest[m]; MPI_Status mpiStatus[m]; char tt[99] = {0}; MPI_Barrier(MPI_COMM_WORLD); if (rank == 0) { for (int i = 0; i < m; i++) { MPI_Isend(&vec[i][0], n, MPI_INT, 1, i, MPI_COMM_WORLD, &mpiRequest[i]); time_t t = time(0); strftime(tt, 9, "%H:%M:%S", localtime(&t)); printf("MPI %d: Isend of %d started at %s.\n", rank, i, tt); } for (int i = 0; i < m; i++) { int done = 0; while (done == 0) { usleep(10); time_t t = time(0); strftime(tt, 9, "%H:%M:%S", localtime(&t)); printf("MPI %d: MPI_Test of %d at %s.\n", rank, i, tt); MPI_Test(&mpiRequest[i], &done, &mpiStatus[i]); //printf("MPI %d: MPI_Wait of %d at %s.\n", rank, i, tt); //MPI_Wait(&mpiRequest[i], &mpiStatus[i]); } time_t t = time(0); strftime(tt, 9, "%H:%M:%S", localtime(&t)); printf("MPI %d: Isend of %d finished at %s.\n", rank, i, tt); } } else { for (int i = 0; i < m; i++) { time_t t = time(0); strftime(tt, 9, "%H:%M:%S", localtime(&t)); printf("MPI %d: Recv of %d started at %s.\n", rank, i, tt); MPI_Recv(&vec[i][0], n, MPI_INT, 0, i, MPI_COMM_WORLD, &mpiStatus[i]); t = time(0); strftime(tt, 9, "%H:%M:%S", localtime(&t)); printf("MPI %d: Recv of %d finished at %s.\n", rank, i, tt); } } MPI_Finalize(); return 0; }
Re: [OMPI users] Isend, Recv and Test
Note there is no progress thread in Open MPI 1.10. From a pragmatic point of view, that means that for "large" messages no data is sent inside MPI_Isend; the data is sent when MPI "progresses", e.g. when you call MPI_Test, MPI_Probe, MPI_Recv or some similar subroutine. In your example, the data is transferred after the first usleep completes.

That being said, it takes quite a while, and there could be an issue.

What if you use MPI_Send() instead?
What if you Send/Recv a large message first (to "warm up" the connections), then MPI_Barrier, and then start your MPI_Isend?

Cheers,

Gilles

On Thursday, May 5, 2016, Zhen Wang wrote:
> Hi,
>
> I'm having a problem with Isend, Recv and Test in Linux Mint 16 Petra. The
> source is attached.
>
> Open MPI 1.10.2 is configured with
> ./configure --enable-debug --prefix=/home//Tool/openmpi-1.10.2-debug
>
> The source is built with
> ~/Tool/openmpi-1.10.2-debug/bin/mpiCC a5.cpp
>
> and run in one node with
> ~/Tool/openmpi-1.10.2-debug/bin/mpirun -n 2 ./a.out
>
> The output is in the end. What puzzles me is why MPI_Test is called so
> many times, and it takes so long to send a message. Am I doing something
> wrong? I'm simulating a more complicated program: MPI 0 Isends data to MPI
> 1, computes (usleep here), and calls Test to check if data are sent. MPI 1
> Recvs data, and computes.
>
> Thanks in advance.
>
> Best regards,
> Zhen
>
> MPI 0: Isend of 0 started at 20:32:35.
> MPI 1: Recv of 0 started at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 1: Recv of 0 finished at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: Isend of 0 finished at 20:32:39.
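If it helps, the "warm up the connection first" suggestion in the reply above would look roughly like the sketch below (one interpretation only; the warm-up size, tags and file name are arbitrary and not from the thread):

/* Sketch of the "warm up the connection first" idea. Not from the original
 * post; sizes and tags are arbitrary.
 * Build: mpicc warmup_isend.c -o warmup_isend
 * Run:   mpirun -n 2 ./warmup_isend
 */
#include <mpi.h>
#include <stdlib.h>

/* Exchange one large dummy message between ranks 0 and 1 so the underlying
 * connection is already established before the traffic we care about. */
static void warmup(int rank)
{
    const int n = 1 << 20;
    int *buf = (int *)calloc(n, sizeof(int));
    if (rank == 0)
        MPI_Send(buf, n, MPI_INT, 1, 99, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, n, MPI_INT, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    free(buf);
}

int main(int argc, char *argv[])
{
    int rank, done = 0;
    int data[99] = {0};
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    warmup(rank);                    /* establish the connection first */
    MPI_Barrier(MPI_COMM_WORLD);     /* then start the real exchange   */

    if (rank == 0) {
        MPI_Isend(data, 99, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        while (!done)                /* progress the send via MPI_Test */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(data, 99, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}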