[OMPI users] Fault Tolerant Method
I have implemented the fault tolerance method in which you would use MPI_COMM_SPAWN to dynamically create communication groups and use those communicators for a form of process fault tolerance (as described by William Gropp and Ewing Lusk in their 2004 paper), but am having some problems getting it to work the way I intended. Basically, when it runs, it is spawning all the processes on the same machine (as it always starts at the top of the machine_list when spawning a process). Is there a way that I can get these processes to spawn on different machines?

One possible route I considered was using something like SLURM to distribute the jobs, and just putting '+' in the machine file. Will this work? Is this the best route to go?

Thanks for any help with this.

Byron
Re: [OMPI users] Fault Tolerant Method
> I have implemented the fault tolerance method in which you would use
> MPI_COMM_SPAWN to dynamically create communication groups and use
> those communicators for a form of process fault tolerance (as
> described by William Gropp and Ewing Lusk in their 2004 paper),
> but am having some problems getting it to work the way I intended.
> Basically, when it runs, it is spawning all the processes on the
> same machine (as it always starts at the top of the machine_list
> when spawning a process). Is there a way that I can get these
> processes to spawn on different machines?

In Open MPI (and most other MPI implementations) you will be restricted to using only the machines in your allocation when you use MPI_Comm_spawn*. The standard allows you to suggest to MPI_Comm_spawn where to place the 'children' that it creates, using an MPI_Info key -- specifically the {host} key/value referenced here:
http://www.mpi-forum.org/docs/mpi-20-html/node97.htm#Node97
MPI_Info is described here:
http://www.mpi-forum.org/docs/mpi-20-html/node53.htm#Node53

Open MPI, in the current release, does not do anything with this key. This has been fixed in Subversion (as of r11039) and will be in the next release of Open MPI. If you want to use this functionality in the near term, I would suggest using the nightly build of the Subversion trunk available here:
http://www.open-mpi.org/nightly/trunk/

> One possible route I considered was using something like SLURM to
> distribute the jobs, and just putting '+' in the machine file. Will
> this work? Is this the best route to go?

Off the top of my head, I'm not sure whether that would work. The best/cleanest route would be to use the MPI_Info object and the {host} key.

Let us know if you have any trouble with MPI_Comm_spawn or MPI_Info in this scenario.

Hope that helps,
Josh

> Thanks for any help with this.
>
> Byron
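For reference, here is a minimal sketch of the suggestion above: create an MPI_Info object, set the standard {host} key, and pass it to MPI_Comm_spawn. The host name "node03" and the "./worker" executable are placeholders, and, as Josh notes, Open MPI releases before r11039 simply ignore the key.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm children;
    MPI_Info info;
    int      errcodes[4];

    MPI_Init(&argc, &argv);

    /* Suggest a placement for the spawned children via the standard
     * MPI_Info "host" key (the host name here is a placeholder). */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "node03");

    /* Spawn 4 copies of a (placeholder) worker executable; the
     * intercommunicator "children" connects parents and children. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info,
                   0, MPI_COMM_WORLD, &children, errcodes);

    MPI_Info_free(&info);
    /* ... communicate with the children over "children" ... */
    MPI_Finalize();
    return 0;
}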
Re: [OMPI users] Fault Tolerant Method
Don't forget, furthermore, that to use this fault-tolerance approach successfully, the parents and the other child processes must not be affected by the death/failure of another child process. Right now in Open MPI, if one of the child processes (which you spawned using MPI_Comm_spawn) fails, the whole application will fail. [To be more precise: the MPI standard does not enforce/mandate the behavior described in the paper which you mentioned.]

Thanks
Edgar
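To make Edgar's caveat concrete: the Gropp/Lusk scheme assumes that a failed child shows up at the parent as an error return rather than an abort. The following is only a sketch of that assumption, using the standard error-handler call on the spawn intercommunicator; the helper name and arguments are illustrative, and with current Open MPI the job may still abort regardless.

#include <mpi.h>
#include <stdio.h>

/* Sketch only: assumes "children" is the intercommunicator returned by
 * MPI_Comm_spawn.  Switching to MPI_ERRORS_RETURN lets communication
 * failures come back as error codes -- which the manager/worker fault
 * tolerance scheme relies on -- but the MPI standard does not guarantee
 * that the job survives a process death, and current Open MPI aborts. */
static int try_send(MPI_Comm children, int child, const double *work, int n)
{
    int rc;

    MPI_Comm_set_errhandler(children, MPI_ERRORS_RETURN);

    rc = MPI_Send((void *)work, n, MPI_DOUBLE, child, 0, children);
    if (rc != MPI_SUCCESS) {
        /* Child presumed dead: the paper's approach would respawn it
         * here with MPI_Comm_spawn and reassign the work unit. */
        fprintf(stderr, "send to child %d failed (rc=%d)\n", child, rc);
    }
    return rc;
}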
Re: [OMPI users] Problem with Openmpi 1.1
Trolling through some really old mails that never got replies... :-( I'm afraid that the guy who did the GM code in Open MPI is currently on vacation, but we have made a small number of changes since 1.1 that may have fixed your issue. Could you try one of the 1.1.1 release candidate tarballs and see if you still have the problem? http://www.open-mpi.org/software/ompi/v1.1/ On 7/3/06 12:58 PM, "Borenstein, Bernard S" wrote: > I've built and sucessfully run the Nasa Overflow 2.0aa program with > Openmpi 1.0.2. I'm running on an opteron linux cluster running SLES 9 > and GM 2.0.24. I built Openmpi 1.1 with the intel 9 compilers and try to > run Overflow 2.0aa with myrinet, it get what looks like a data > corruption error and the program dies quickly. > There are no mpi errors at all.If I run using GIGE (--mca btl self,tcp), > the program runs to competion correctly. Here is my ompi_info output : > > bsb3227@mahler:~/openmpi_1.1/bin> ./ompi_info > Open MPI: 1.1 >Open MPI SVN revision: r10477 > Open RTE: 1.1 >Open RTE SVN revision: r10477 > OPAL: 1.1 >OPAL SVN revision: r10477 > Prefix: /home/bsb3227/openmpi_1.1 > Configured architecture: x86_64-unknown-linux-gnu >Configured by: bsb3227 >Configured on: Fri Jun 30 07:08:54 PDT 2006 > Configure host: mahler > Built by: bsb3227 > Built on: Fri Jun 30 07:54:46 PDT 2006 > Built host: mahler > C bindings: yes > C++ bindings: yes > Fortran77 bindings: yes (all) > Fortran90 bindings: yes > Fortran90 bindings size: small > C compiler: icc > C compiler absolute: /opt/intel/cce/9.0.25/bin/icc > C++ compiler: icpc >C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc > Fortran77 compiler: ifort > Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort > Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort > Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort > C profiling: yes >C++ profiling: yes > Fortran77 profiling: yes > Fortran90 profiling: yes > C++ exceptions: no > Thread support: posix (mpi: no, progress: no) > Internal debug support: no > MPI parameter check: runtime > Memory profiling support: no > Memory debugging support: no > libltdl support: yes > MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1) >MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1) >MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1) >MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1) >MCA timer: linux (MCA v1.0, API v1.0, Component v1.1) >MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0) >MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0) > MCA coll: basic (MCA v1.0, API v1.0, Component v1.1) > MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1) > MCA coll: self (MCA v1.0, API v1.0, Component v1.1) > MCA coll: sm (MCA v1.0, API v1.0, Component v1.1) > MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1) > MCA io: romio (MCA v1.0, API v1.0, Component v1.1) >MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1) >MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1) > MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1) > MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1) > MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1) > MCA btl: self (MCA v1.0, API v1.0, Component v1.1) > MCA btl: sm (MCA v1.0, API v1.0, Component v1.1) > MCA btl: gm (MCA v1.0, API v1.0, Component v1.1) > MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0) > MCA topo: unity (MCA v1.0, API v1.0, Component v1.1) > MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0) > MCA gpr: null (MCA v1.0, API v1.0, Component v1.1) > MCA gpr: proxy (MCA v1.0, API v1.0, 
Component v1.1) > MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1) > MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1) > MCA iof: svc (MCA v1.0, API v1.0, Component v1.1) > MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1) > MCA ns: replica (MCA v1.0, API v1.0, Component v1.1) > MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0) > MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1) > MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1) > MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1) > MCA ras: slurm (MCA v
Re: [OMPI users] OS X, OpenMPI 1.1: An error occurred in MPI_Allreduce on, communicator MPI_COMM_WORLD (Jeff Squyres (jsquyres))
Trolling through some really old messages that never got replies... :-(

The behavior that you are seeing is happening as the result of a really long discussion among the OMPI developers when we were writing the TCP device. The problem is that there is ambiguity when connecting peers across TCP in Open MPI. Specifically, since OMPI can span multiple TCP networks, each MPI process may be able to use multiple IP addresses to reach each other MPI process (and vice versa). So we have to try to figure out which IP addresses can speak to which others.

For example, say that you have a cluster with 16 nodes on a private ethernet network. One of these nodes doubles as the head node for the cluster and therefore has 2 ethernet NICs -- one to the external network and one to the internal cluster network. But since 16 is a nice number, you also want to use it for computation as well. So when you mpirun spanning all 16 nodes, OMPI has to figure out to *not* use the external NIC on the head node and only use the internal NIC.

TCP connections are only made upon demand, which is why you only see this behavior if two processes actually attempt to communicate via MPI (i.e., "hello world" with no sending/receiving works fine, but adding the MPI_SEND/MPI_RECV makes it fail). We make connections by having all MPI processes exchange their IP address(es) and port number(s) during MPI_INIT (via a common rendezvous point, typically mpirun). Then, whenever a connection is requested between two processes, we apply a small set of rules to all pair combinations of IP addresses of those processes:

1. If the two IP addresses match after the subnet mask is applied, assume that they are mutually routable and allow the connection.
2. If the two IP addresses are public, assume that they are mutually routable and allow the connection.
3. Otherwise, the connection is disallowed (this is not an error -- we just disallow this connection on the hope that some other device can be used to make a connection).

What is happening in your case is that you're falling through to #3 for all IP address pair combinations, and there is no other device that can reach these processes. Therefore OMPI thinks that it has no channel to reach the remote process. So it bails (in a horribly non-descriptive way :-( ).

We actually have a very long comment about this in the TCP code and mention that your scenario (lots of hosts in a single cluster with private addresses and relatively narrow subnet masks, even though all addresses are, in fact, routable to each other) is not currently supported -- and that we need to do something "better". "Better" in this case probably means having a configuration file that specifies what hosts are mutually routable when the above rules don't work. Do you have any suggestions on this front?

On 7/5/06 1:15 PM, "Frank Kahle" wrote:

> users-requ...@open-mpi.org wrote:
>> A few clarifying questions:
>>
>> What is your netmask on these hosts?
>>
>> Where is the MPI_ALLREDUCE in your app -- right away, or somewhere deep
>> within the application? Can you replicate this with a simple MPI
>> application that essentially calls MPI_INIT, MPI_ALLREDUCE, and
>> MPI_FINALIZE?
>>
>> Can you replicate this with a simple MPI app that does an MPI_SEND /
>> MPI_RECV between two processes on the different subnets?
>>
>> Thanks.
>> >> > > @ Jeff, > > netmask 255.255.255.0 > > Running a simple "hello world" yields no error on each subnet, but > running "hello world" on both subnets yields the error > > [g5dual.3-net:00436] *** An error occurred in MPI_Send > [g5dual.3-net:00436] *** on communicator MPI_COMM_WORLD > [g5dual.3-net:00436] *** MPI_ERR_INTERN: internal error > [g5dual.3-net:00436] *** MPI_ERRORS_ARE_FATAL (goodbye) > > Hope this helps! > > Frank > > > Just in case you wanna check the source: > cFortran example hello_world > program hello > include 'mpif.h' > integer rank, size, ierror, tag, status(MPI_STATUS_SIZE) > character*12 message > > call MPI_INIT(ierror) > call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror) > call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror) > tag = 100 > > if (rank .eq. 0) then > message = 'Hello, world' > do i=1, size-1 > call MPI_SEND(message, 12, MPI_CHARACTER, i, tag, > & MPI_COMM_WORLD, ierror) > enddo > > else > call MPI_RECV(message, 12, MPI_CHARACTER, 0, tag, > &MPI_COMM_WORLD, status, ierror) > endif > > print*, 'node', rank, ':', message > call MPI_FINALIZE(ierror) > end > > > or the full output: > > [powerbook:/Network/CFD/hello] motte% mpirun -d -np 5 --hostfile > ./hostfile /Network/CFD/hello/hello_world > [powerbook.2-net:00606] [0,0,0] setting up session dir with > [powerbook.2-net:00606] universe default-universe > [powerbook.2-net:00606] user motte > [powerbook.2-net:00606]
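For illustration only, here is a small standalone program expressing the three pairwise rules described above (subnet match, both public, otherwise disallow). It is not the Open MPI TCP BTL source -- just the same decision logic written out -- and the two 192.168.x.x addresses are made up to mirror a 2-net/3-net setup with a 255.255.255.0 netmask.

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

/* RFC 1918 private ranges plus link-local, in host byte order. */
static int is_private(uint32_t ip)
{
    return (ip >> 24) == 10     ||   /* 10.0.0.0/8     */
           (ip >> 20) == 0xAC1  ||   /* 172.16.0.0/12  */
           (ip >> 16) == 0xC0A8 ||   /* 192.168.0.0/16 */
           (ip >> 16) == 0xA9FE;     /* 169.254.0.0/16 */
}

/* Rule 1: same subnet after masking -> assume mutually routable.
 * Rule 2: both addresses public   -> assume mutually routable.
 * Rule 3: otherwise disallow this address pair (not an error; some
 *         other pair or device may still connect the two processes). */
static int allow_pair(uint32_t a, uint32_t b, uint32_t mask)
{
    if ((a & mask) == (b & mask))
        return 1;
    if (!is_private(a) && !is_private(b))
        return 1;
    return 0;
}

int main(void)
{
    uint32_t a    = ntohl(inet_addr("192.168.2.10"));   /* "2-net" host (illustrative) */
    uint32_t b    = ntohl(inet_addr("192.168.3.20"));   /* "3-net" host (illustrative) */
    uint32_t mask = ntohl(inet_addr("255.255.255.0"));

    /* Prints 0: both private, different /24 subnets -> rule 3 applies,
     * which is exactly the situation where OMPI gives up on TCP here. */
    printf("allowed = %d\n", allow_pair(a, b, mask));
    return 0;
}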
Re: [OMPI users] mca_btl_tcp_frag_send: writev failed with errno=110
Tony -- My apologies for taking so long to answer. :-( I was unfortunately unable to replicate your problem. I ran your source code across 32 machines connected by TCP with no problem: mpirun --hostfile ~/mpi/cdc -np 32 -mca btl tcp,self netbench 8 I tried this on two different clusters with the same results -- it didn't hang. :-( Can you try again with a recent nightly tarball, or the 1.1.1 beta tarball that has been posted? http://www.open-mpi.org/software/ompi/v1.1/ On 6/30/06 8:35 AM, "Tony Ladd" wrote: > Jeff > > Thanks for the reply; I realize you guys must be really busy with the recent > release of openmpi. I tried 1.1 and I don't get error messages any more. But > the code now hangs; no error or exit. So I am not sure if this is the same > issue or something else. I am enclosing my source code. I compiled with icc > and linked against an icc compiled version of openmpi-1.1. > > My program is a set of network benchmarks (a crude kind of netpipe) that > checks typical message passing patterns in my application codes. > Typical output is: > > 32 CPU's: sync call time = 1003.0time > rate (Mbytes/s) bandwidth (MBits/s) > loop buffers size XC XE GS MS XC > XE GS MS XC XE GS MS >1 6416384 2.48e-02 1.99e-02 1.21e+00 3.88e-02 4.23e+01 > 5.28e+01 8.65e-01 2.70e+01 1.08e+04 1.35e+04 4.43e+02 1.38e+04 >2 6416384 2.17e-02 2.09e-02 1.21e+00 4.10e-02 4.82e+01 > 5.02e+01 8.65e-01 2.56e+01 1.23e+04 1.29e+04 4.43e+02 1.31e+04 >3 6416384 2.20e-02 1.99e-02 1.01e+00 3.95e-02 4.77e+01 > 5.27e+01 1.04e+00 2.65e+01 1.22e+04 1.35e+04 5.33e+02 1.36e+04 >4 6416384 2.16e-02 1.96e-02 1.25e+00 4.00e-02 4.85e+01 > 5.36e+01 8.37e-01 2.62e+01 1.24e+04 1.37e+04 4.28e+02 1.34e+04 >5 6416384 2.25e-02 2.00e-02 1.25e+00 4.07e-02 4.66e+01 > 5.24e+01 8.39e-01 2.57e+01 1.19e+04 1.34e+04 4.30e+02 1.32e+04 >6 6416384 2.19e-02 1.99e-02 1.29e+00 4.05e-02 4.79e+01 > 5.28e+01 8.14e-01 2.59e+01 1.23e+04 1.35e+04 4.17e+02 1.33e+04 >7 6416384 2.19e-02 2.06e-02 1.25e+00 4.03e-02 4.79e+01 > 5.09e+01 8.38e-01 2.60e+01 1.23e+04 1.30e+04 4.29e+02 1.33e+04 >8 6416384 2.24e-02 2.06e-02 1.25e+00 4.01e-02 4.69e+01 > 5.09e+01 8.39e-01 2.62e+01 1.20e+04 1.30e+04 4.30e+02 1.34e+04 >9 6416384 4.29e-01 2.01e-02 6.35e-01 3.98e-02 2.45e+00 > 5.22e+01 1.65e+00 2.64e+01 6.26e+02 1.34e+04 8.46e+02 1.35e+04 > 10 6416384 2.16e-02 2.06e-02 8.87e-01 4.00e-02 4.85e+01 > 5.09e+01 1.18e+00 2.62e+01 1.24e+04 1.30e+04 6.05e+02 1.34e+04 > > Time is total for all 64 buffers. Rate is one way across one link (# of > bytes/time). > 1) XC is a bidirectional ring exchange. Each processor sends to the right > and receives from the left > 2) XE is an edge exchange. Pairs of nodes exchange data, with each one > sending and receiving > 3) GS is the MPI_AllReduce > 4) MS is my version of MPI_AllReduce. It splits the vector into Np blocks > (Np is # of processors); each processor then acts as a head node for one > block. This uses the full bandwidth all the time, unlike AllReduce which > thins out as it gets to the top of the binary tree. On a 64 node Infiniband > system MS is about 5X faster than GS-in theory it would be 6X; ie log_2(64). > Here it is 25X-not sure why so much. But MS seems to be the cause of the > hangups with messages > 64K. I can run the other benchmarks OK,but this one > seems to hang for large messages. I think the problem is at least partly due > to the switch. All MS is doing is point to point communications, but > unfortunately it sometimes requires a high bandwidth between ASIC's. 
It > first it exchanges data between near neighbors in MPI_COMM_WORLD, but it > must progressively span wider gaps between nodes as it goes up the various > binary trees. After a while this requires extensive traffic between ASICS. > This seems to be a problem on both my HP 2724 and the Extreme Networks > Summit400t-48. I am currently working with Extreme to try to resolve the > switch issue. As I say; the code ran great on Infiniband, but I think those > switches have hardware flow control. Finally I checked the code again under > LAM and it ran OK. Slow, but no hangs. > > To run the code compile and type: > mpirun -np 32 -machinefile hosts src/netbench 8 > The 8 means 2^8 bytes (ie 256K). This was enough to hang every time on my > boxes. > > You can also edit the header file (header.h). MAX_LOOPS is how many times it > runs each test (currently 10); NUM_BUF is the number of buffers in each test > (must be more than number of processors), SYNC defines the global sync > frequency-every SYNC buffers. NUM_SYNC is the number of sequential barrier > calls it uses to determine the mean barrier call time. You can also switch > the verious te
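This is not Tony's MS code, but a sketch of the same block-distributed idea expressed with standard collectives: reduce-scatter so that each rank acts as the head node for one block, then allgather the reduced blocks. It assumes the vector length divides evenly by the number of processes.

#include <mpi.h>
#include <stdlib.h>

/* Each of the np ranks reduces one count/np block (acting as that
 * block's head node), then the reduced blocks are gathered everywhere.
 * This keeps all links busy instead of thinning out toward the root
 * of a single reduction tree, which is the point of the MS test. */
static void block_allreduce(const double *in, double *out, int count,
                            MPI_Comm comm)
{
    int np, i;
    MPI_Comm_size(comm, &np);

    int     block  = count / np;             /* assumes count % np == 0 */
    int    *counts = malloc(np * sizeof(int));
    double *mine   = malloc(block * sizeof(double));

    for (i = 0; i < np; i++)
        counts[i] = block;

    /* Step 1: every rank ends up owning the fully reduced i-th block. */
    MPI_Reduce_scatter((void *)in, mine, counts, MPI_DOUBLE, MPI_SUM, comm);

    /* Step 2: exchange the reduced blocks so everyone has the result. */
    MPI_Allgather(mine, block, MPI_DOUBLE, out, block, MPI_DOUBLE, comm);

    free(mine);
    free(counts);
}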
[OMPI users] error while loading shared libraries: libmpi.so.0: cannot open shared object file
I get the following error when I attempt to run an MPI program (called "first", in this case) across several nodes (it works on a single node):

$ mpirun -np 3 --hostfile /tmp/nodes ./first
./first: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory

My library path looks okay and I am able to run other programs, including listing the supposedly missing library:

$ echo $LD_LIBRARY_PATH
/opt/openmpi/1.1/lib/
$ mpirun -np 3 --hostfile /tmp/nodes uptime
16:42:51 up 22 days, 3:14, 10 users, load average: 0.01, 0.02, 0.04
19:49:32 up 1:36, 0 users, load average: 0.00, 0.00, 0.00
19:40:01 up 1:37, 0 users, load average: 0.00, 0.00, 0.00
$ mpirun -np 3 --hostfile /tmp/nodes ls -l /opt/openmpi/1.1/lib/libmpi.so*
lrwxrwxrwx 1 root root 15 Jul 13 15:44 /opt/openmpi/1.1/lib/libmpi.so -> libmpi.so.0.0.0
lrwxrwxrwx 1 root root 15 Jul 13 15:44 /opt/openmpi/1.1/lib/libmpi.so.0 -> libmpi.so.0.0.0
-rwxr-xr-x 1 root root 6157698 Jul 12 18:08 /opt/openmpi/1.1/lib/libmpi.so.0.0.0
lrwxrwxrwx 1 root root 15 Jul 26 16:17 /opt/openmpi/1.1/lib/libmpi.so -> libmpi.so.0.0.0
lrwxrwxrwx 1 root root 15 Jul 26 16:17 /opt/openmpi/1.1/lib/libmpi.so.0 -> libmpi.so.0.0.0
-rwxr-xr-x 1 root root 6157698 Jul 12 18:08 /opt/openmpi/1.1/lib/libmpi.so.0.0.0
lrwxrwxrwx 1 root root 15 Jul 26 13:50 /opt/openmpi/1.1/lib/libmpi.so -> libmpi.so.0.0.0
lrwxrwxrwx 1 root root 15 Jul 26 13:50 /opt/openmpi/1.1/lib/libmpi.so.0 -> libmpi.so.0.0.0
-rwxr-xr-x 1 root root 6157698 Jul 12 18:08 /opt/openmpi/1.1/lib/libmpi.so.0.0.0

Any suggestions?

Thanks,
Dan
Re: [OMPI users] error while loading shared libraries: libmpi.so.0: cannot open shared object file
A few notes: 1. I'm guessing that your LD_LIBRARY_PATH is not set properly on the remote nodes, which is why it can't find libmpi.so on the remote nodes. Ensure that it's set properly on the other side (you'll likely need to modify your shell startup files), or use the --prefix functionality in mpirun (which will ensure to set your PATH and LD_LIBRARY_PATH properly on remote nodes), like this: mpirun --prefix /opt/openmpi/1.1 -np 3 --hostfile /tmp/hosts ./first Or simply supply the full pathname to mpirun (exactly equivalent to --prefix): /opt/openmpi/1.1/bin/mpirun -np 3 --hostfile /tmp/hosts ./first Or if you're lazy (like me): `which mpirun` -np 3 --hostfile /tmp/hosts ./first 2. Note that your "ls" command was actually shell expanded on the node where you ran mpirun, and *then* it was executed on the remote nodes. This was not a problem because the files are actually the same on all nodes, but I thought you might want to know that for future reference. Hope that helps! On 7/28/06 4:55 PM, "Dan Lipsitt" wrote: > get the following error when I attempt to run an mpi program (called > "first", in this case) across several nodes (it works on a single > node): > > $ mpirun -np 3 --hostfile /tmp/nodes ./first > ./first: error while loading shared libraries: libmpi.so.0: cannot > open shared object file: No such file or directory > > My library path looks okay and I am able to run other programs, > including listing the supposedly missing library: > > $ echo $LD_LIBRARY_PATH > /opt/openmpi/1.1/lib/ > $ mpirun -np 3 --hostfile /tmp/nodes uptime > 16:42:51 up 22 days, 3:14, 10 users, load average: 0.01, 0.02, 0.04 > 19:49:32 up 1:36, 0 users, load average: 0.00, 0.00, 0.00 > 19:40:01 up 1:37, 0 users, load average: 0.00, 0.00, 0.00 > $ mpirun -np 3 --hostfile /tmp/nodes ls -l /opt/openmpi/1.1/lib/libmpi.so* > lrwxrwxrwx 1 root root 15 Jul 13 15:44 > /opt/openmpi/1.1/lib/libmpi.so -> libmpi.so.0.0.0 > lrwxrwxrwx 1 root root 15 Jul 13 15:44 > /opt/openmpi/1.1/lib/libmpi.so.0 -> libmpi.so.0.0.0 > -rwxr-xr-x 1 root root 6157698 Jul 12 18:08 > /opt/openmpi/1.1/lib/libmpi.so.0.0.0 > lrwxrwxrwx 1 root root 15 Jul 26 16:17 > /opt/openmpi/1.1/lib/libmpi.so -> libmpi.so.0.0.0 > lrwxrwxrwx 1 root root 15 Jul 26 16:17 > /opt/openmpi/1.1/lib/libmpi.so.0 -> libmpi.so.0.0.0 > -rwxr-xr-x 1 root root 6157698 Jul 12 18:08 > /opt/openmpi/1.1/lib/libmpi.so.0.0.0 > lrwxrwxrwx 1 root root 15 Jul 26 13:50 > /opt/openmpi/1.1/lib/libmpi.so -> libmpi.so.0.0.0 > lrwxrwxrwx 1 root root 15 Jul 26 13:50 > /opt/openmpi/1.1/lib/libmpi.so.0 -> libmpi.so.0.0.0 > -rwxr-xr-x 1 root root 6157698 Jul 12 18:08 > /opt/openmpi/1.1/lib/libmpi.so.0.0.0 > > Any suggestions? > > Thanks, > Dan > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Server Virtualization Business Unit Cisco Systems
Re: [OMPI users] Error sending large number of small messages
Marcelo --

Can you send your code that is failing? I'm unable to reproduce with some toy programs here.

I also notice that you're running a somewhat old OMPI SVN checkout of the trunk. Can you update to the most recent version? The trunk is not guaranteed to be stable, and we did have some stability problems recently -- you might want to upgrade to the most recent version (today seems to be ok) and/or try one of the nightly or prerelease tarballs in the 1.1 branch.

On 7/26/06 6:18 PM, "Marcelo Stival" wrote:

> Hi,
>
> I got a problem with ompi when sending a large number of messages from
> process A to process B. Process A only sends... and B only receives (the
> buffers are reused).
>
> int n = 4 * 1024;  // number of iterations (messages to be sent) consecutively
> int len = 8;       // len of each message
>
> Process A (rank 0):
> for (i = 0; i < n; i++) {
>     MPI_Send(sbuffer, len, MPI_BYTE, to, i, MPI_COMM_WORLD);
> }
> Process B (rank 1):
> for (i = 0; i < n; i++) {
>     MPI_Recv(rbuffer, len, MPI_BYTE, recv_from, i, MPI_COMM_WORLD, &status);
> }
>
> (It's a benchmark program that will run with increasing message sizes.)
> (I tried with the same tag on all iterations -- and got the same result.)
>
> It works fine for n (number of messages) equal to 3k (for example), but does
> not work with n equal to 4k (for messages of 8 bytes, 4k iterations seems to
> be the threshold).
>
> The error messages (complete output attached):
> malloc debug: Request for 8396964 bytes failed (class/ompi_free_list.c, 142)
> mpptest: btl_tcp_endpoint.c:624: mca_btl_tcp_endpoint_recv_handler:
> Assertion `0 == btl_endpoint->endpoint_cache_length' failed.
> Signal:6 info.si_errno:0(Success) si_code:-6()
>
> Considerations:
> It works for synchronous send (MPI_Ssend).
> It works with MPICH2 (1.0.3).
> It is a benchmark program; I want to flood the network to measure the
> bandwidth (for different message sizes).
>
> Thanks
>
> Marcelo

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
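While the trunk issue is being sorted out, one common workaround for a one-way flood like this is to make every Nth send synchronous, so the sender cannot run arbitrarily far ahead of the receiver's unexpected-message queue. The sketch below follows the loop quoted above; the interval of 256 is arbitrary, and the receiver loop needs no change.

#include <mpi.h>

#define THROTTLE 256   /* arbitrary flow-control interval */

/* Sender side (rank 0): identical to the benchmark loop except that
 * every THROTTLE-th message is sent with MPI_Ssend, which cannot
 * complete until the receiver has matched it.  This bounds how many
 * eager messages can pile up unexpectedly at the receiving process. */
static void flood(const char *sbuffer, int len, int n, int to)
{
    int i;
    for (i = 0; i < n; i++) {
        if ((i + 1) % THROTTLE == 0)
            MPI_Ssend((void *)sbuffer, len, MPI_BYTE, to, i, MPI_COMM_WORLD);
        else
            MPI_Send((void *)sbuffer, len, MPI_BYTE, to, i, MPI_COMM_WORLD);
    }
}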
Re: [OMPI users] Open-MPI running os SMP cluster
On 7/26/06 5:55 PM, "Michael Kluskens" wrote: >> How is the message passing of Open-MPI implemented when I have >> say 4 nodes with 4 processors (SMP) each, nodes connected by a gigabit >> ethernet ?... in other words, how does it manage SMP nodes when I >> want to use all CPUs, but each with its own process. Does it take >> any advantage of the SMP at each node? > > Someone can give you a more complete/correct answer but I'll give you > my understanding. > > All communication in OpenMPI is handled via the MCA module (term?) We call them "components" or "plugins"; a "module" is typically an instance of those plugins (e.g., if you have 2 ethernet NICs with TCP interfaces, you'll get 2 instances -- modules -- of the TCP BTL component). > self - process communicating with itself > sm - ... via shared memory to other processes > tcp - ... via tcp > openib - ... via Infiniband OpenIB stack > gm & mx - ... via Myrinet GM/MX > mvapi - ... via Infiniband Mellanox Verbs All correct. > If you launch your process so that four processes are on a node then > those would use shared memory to communicate. Also correct. Just chiming in with verifications! :-) -- Jeff Squyres Server Virtualization Business Unit Cisco Systems
Re: [OMPI users] Runtime Error
This question has come up a few times now, so I've added it to the faq, which should make the "mca_pml_teg.so:undefined symbol" message web-searchable for others who run into this issue. On 7/26/06 8:36 AM, "Michael Kluskens" wrote: > Summary: You have to properly uninstall OpenMPI 1.0.2 before > installing OpenMPI 1.1 > > > On Jul 26, 2006, at 7:05 AM, wrote: > >> Updated to open_mpi-1.1. I get a runtime error on the application as >> follows >> >> mca:base:component_find:unable to >> open:/usr/local/lip/openmpi/mca_pml_teg.so:undefined >> symbol:mca_ptl_base_modules_initialized >> >> Open_mpi is compile with g95 and gcc 4.0.3 > > I use that combination all the time on OS X 10.4.7 and under Debian > Sarge. > > Since you did not specify how you updated to OpenMPI 1.1 I'm copying > the instructions posted previously on the list: > > > On Jun 26, 2006, at 5:56 PM, Benjamin Landsteiner wrote: >> Strange. I had actually done this before I emailed (several times, >> in fact), but for the sake of completeness, I did it once more. This >> time, it worked! No clue why it worked this time around. >> >> For those of you who in the future come across this problem, here are >> the (more or less exact) steps I took to recover from the problem: >> >> PROBLEM: You installed v1.1 of Open MPI and experience keyval parse >> errors upon running mpicc, mpif77, mpic++, and so forth. >> >> SOLUTION: >> 1. Go to your v1.1 directory, and type './configure' if you have not >> already done so >> 2. Type 'make uninstall' >> 3. Go to your v1.0.2 directory, and reconfigure using the same >> settings as you installed with (if you still have the install >> directory, you probably don't need to do this as it has already been >> configured) >> 4. In the v1.0.2 directory, type 'make uninstall' >> 5. For good measure, I went back to the v1.1 directory and typed >> 'make uninstall' again >> 6. Find lingering Open MPI directories and files and then delete >> them (for instance, empty Open MPI-related folders remained in my / >> usr/local/* directories) >> 7. At this point, I restarted my machine. Not sure if it's >> necessary or not. >> 8. Go back to the v1.1 directory. Type 'make clean', then >> reconfigure, then recompile and reinstall >> 9. Things should work now. >> >> >> Thank you Michael, >> ~Ben >> >> ++ >> Benjamin Landsteiner >> lands...@stolaf.edu >> >> On 2006/06/26, at 3:48 PM, Michael Kluskens wrote: >> >>> You may have to properly uninstall OpenMPI 1.0.2 before installing >>> OpenMPI 1.1 >>> >>> This was an issue in the past. >>> >>> I would recommend you go into your OpenMPI 1.1 directory and type >>> "make uninstall", then if you have it go into your OpenMPI 1.0.2 >>> directory and do the same. If you don't have a directory with >>> OpenMPI 1.0.2 configured already then either rebuild OpenMPI 1.0.2 or >>> go into /usr/local and identify all remaining OpenMPI directories and >>> components and remove them. Basically you should find directories >>> modified when you installed OpenMPI 1.1 (or when you uninstalled it) >>> and you may find components dated from when you installed OpenMPI >>> 1.0.2. >>> >>> Michael >>> >>> On Jun 26, 2006, at 4:34 PM, Benjamin Landsteiner wrote: >>> Hello all, I recently upgraded to v1.1 of Open MPI and ran into a problem on my head node that I can't seem to solve. 
Upon running mpicc, mpiCC, mpic++, and so forth, I get an error like this: >>> > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Server Virtualization Business Unit Cisco Systems
Re: [OMPI users] Fault Tolerant Method
Actually, we had a problem in our implementation that caused the system to continually reuse the same machine allocations for each "spawn" request. In other words, we always started with the top of the machine_list whenever your program called comm_spawn. This appears to have been the source of the behavior you describe. You don't need to use the MPI_Info key to solve that problem - it has been fixed in the subversion repository, and will be included in the next release. If all you want is to have your new processes be placed beginning with the next process slot in your allocation (as opposed to overlaying the existing processes), then you don't need to do anything. On the other hand, if you want the new processes to go to a specific set of hosts, then you need to follow Josh's suggestions. Hope that helps Ralph On 7/28/06 8:38 AM, "Josh Hursey" wrote: >> I have implemented the fault tolerance method in which you would use >> MPI_COMM_SPAWN to dynamically create communication groups and use >> those communicators for a form of process fault tolerance (as >> described by William Gropp and Ewing Lusk in their 2004 paper), >> but am having some problems getting it to work the way I intended. >> Basically, when it runs, it is spawning all the processes on the >> same machine (as it always starts at the top of the machine_list >> when spawning a process). Is there a way that I get get these >> processes to spawn on different machines? >> > > In Open MPI (and most other MPI implementations) you will be restricted to > using only the machines in your allocation when you use MPI_Comm_spawn*. > The standard allows you can suggest to MPI_Comm_spawn where to place the > 'children' that it creates using the MPI_Info key -- specifically the > {host} keyvalue referenced here: > http://www.mpi-forum.org/docs/mpi-20-html/node97.htm#Node97 > MPI_Info is described here: > http://www.mpi-forum.org/docs/mpi-20-html/node53.htm#Node53 > > Open MPI, in the current release, does not do anything with this key. > This has been fixed in subversion (as of r11039) and will be in the next > release of Open MPI. > > If you want to use this functionality in the near term I would suggest > using the nightly build of the subversion trunk available here: > http://www.open-mpi.org/nightly/trunk/ > > >> One possible route I considerd was using something like SLURM to >> distribute the jobs, and just putting '+' in the machine file. Will >> this work? Is this the best route to go? > > Off the top of my head, I'm not sure if that would work of not. The > best/cleanest route would be to use the MPI_Info command and the {host} > key. > > Let us know if you have any trouble with MPI_Comm_spawn or MPI_Info in > this scenario. > > Hope that helps, > Josh > >> >> Thanks for any help with this. >> >> Byron >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users