In addition to what Ralph said (just install OMPI under your $HOME, at least 
for testing purposes), here's what we say about version compatibility:

1. OMPI started providing ABI guarantees with v1.3.2.  The guarantee we provide 
is that the 1.x and 1.(x+1) series are ABI compatible, where x is odd.  For 
example, you can compile against 1.5.x and still mpirun with a 1.6.x 
installation (assuming you built with shared libraries, yadda yadda yadda).
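
For instance, something like this should work (a minimal sketch -- the install 
paths and version numbers here are just examples, not anything on your systems):

    # Compile against a 1.5.x install...
    /opt/openmpi-1.5.5/bin/mpicc hello.c -o hello

    # ...and run it with an ABI-compatible 1.6.x install
    export LD_LIBRARY_PATH=/opt/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
    /opt/openmpi-1.6.5/bin/mpirun -np 4 ./hello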

2. We have never provided any guarantees about compatibility between different 
versions of OMPI (even within a 1.x series).  Meaning: if you run version a.b.c 
on one server, you should run a.b.c on *all* servers in your job.  Wire-line 
compatibility is NOT guaranteed; mixing versions will likely break in either 
very obnoxious or very subtle ways.  Both are bad.
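
A quick sanity check (a sketch; the hostnames are made up) is to ask every node 
in the job what it has installed and verify that they all match:

    # Every node should report the same "Open MPI:" version
    for h in n10 n11; do
        ssh $h ompi_info | grep "Open MPI:"
    done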

However, per the just-install-a-copy-in-your-$HOME advice, you can have N 
different OMPI installations if you really want to.  Just ensure that your PATH 
and LD_LIBRARY_PATH point to the *one* that you want to use -- both on the 
current server and on all servers that you're using in a given job.  That works 
fine (I do it all the time -- I have something like 20-30 OMPI installs under 
my $HOME, all in various stages of development/debugging; I just update my 
PATH / LD_LIBRARY_PATH and I'm good to go).
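
If it helps, here's roughly what that looks like (a sketch; the version number 
and install prefix are just examples):

    # Build and install a private copy under $HOME
    tar xf openmpi-1.6.5.tar.bz2
    cd openmpi-1.6.5
    ./configure --prefix=$HOME/ompi/1.6.5
    make -j4 all install

    # Point your environment at the *one* install you want to use
    # (on every node in the job, e.g., via your shell startup files)
    export PATH=$HOME/ompi/1.6.5/bin:$PATH
    export LD_LIBRARY_PATH=$HOME/ompi/1.6.5/lib:$LD_LIBRARY_PATH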

Make sense?


On Feb 6, 2014, at 1:23 PM, Ross Boylan <r...@biostat.ucsf.edu> wrote:

> On 2/6/2014 3:24 AM, Jeff Squyres (jsquyres) wrote:
>> Have you tried upgrading to a newer version of Open MPI?  The 1.4.x series 
>> is several generations old.  Open MPI 1.7.4 was just released yesterday.
> It's on a cluster running Debian squeeze, with perhaps some upgrades to 
> wheezy coming.  However, even wheezy is at 1.4.5 (the next generation is 
> currently at 1.6.5).  I don't administer the cluster, and upgrading basic 
> infrastructure seems somewhat hazardous.
> 
> I checked for backports of more recent versions (at backports.debian.org) but 
> there don't seem to be any for squeeze or wheezy.
> 
> Can we mix later and earlier versions of MPI?  The documentation at 
> http://www.open-mpi.org/software/ompi/versions/ seems to indicate that 1.4, 
> 1.6, and 1.7 would all be binary incompatible, though 1.5 and 1.6, or 1.7 and 
> 1.8, would be compatible.  However, point 10 of the FAQ 
> (http://www.open-mpi.org/faq/?category=sysadmin#new-openmpi-version) seems to 
> say compatibility is broader.
> 
> Also, the documents don't seem to address on-the-wire compatibility; that is, 
> if nodes are on different versions, can they work together reliably?
> 
> Thanks.
> Ross
>> 
>> 
>> On Feb 5, 2014, at 9:58 PM, Ross Boylan <r...@biostat.ucsf.edu> wrote:
>> 
>>> On 1/31/2014 1:08 PM, Ross Boylan wrote:
>>>> I am getting the following error, amidst many successful message sends:
>>>> [n10][[50048,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:118:mca_btl_tcp_frag_send]
>>>>  mca_btl_tcp_frag_send: writev error (0x7f6155970038, 578659815)
>>>>         Bad address(1)
>>>> 
>>> I think I've tracked down the immediate cause: I was sending a very large 
>>> object (from R--I assume serialized into a byte stream) that was over 3G.  
>>> I'm not sure why it would produce that error, but it doesn't seem that 
>>> surprising that something would go wrong.
>>> 
>>> Ross
>>>> Any ideas about what is going on or what I can do to fix it?
>>>> 
>>>> I am using the openmpi-bin 1.4.2-4 Debian package on a cluster running 
>>>> Debian squeeze.
>>>> 
>>>> I couldn't find a config.log file; there is 
>>>> /etc/openmpi/openmpi-mca-params.conf, which is completely commented out.
>>>> 
>>>> Invocation is from R 3.0.1 (debian package) with Rmpi 0.6.3 built by me 
>>>> from source in a local directory. My sends all use mpi.isend.Robj and the 
>>>> receives use mpi.recv.Robj, both from the Rmpi library.
>>>> 
>>>> The jobs were started with rmpilaunch; it and the hosts file are included 
>>>> in the attachments.  The nodes communicate over TCP.  rmpilaunch leaves me 
>>>> in an R session on the master.  I invoked the code inside the toplevel() 
>>>> function toward the bottom of dbox-master.R.
>>>> 
>>>> The program source files and other background information are in the 
>>>> attached file.  n10 has the output of ompi_info --all, and n1011 has 
>>>> other info for both nodes that were active (n10 was master; n11 had some 
>>>> slaves).


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
