[OMPI users] Open MPI backwards incompatibility issue in statically linked program

2016-02-13 Thread Kim Walisch
Hi,

In order to make life easier for my users, I have built a fully
statically linked version of my primecount program, so the program
also statically links against Open MPI. I built this binary on
CentOS-7-x86_64 using gcc. The good news is that the binary runs
without any issues on Ubuntu 15.10 x64 (which uses mpiexec (OpenRTE) 1.10.2).

The bad news is that the binary does not work on Ubuntu 14.04 x64,
which uses mpiexec (OpenRTE) 1.6.5. Here is the error message:


$ mpirun -n 1 ./primecount 1e10 -t1
[ip-XXX:02671] [[8243,0],0] mca_oob_tcp_recv_handler: invalid message type: 15
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
ubuntu@ip-XXX:~$ mpiexec --version
mpiexec (OpenRTE) 1.6.5


Questions:

1) Is this backwards incompatibility issue an Open MPI bug?

2) Can I expect that my binary will work with future mpiexec
versions >= 1.10 (which it was built with)?

Thanks and best regards,
Kim


Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program

2016-02-13 Thread Nick Papior
You may be interested in reading:
https://www.open-mpi.org/software/ompi/versions/

2016-02-13 22:30 GMT+01:00 Kim Walisch :

> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28522.php
>



-- 
Kind regards Nick


Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program

2016-02-13 Thread Kim Walisch
Hi,

> You may be interested in reading:
> https://www.open-mpi.org/software/ompi/versions/

Thanks for your answer. I found:

> Specifically: v1.10.x is not guaranteed to be backwards
> compatible with other v1.x releases.

And:

> However, this definition only applies when the same version of Open
> MPI is used with all instances ... If the versions are not exactly the
> same everywhere, Open MPI is not guaranteed to work properly in any
> scenario.

So statically linking against Open MPI seems to be a bad idea.
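As a minimal sketch of enforcing the exact-version rule quoted above, a pre-flight check (for example in a launcher script's helper) could compare the version reported by `mpiexec --version` with the Open MPI version the binary was built against. In an Open MPI build, the latter could come from the `OMPI_MAJOR_VERSION`, `OMPI_MINOR_VERSION` and `OMPI_RELEASE_VERSION` macros in `mpi.h`. The function names are hypothetical, the parser assumes the `mpiexec (OpenRTE) X.Y.Z` format shown in the log above, and capturing the command's output (e.g. with `popen`) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse MAJOR.MINOR.RELEASE from the first line of `mpiexec --version`,
   e.g. "mpiexec (OpenRTE) 1.6.5". Assumes the numbers follow the
   closing parenthesis, as in the log above. Returns true on success. */
bool parse_mpiexec_version(const char *line, int *major, int *minor, int *release) {
    assert(line != NULL);
    const char *p = strchr(line, ')');
    if (p == NULL) return false;
    return sscanf(p + 1, " %d.%d.%d", major, minor, release) == 3;
}

/* True only if the launcher reports exactly the version the binary was
   built with; any difference is treated as unsupported. */
bool same_runtime_version(const char *line, int major, int minor, int release) {
    int ma, mi, re;
    if (!parse_mpiexec_version(line, &ma, &mi, &re)) return false;
    return ma == major && mi == minor && re == release;
}
```

With the versions from this thread, `same_runtime_version("mpiexec (OpenRTE) 1.6.5", 1, 10, 2)` returns false, which matches the failure seen on Ubuntu 14.04.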

What about linking against a rather old shared Open MPI library
from e.g. 3 years ago? Will my program likely run on most systems
which have a more recent Open MPI version installed?

Or is it better to not distribute any binaries which link against Open MPI
and instead put compilation instructions on the website?

Thanks,
Kim


Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program

2016-02-13 Thread Nick Papior
2016-02-13 23:07 GMT+01:00 Kim Walisch :

> Hi,
>
> > You may be interested in reading:
> https://www.open-mpi.org/software/ompi/versions/
>
> Thanks for your answer. I found:
>
> > Specifically: v1.10.x is not guaranteed to be backwards
> compatible with other v1.x releases.
>
> And:
>
> > However, this definition only applies when the same version of Open
> MPI is used with all instances ... If the versions are not exactly the
> same everywhere, Open MPI is not guaranteed to work properly in any
> scenario.
>
> So statically linking against Open MPI seems to be a bad idea.
>
> What about linking against a rather old shared Open MPI library
> from e.g. 3 years ago? Will my program likely run on most systems
> which have a more recent Open MPI version installed?
>
This will most likely only rarely work. If it does, you are lucky... :)
The link still applies: if it works, it works; if not, you have
to do something else.

>
> Or is it better to not distribute any binaries which link against Open MPI
> and instead put compilation instructions on the website?
>
Yes, and/or provide serial equivalents of your program.
Besides, providing binaries for specific MPI implementations may seem
to make things easier for users; however, in my experience many users
do not know that an MPI binary is tied to a specific implementation
(e.g. Open MPI or MPICH), and they will subsequently ask why it doesn't
work with, for instance, the Intel compiler suite.
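One way to offer a serial equivalent, sketched here under assumptions: guard every MPI call behind a compile-time switch (the `USE_MPI` macro is hypothetical), so the same source builds with `cc` as a serial program or with `mpicc -DUSE_MPI` as an MPI program. The serial build behaves like an MPI job with a single rank. `sum_below` is a hypothetical placeholder for the real work, and in the MPI build `MPI_Init` is assumed to have been called already.

```c
#include <assert.h>
#include <stdint.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Rank and size of this process. A serial build behaves like an MPI job
   with exactly one rank, so the code below needs no other changes. */
static void get_rank_size(int *rank, int *size) {
#ifdef USE_MPI
    MPI_Comm_rank(MPI_COMM_WORLD, rank);
    MPI_Comm_size(MPI_COMM_WORLD, size);
#else
    *rank = 0;
    *size = 1;
#endif
}

/* Split [0, n) into `size` nearly equal chunks; return chunk `rank` as [lo, hi). */
void chunk_bounds(int64_t n, int rank, int size, int64_t *lo, int64_t *hi) {
    assert(size > 0 && rank >= 0 && rank < size);
    int64_t base = n / size, rem = n % size;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

/* Placeholder for the real work: sum of 0 .. n-1, split across ranks. */
int64_t sum_below(int64_t n) {
    int rank, size;
    get_rank_size(&rank, &size);
    int64_t lo, hi, local = 0;
    chunk_bounds(n, rank, size, &lo, &hi);
    for (int64_t i = lo; i < hi; i++) local += i;
#ifdef USE_MPI
    /* Combine the partial results of all ranks. */
    int64_t total = 0;
    MPI_Allreduce(&local, &total, 1, MPI_INT64_T, MPI_SUM, MPI_COMM_WORLD);
    return total;
#else
    return local;
#endif
}
```

Because the serial path is the default, a user without any MPI installation can still build and run the program; only those who opt in need a matching MPI toolchain.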



-- 
Kind regards Nick


Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program

2016-02-13 Thread Jeff Hammond
On Sat, Feb 13, 2016 at 2:27 PM, Nick Papior  wrote:

> 2016-02-13 23:07 GMT+01:00 Kim Walisch :
>
>> Or is it better to not distribute any binaries which link against Open MPI
>> and instead put compilation instructions on the website?
>>
> Yes, and/or provide serial equivalents of your program.
> Besides, providing binaries for specific MPI-implementations may seem like
> easing it for users, however in my experience many users do not know that
> MPI is implementation specific, i.e. OpenMPI and MPICH and hence they will
> subsequently ask why it doesn't work using an intel-suite of compilers (for
> instance).
>

You can rely on e.g. https://www.mpich.org/abi/ when redistributing MPI
binaries built with MPICH, but a better option would be to wrap all of your
MPI code in an implementation-agnostic wrapper and then ship a binary that
can dlopen a different wrapper library depending on which MPI
implementation the user has. That would allow you to ship a single binary
that works with both MPICH and Open MPI.
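A minimal sketch of the dlopen approach, assuming two hypothetical wrapper libraries (`libprimecount_mpi_openmpi.so` and `libprimecount_mpi_mpich.so`), each compiled against one MPI implementation and exporting the same hypothetical entry point `pc_mpi_run`. The main binary itself never links against MPI. Choosing the implementation from launcher environment variables (`OMPI_COMM_WORLD_SIZE` set by Open MPI's `mpirun`, `PMI_SIZE` set by MPICH's Hydra launcher) is an assumed heuristic, not the only way to do it. On glibc older than 2.34, link with `-ldl`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Which MPI implementation appears to have launched this process. */
enum mpi_impl { IMPL_NONE, IMPL_OPENMPI, IMPL_MPICH };

/* Heuristic (assumption): Open MPI's mpirun exports OMPI_COMM_WORLD_SIZE,
   MPICH's Hydra launcher exports PMI_SIZE. */
enum mpi_impl detect_mpi_impl(void) {
    if (getenv("OMPI_COMM_WORLD_SIZE") != NULL) return IMPL_OPENMPI;
    if (getenv("PMI_SIZE") != NULL) return IMPL_MPICH;
    return IMPL_NONE;
}

/* Hypothetical wrapper libraries, each built against one MPI implementation. */
static const char *wrapper_name(enum mpi_impl impl) {
    switch (impl) {
    case IMPL_OPENMPI: return "libprimecount_mpi_openmpi.so";
    case IMPL_MPICH:   return "libprimecount_mpi_mpich.so";
    default:           return NULL;
    }
}

/* Hypothetical implementation-agnostic entry point exported by every wrapper. */
typedef int (*pc_mpi_run_fn)(int argc, char **argv);

/* Load the matching wrapper and return its entry point, or NULL if no
   wrapper applies or it cannot be loaded (caller falls back to serial). */
pc_mpi_run_fn load_mpi_backend(enum mpi_impl impl) {
    const char *name = wrapper_name(impl);
    if (name == NULL) return NULL;

    /* RTLD_GLOBAL: some MPI libraries (notably Open MPI) load plugins that
       expect libmpi's symbols to be globally visible. */
    void *handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", name, dlerror());
        return NULL;
    }
    dlerror(); /* clear any stale error before dlsym */
    void *sym = dlsym(handle, "pc_mpi_run");
    if (sym == NULL) {
        dlclose(handle);
        return NULL;
    }
    /* The handle is deliberately kept open for the life of the program. */
    return (pc_mpi_run_fn)sym;
}
```

A `main` would call `load_mpi_backend(detect_mpi_impl())` and, if that returns NULL, run the serial code path instead, so the same binary still works where neither wrapper can be loaded.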

Jeff


> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28525.php
>



-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/