[OMPI users] Open MPI backwards incompatibility issue in statically linked program
Hi,

To make life easier for my users, I have built a fully statically linked version of my primecount program, so the program also links statically against Open MPI. I built this binary on CentOS-7-x86_64 using gcc. The good news is that the binary runs without any issues on Ubuntu 15.10 x64 (which uses mpiexec (OpenRTE) 1.10.2).

The bad news is that the binary does not work on Ubuntu 14.04 x64, which uses mpiexec (OpenRTE) 1.6.5. Here is the error message:

$ mpirun -n 1 ./primecount 1e10 -t1
[ip-XXX:02671] [[8243,0],0] mca_oob_tcp_recv_handler: invalid message type: 15
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
ubuntu@ip-XXX:~$ mpiexec --version
mpiexec (OpenRTE) 1.6.5

Questions:

1) Is this backwards incompatibility an Open MPI bug?

2) Can I expect my binary to work with future mpiexec versions >= 1.10 (the version it was built with)?

Thanks and best regards,
Kim
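A launcher/library mismatch like this one only shows up as the opaque `invalid message type` abort. As a minimal sketch, assuming the launcher's first output line has the `mpiexec (OpenRTE) X.Y.Z` form shown above, a program or wrapper script could parse that line before starting a run. The `ompi_version` type and `parse_openrte_version()` function are hypothetical names, not part of Open MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: an Open MPI version triple such as 1.6.5. */
typedef struct {
    int major, minor, release;
} ompi_version;

/* Parse the first line printed by `mpiexec --version`, assuming the
 * format "mpiexec (OpenRTE) 1.6.5". Returns 0 on success and -1 if the
 * "(OpenRTE)" tag or the three-part version number is missing. */
int parse_openrte_version(const char *line, ompi_version *out)
{
    assert(line != NULL && out != NULL);

    const char *tag = strstr(line, "(OpenRTE)");
    if (tag == NULL)
        return -1;
    tag += strlen("(OpenRTE)");

    /* The leading space in the format skips whitespace before the digits. */
    if (sscanf(tag, " %d.%d.%d", &out->major, &out->minor, &out->release) != 3)
        return -1;
    return 0;
}
```

The line itself could be read, for example, with POSIX `popen("mpiexec --version", "r")` and `fgets()`. Other MPI implementations, or other Open MPI releases, may print a different format, in which case the function simply returns -1.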
Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program
You may be interested in reading: https://www.open-mpi.org/software/ompi/versions/

2016-02-13 22:30 GMT+01:00 Kim Walisch:
> 1) Is this backwards incompatibility issue an Open MPI bug?
>
> 2) Can I expect that my binary will work with future mpiexec
> versions >= 1.10 (which it was built with)?
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28522.php

--
Kind regards Nick
Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program
Hi,

> You may be interested in reading: https://www.open-mpi.org/software/ompi/versions/

Thanks for your answer. I found:

> Specifically: v1.10.x is not guaranteed to be backwards compatible with other v1.x releases.

And:

> However, this definition only applies when the same version of Open MPI is used with all instances ... If the versions are not exactly the same everywhere, Open MPI is not guaranteed to work properly in any scenario.

So statically linking against Open MPI seems to be a bad idea.

What about linking against a rather old shared Open MPI library, e.g. one from 3 years ago? Will my program likely run on most systems that have a more recent Open MPI version installed?

Or is it better not to distribute any binaries that link against Open MPI and instead put compilation instructions on the website?

Thanks,
Kim

On Sat, Feb 13, 2016 at 10:45 PM, Nick Papior wrote:
> You may be interested in reading:
> https://www.open-mpi.org/software/ompi/versions/
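The quoted policy can be turned into a deliberately conservative check: only an exact match between the Open MPI version a binary was built with and the one found at run time counts as supported. This is a minimal sketch; `ompi_version` and `ompi_versions_supported()` are hypothetical names, and the versions in the usage note are the two from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical Open MPI version triple (major.minor.release). */
typedef struct {
    int major, minor, release;
} ompi_version;

/* Conservative reading of the quoted policy: a binary and the mpiexec
 * that launches it are treated as supported only when all three
 * version components match exactly. */
bool ompi_versions_supported(ompi_version built, ompi_version runtime)
{
    /* Version numbers are never negative; catch uninitialised input. */
    assert(built.major >= 0 && runtime.major >= 0);

    return built.major == runtime.major &&
           built.minor == runtime.minor &&
           built.release == runtime.release;
}
```

With a binary built against 1.10.2, checking against `{1, 6, 5}` returns false, which matches the failure on Ubuntu 14.04. Since a mismatch is "not guaranteed to work" rather than guaranteed to fail, a real program might print a warning instead of refusing to run.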
Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program
2016-02-13 23:07 GMT+01:00 Kim Walisch:
> What about linking against a rather old shared Open MPI library
> from e.g. 3 years ago? Will my program likely run on most systems
> which have a more recent Open MPI version installed?

Most probably this will rarely work. If it does, you are lucky... :)
The link still applies: as it says, if it works, it works; if not, you have to do something else.

> Or is it better to not distribute any binaries which link against Open MPI
> and instead put compilation instructions on the website?

Yes, and/or provide serial equivalents of your program.
Besides, providing binaries for a specific MPI implementation may seem to make things easier for users, but in my experience many users do not know that MPI binaries are implementation-specific (e.g. Open MPI vs. MPICH), and hence they will subsequently ask why it doesn't work with, for instance, an Intel compiler suite.

--
Kind regards Nick
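A serial equivalent can live in the same source tree as the MPI version. A minimal sketch, assuming a hypothetical `USE_MPI` macro set by the build (e.g. `mpicc -DUSE_MPI ...`): it uses the `OPEN_MPI` and `MPICH_VERSION` macros defined by Open MPI's and MPICH's `mpi.h` to report which implementation a binary was actually compiled against, which is exactly the information users tend to be missing. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef USE_MPI            /* hypothetical build flag: mpicc -DUSE_MPI ... */
#include <mpi.h>
#endif

/* True when this file was compiled without MPI (the serial fallback). */
bool is_serial_build(void)
{
#ifdef USE_MPI
    return false;
#else
    return true;
#endif
}

/* Which MPI implementation the binary was compiled against, based on
 * the macros that Open MPI's and MPICH's mpi.h define. */
const char *mpi_impl_name(void)
{
#if defined(OPEN_MPI)
    return "Open MPI";
#elif defined(MPICH_VERSION)
    return "MPICH";
#elif defined(USE_MPI)
    return "unknown MPI implementation";
#else
    return "serial (no MPI)";
#endif
}

/* Number of processes taking part: the size of MPI_COMM_WORLD in an MPI
 * build (MPI_Init must already have been called), 1 in a serial build. */
int process_count(void)
{
#ifdef USE_MPI
    int size = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    assert(size >= 1);
    return size;
#else
    return 1;
#endif
}
```

Printing `mpi_impl_name()` in `--version` output or in error messages would let users see at a glance which MPI a given binary expects.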
Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program
On Sat, Feb 13, 2016 at 2:27 PM, Nick Papior wrote:
> Besides, providing binaries for specific MPI-implementations may seem like
> easing it for users, however in my experience many users do not know that
> MPI is implementation specific, i.e. OpenMPI and MPICH and hence they will
> subsequently ask why it doesn't work using an intel-suite of compilers (for
> instance).

You can rely on, e.g., the MPICH ABI Compatibility Initiative (https://www.mpich.org/abi/) when redistributing MPI binaries built with MPICH. A better option, however, would be to wrap all of your MPI code in an implementation-agnostic wrapper and then ship a binary that can dlopen a different wrapper library depending on which MPI implementation the user has. That would allow you to ship a single binary that can use both MPICH and Open MPI.

Jeff

> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/02/28525.php

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
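The dlopen approach might look roughly like the sketch below. It assumes hypothetical wrapper libraries (`libpc_mpi_openmpi.so`, `libpc_mpi_mpich.so`). Each one would be compiled separately with that implementation's `mpicc` and would export a table of plain-C function pointers under the hypothetical symbol `pc_mpi_api_table`. The main binary never includes `mpi.h` and only uses POSIX `dlopen()`/`dlsym()`. None of these names come from the thread.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface that every wrapper library exports. Only plain
 * C types cross this boundary, so the main binary needs no mpi.h. */
typedef struct {
    int (*init)(int *argc, char ***argv);
    int (*rank)(void);
    int (*size)(void);
    int (*finalize)(void);
} pc_mpi_api;

/* Map an implementation name to a hypothetical wrapper file name. */
const char *wrapper_path(const char *impl)
{
    assert(impl != NULL);
    if (strcmp(impl, "openmpi") == 0)
        return "libpc_mpi_openmpi.so";
    if (strcmp(impl, "mpich") == 0)
        return "libpc_mpi_mpich.so";
    return NULL;
}

/* dlopen the wrapper and look up its exported API table. Returns NULL if
 * the library or the symbol is missing, so the caller can fall back to a
 * serial code path. On success the handle is intentionally kept open for
 * the lifetime of the program. */
const pc_mpi_api *load_wrapper(const char *path)
{
    if (path == NULL)
        return NULL;

    /* RTLD_GLOBAL is a conservative choice: it keeps the symbols of the
     * MPI library pulled in by the wrapper visible to any plugins that
     * library may load later. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL)
        return NULL;

    const pc_mpi_api *api = (const pc_mpi_api *)dlsym(handle, "pc_mpi_api_table");
    if (api == NULL)
        dlclose(handle);
    return api;
}
```

How the program picks `impl` (a command-line option, an environment variable, or probing for an installed MPI library) is left open here. On glibc versions before 2.34, linking also needs `-ldl`.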