Had time to think about this a bit, and I believe you are absolutely correct about the ABI - I think we accidentally broke that guarantee in the 1.8 series with this version check. It shouldn’t have been that strict. The revised algo is the correct one.
Sorry for the error - just completely slipped by us. I’ll ensure it is corrected going forward.

> On Apr 10, 2015, at 7:59 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> I’ve got an updated patch in the queue that adds the desired “skip version
> check” - should be committed in the next hour or so. Will be in the next
> nightly 1.8.5 tarball.
>
>> On Apr 10, 2015, at 7:26 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> I realize there are still longer term issues, but we haven’t resolved how we
>> want to handle those yet. The ABI promise is solely at the MPI interface,
>> not the internal ones. Defining version compatibility at that level is the
>> problem.
>>
>> I can add a suppression param, though it may not totally resolve the problem
>> (e.g., the security protocol might change). Still, it may take things a
>> little further.
>>
>>> On Apr 10, 2015, at 6:57 AM, Alan Wild <a...@madllama.net> wrote:
>>>
>>> Sorry I didn't get back to you right away. 1) I'm on the digest, 2) not
>>> real familiar with git and 3) just learned the hard way how to update the
>>> build to work with the latest versions of automake, autoconf, and libtool.
>>> :)
>>>
>>> Anyway, I believe the patch is an improvement. Looking at it, I can tell
>>> you are now checking the first three characters. I know the plan is to go
>>> to 1.9 and then 2.0, but if the numbering ever went more like the linux
>>> kernel into, say, a 2.10.0 release then your number of characters would be
>>> off. Also, doesn't the current ABI promise allow 1.7 to be compatible with
>>> 1.8?
>>>
>>> Personally, I'm fine with the solution, but I wanted to point out the
>>> potential shortcoming(s) should an issue arise again.
>>>
>>> One other thought, maybe this is a case where the code should emit a
>>> warning (that could be suppressed with a command line parameter) when the
>>> versions aren't identical? Certainly, versions outside the "allowed" range
>>> (whatever you determine that to be) should be an error and a refused
>>> connection, but rather than silently accepting mixed versions (which you
>>> indicated has caused problems in the past), the code could warn of a
>>> potential issue (and users could then consciously suppress the warning if
>>> they are fine with it). Food for thought.
>>>
>>> Unfortunately, the patch didn't actually solve my particular problem (yet,
>>> anyway) because the vendor application statically linked 1.8.3 into their
>>> executable. (I honestly didn't realize it when I made my previous post).
>>> So the code on their side of the connection is still rejecting the
>>> connection:
>>>
>>> [arwild1@hplcslsp2 ~]$ mpirun -n 6 -H localhost vendor_mpi_app
>>> [hplcslsp2:23064] [[44148,1],0] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>>> [hplcslsp2:23065] [[44148,1],1] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>>> [hplcslsp2:23067] [[44148,1],2] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>>> [hplcslsp2:23069] [[44148,1],3] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>>> [hplcslsp2:23071] [[44148,1],4] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>>> -------------------------------------------------------
>>> Primary job terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>>
>>> However, I believe if I can get the vendor to adopt this patch (or at least
>>> dynamically link) the patch should help alleviate the need to stay in
>>> lock-step version for version.
>>>
>>> Thank you,
>>>
>>> -Alan
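
For anyone following the thread, here is a rough sketch of the idea Alan raised: compare parsed major.minor numbers instead of the first three characters, and warn (rather than silently accept) when the versions differ but are still in range. This is not the Open MPI code or the committed patch. The function names, the `skip_check` flag, and the "same major.minor is allowed" policy are all assumptions made up for illustration; the actual allowed range is still undecided per the discussion above.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) as integers from a string like '1.8.5rc2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_peer_version(local: str, peer: str, skip_check: bool = False) -> str:
    """Return 'accept', 'warn', or 'reject' for a connecting peer.

    Hypothetical policy (an assumption, not Open MPI's rule):
      - identical strings, or the check is suppressed -> accept
      - same major.minor, different release          -> warn
      - anything else                                -> reject
    """
    if skip_check or local == peer:
        return "accept"
    if parse_major_minor(local) == parse_major_minor(peer):
        return "warn"
    return "reject"


# The case from the log in this thread: same series, different release.
print(check_peer_version("1.8.5rc2", "1.8.3"))

# A three-character prefix cannot tell 2.10.x from 2.1.x apart,
# while the parsed comparison can.
print("2.10.0"[:3] == "2.1.5"[:3])
print(check_peer_version("2.10.0", "2.1.5"))
```

Parsing the fields as integers sidesteps the character-count problem for a release such as 2.10.0. Whether 1.7 and 1.8 should fall in the same allowed range is a policy question this sketch does not try to answer.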