I’ve got an updated patch in the queue that adds the desired “skip version check”. It should be committed in the next hour or so and will be in the next nightly 1.8.5 tarball.
> On Apr 10, 2015, at 7:26 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> I realize there are still longer term issues, but we haven’t resolved how we
> want to handle those yet. The ABI promise is solely at the MPI interface, not
> the internal ones. Defining version compatibility at that level is the
> problem.
>
> I can add a suppression param, though it may not totally resolve the problem
> (e.g., the security protocol might change). Still, it may take things a
> little further.
>
>
>> On Apr 10, 2015, at 6:57 AM, Alan Wild <a...@madllama.net> wrote:
>>
>> Sorry I didn't get back to you right away. 1) I'm on the digest, 2) not
>> real familiar with git and 3) just learned the hard way how to update the
>> build to work with the latest versions of automake, autoconf, and libtool. :)
>>
>> Anyway, I believe the patch is an improvement. Looking at it, I can tell
>> you are now checking the first three characters. I know the plan is to go
>> to 1.9 and then 2.0, but if the numbering ever went more like the Linux
>> kernel into, say, a 2.10.0 release then your number of characters would be
>> off. Also, doesn't the current ABI promise allow 1.7 to be compatible with
>> 1.8?
>>
>> Personally, I'm fine with the solution, but I wanted to point out the
>> potential shortcoming(s) should an issue arise again.
>>
>> One other thought: maybe this is a case where the code should emit a
>> warning (that could be suppressed with a command line parameter) when the
>> versions aren't identical? Certainly if the versions are outside the
>> "allowed" range (whatever you determine that to be), that should be an
>> error and a refused connection. But rather than silently accepting mixed
>> versions (which you indicated has caused problems in the past), the code
>> could warn of a potential issue (and users could then consciously suppress
>> the warning if they are fine with it). Food for thought.
>>
>> Unfortunately, the patch didn't actually solve my particular problem (yet,
>> anyway) because the vendor application statically linked 1.8.3 into their
>> executable. (I honestly didn't realize it when I made my previous post).
>> So the code on their side of the connection is still rejecting the
>> connection:
>>
>> [arwild1@hplcslsp2 ~]$ mpirun -n 6 -H localhost vendor_mpi_app
>> [hplcslsp2:23064] [[44148,1],0] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> [hplcslsp2:23065] [[44148,1],1] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> [hplcslsp2:23067] [[44148,1],2] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> [hplcslsp2:23069] [[44148,1],3] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> [hplcslsp2:23071] [[44148,1],4] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> -------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>>
>>
>> However, I believe if I can get the vendor to adopt this patch (or at least
>> dynamically link) the patch should help alleviate the need to stay in
>> lock-step version for version.
>>
>> Thank you,
>>
>> -Alan
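
To illustrate the 2.10.0 concern raised in the quoted message, here is a minimal sketch in C. It is not the actual Open MPI code: the function names are hypothetical, and treating "same major.minor" as the compatibility rule is only an assumption for the example. It contrasts a fixed three-character prefix comparison with one that parses the numeric major and minor fields.

```c
#include <stdio.h>
#include <string.h>

/* Parse the leading "major.minor" of a version string such as "1.8.5rc2".
 * Returns 1 on success, 0 if the string does not start with two
 * dot-separated integers. */
static int parse_major_minor(const char *version, int *major, int *minor)
{
    return sscanf(version, "%d.%d", major, minor) == 2;
}

/* Hypothetical helper: returns 1 if both versions have the same major and
 * minor numbers, 0 otherwise (including when either string fails to parse). */
static int same_release_series(const char *a, const char *b)
{
    int a_major, a_minor, b_major, b_minor;

    if (!parse_major_minor(a, &a_major, &a_minor) ||
        !parse_major_minor(b, &b_major, &b_minor)) {
        return 0;
    }
    return a_major == b_major && a_minor == b_minor;
}

/* Fixed-width comparison of the first three characters, shown only for
 * contrast with the parsed comparison above. */
static int same_first_three_chars(const char *a, const char *b)
{
    return strncmp(a, b, 3) == 0;
}
```

With these helpers, same_release_series("1.8.5rc2", "1.8.3") accepts the pair from the log above, while "2.1.3" and "2.10.0" are treated as different series even though their first three characters match. Note that 1.7 and 1.8 are still treated as different under this sketch; whether they should be accepted is the open policy question from the thread.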