I’ve got an updated patch in the queue that adds the desired “skip version
check”. It should be committed in the next hour or so and will be in the next
nightly 1.8.5 tarball.


> On Apr 10, 2015, at 7:26 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> I realize there are still longer-term issues, but we haven’t resolved how we
> want to handle those yet. The ABI promise applies solely to the MPI
> interface, not to the internal ones. Defining version compatibility at that
> internal level is the problem.
> 
> I can add a suppression param, though it may not totally resolve the problem 
> (e.g., the security protocol might change). Still, it may take things a 
> little further.
> 
> 
>> On Apr 10, 2015, at 6:57 AM, Alan Wild <a...@madllama.net> wrote:
>> 
>> Sorry I didn't get back to you right away: 1) I'm on the digest, 2) I'm not
>> really familiar with git, and 3) I just learned the hard way how to update
>> the build to work with the latest versions of automake, autoconf, and
>> libtool. :)
>> 
>> Anyway, I believe the patch is an improvement. Looking at it, I can tell
>> you are now checking the first three characters. I know the plan is to go
>> to 1.9 and then 2.0, but if the numbering ever went more like the Linux
>> kernel into, say, a 2.10.0 release, then a three-character check would
>> treat it as the same series as 2.1.x. Also, doesn't the current ABI promise
>> allow 1.7 to be compatible with 1.8?
>> 
>> Personally, I'm fine with the solution, but I wanted to point out the 
>> potential shortcoming(s) should an issue arise again.  
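Alan's concern about comparing only the first three characters can be illustrated with a short sketch. This is a hypothetical Python illustration, not Open MPI's code (the real check lives in Open MPI's C sources, which this thread does not show), and the function names are invented here. It contrasts a three-character prefix comparison with parsing the leading major and minor fields as integers.

```python
import re


def same_series_prefix(a: str, b: str) -> bool:
    """Compare only the first three characters, e.g. "1.8" of "1.8.5rc2"."""
    return a[:3] == b[:3]


def parse_major_minor(version: str) -> tuple[int, int]:
    """Parse the leading numeric major.minor fields, e.g. "2.10.0" -> (2, 10)."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unparseable version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def same_series_numeric(a: str, b: str) -> bool:
    """Compare major and minor as integers rather than as characters."""
    return parse_major_minor(a) == parse_major_minor(b)
```

With these helpers, `same_series_prefix("2.10.0", "2.1.0")` returns `True` (a false match), while `same_series_numeric("2.10.0", "2.1.0")` returns `False`. Both treat `1.8.5rc2` and `1.8.3` as the same series, unlike the exact-string check that the statically linked 1.8.3 side appears to perform in the log later in this message.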
>> 
>> One other thought: maybe this is a case where the code should emit a
>> warning (which could be suppressed with a command-line parameter) when the
>> versions aren't identical? Certainly, versions outside the "allowed" range
>> (whatever you determine that to be) should be an error and a refused
>> connection. But rather than silently accepting mixed versions (which, as
>> you indicated, has caused problems in the past), the code could warn of a
>> potential issue, and users could then consciously suppress the warning if
>> they are fine with it. Food for thought.
>> 
>> Unfortunately, the patch didn't actually solve my particular problem (yet,
>> anyway) because the vendor application statically linked 1.8.3 into their
>> executable. (I honestly didn't realize that when I made my previous post.)
>> So the code on their side of the connection is still rejecting the
>> connection:
>> 
>> [arwild1@hplcslsp2 ~]$ mpirun -n 6 -H localhost vendor_mpi_app
>> [hplcslsp2:23064] [[44148,1],0] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> [hplcslsp2:23065] [[44148,1],1] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> [hplcslsp2:23067] [[44148,1],2] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> [hplcslsp2:23069] [[44148,1],3] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> [hplcslsp2:23071] [[44148,1],4] tcp_peer_recv_connect_ack: received different version from [[44148,0],0]: 1.8.5rc2 instead of 1.8.3
>> -------------------------------------------------------
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> 
>> 
>> However, I believe that if I can get the vendor to adopt this patch (or at
>> least to link dynamically), it should remove the need to keep versions in
>> lock-step.
>> 
>> Thank you,
>> 
>> -Alan
>> 
> 
