Michael,

I can only guess, but perhaps some nodes have differing software 
installations, or not all of the nodes are using context sharing?
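
If it helps to rule out the first guess, the sketch below is one way to compare the 
PSM RPM, qib driver version, and kernel across nodes. The node names and the 
"infinipath-psm" package name are just placeholders, and it assumes passwordless 
ssh, so adjust it for your site:

#!/usr/bin/env python
# Sketch only: compare PSM/qib versions across nodes to spot a mismatch.
# Assumes passwordless ssh; node and package names are placeholders.
import subprocess

nodes = ["node001", "node002", "node003"]   # replace with your node list

def query(node):
    # The RPM name "infinipath-psm" and module "ib_qib" may differ on your distro.
    cmd = "rpm -q infinipath-psm; modinfo -F version ib_qib; uname -r"
    return subprocess.check_output(["ssh", node, cmd]).decode().strip()

results = {node: query(node) for node in nodes}
baseline = results[nodes[0]]
for node, info in sorted(results.items()):
    marker = "" if info == baseline else "   <-- differs from %s" % nodes[0]
    print("%s:\n%s%s\n" % (node, info, marker))

If any node's output differs, that node would be the first place I'd look.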

You might try contacting Red Hat support to see if they know more, since the RPMs 
are sourced from them, or even try Intel support via ibsupp...@intel.com.

Andrew

> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di
> Domenico
> Sent: Thursday, November 6, 2014 4:38 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] ipath_userinit errors
> 
> Andrew,
> 
> Thanks.  We're using the RHEL version because it was less complicated for our
> environment in the past, but it sounds like we might want to reconsider that
> decision.
> 
> Do you know why we don't see the message with lower node count
> allocations?  It only seems to happen once the node count gets over a
> certain point.
> 
> thanks
> 
> On Wed, Nov 5, 2014 at 5:51 PM, Friedley, Andrew
> <andrew.fried...@intel.com> wrote:
> > Hi Michael,
> >
> > From what I understand, this is an issue with the qib driver and PSM from
> > RHEL 6.5 and 6.6, and will be fixed for 6.7.  There is no functional change
> > between qib->PSM API versions 11 and 12, so the message is harmless.  I
> > presume you're using the RHEL-sourced package for a reason, but using an
> > IFS release would fix the problem until RHEL 6.7 is ready.
> >
> > Andrew
> >
> >> -----Original Message-----
> >> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael
> >> Di Domenico
> >> Sent: Tuesday, November 4, 2014 8:35 AM
> >> To: Open MPI Users
> >> Subject: [OMPI users] ipath_userinit errors
> >>
> >> I'm getting the below message on my cluster(s).  It seems to only
> >> happen when I try to use more than 64 nodes (16 cores each).  The
> >> clusters are running RHEL 6.5 with Slurm and Open MPI 1.6.5 with PSM.
> >> I'm using the OFED versions included with RHEL for InfiniBand support.
> >>
> >> ipath_userinit: Mismatched user minor version (12) and driver minor
> >> version
> >> (11) while context sharing. Ensure that driver and library are from
> >> the same release
> >>
> >> I already realize this is a warning message and the jobs complete.
> >> Another user a little over a year ago had a similar issue that was
> >> tracked to mismatched OFED versions.  Since I have a diskless cluster,
> >> all my nodes are identical.
> >>
> >> I'm not averse to thinking there might be something unique about
> >> my machine, but since I have two separate machines doing it, I'm not
> >> really sure where to look to triage the issue and see what might be set
> >> incorrectly.
> >>
> >> Any thoughts on where to start checking would be helpful, thanks...
