Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-12 Thread Ralph Castain
I've poked and prodded, and the 1.8.2 tarball seems to be handling this situation just fine. I don't have access to a Torque machine, but I did set everything to follow the same code path, added faux coprocessors, etc. - and it ran just fine. Can you try the 1.8.2 tarball and see if it solves t

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-12 Thread Bennet Fauber
On Thu, Jun 12, 2014 at 10:56 AM, Ralph Castain wrote: > I've poked and prodded, and the 1.8.2 tarball seems to be handling this > situation Ralph, That's still the development tarball, right? 1.8.2 remains unreleased? Is there an ETA for 1.8.2, the end of this month? Thanks, -- bennet

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-12 Thread Ralph Castain
It isn't a development tarball - it's the current state of the release branch and is therefore managed much more strictly than the developer trunk. We are preparing it now for a release candidate. I have about a dozen CMRs waiting for final review before moving across to 1.8.2, and then we'll beg

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-12 Thread Dan Dietz
Unfortunately, the nightly tarball appears to be crashing in a similar fashion. :-( I used the latest snapshot 1.8.2a1r31981. Dan On Thu, Jun 12, 2014 at 10:56 AM, Ralph Castain wrote: > I've poked and prodded, and the 1.8.2 tarball seems to be handling this > situation just fine. I don't have

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-12 Thread Ralph Castain
Arggh - is there any way I can get access to this beast so I can debug this? I can't figure out what in the world is going on, but it seems to be something triggered by your specific setup. On Jun 12, 2014, at 8:48 AM, Dan Dietz wrote: > Unfortunately, the nightly tarball appears to be crashi

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-12 Thread Dan Dietz
That shouldn't be a problem. Let me figure out the process and I'll get back to you. Dan On Thu, Jun 12, 2014 at 11:50 AM, Ralph Castain wrote: > Arggh - is there any way I can get access to this beast so I can debug this? > I can't figure out what in the world is going on, but it seems to be

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-12 Thread Ralph Castain
Kewl - thanks! I'm a Purdue alum, if that helps :-) On Jun 12, 2014, at 9:04 AM, Dan Dietz wrote: > That shouldn't be a problem. Let me figure out the process and I'll > get back to you. > > Dan > > On Thu, Jun 12, 2014 at 11:50 AM, Ralph Castain wrote: >> Arggh - is there any way I can get a

Re: [OMPI users] OPENIB unknown transport errors

2014-06-12 Thread Tim Miller
Aha ... looking at "ibv_devinfo -v" got me my first concrete hint of what's going on. On a node that's working fine (w2), under port 1 there is a line: LinkLayer: InfiniBand On a node that is having trouble (w3), that line is not present. The question is why this inconsistency occurs. I don't se
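
The node-to-node comparison described above can be sketched as a small script. This is only an illustration, not part of the original thread: it assumes the output of "ibv_devinfo -v" has been saved as text for each node, that each port section starts with a "port: N" line, and that the link-layer field is labelled either "LinkLayer" (as quoted in the message) or "link_layer". The function names and the sample text are hypothetical.

```python
import re

# Assumed line shapes: "port: 1" opens a port section, and the link-layer
# field is spelled "LinkLayer" or "link_layer" (both spellings are accepted).
PORT_RE = re.compile(r"^\s*port:\s*(\d+)", re.IGNORECASE)
LINK_RE = re.compile(r"^\s*link_?layer:\s*(\S+)", re.IGNORECASE)


def link_layers(devinfo_text):
    """Map each port number to its link-layer value, or None if no such line."""
    layers = {}
    current = None
    for line in devinfo_text.splitlines():
        port = PORT_RE.match(line)
        if port:
            current = int(port.group(1))
            layers[current] = None
            continue
        link = LINK_RE.match(line)
        if link and current is not None:
            layers[current] = link.group(1)
    return layers


def ports_missing_link_layer(devinfo_text):
    """Return the sorted port numbers that have no link-layer line."""
    return sorted(p for p, v in link_layers(devinfo_text).items() if v is None)


if __name__ == "__main__":
    # Hypothetical, abbreviated samples standing in for saved output from w2 and w3.
    w2 = "hca_id: mlx4_0\n    port: 1\n        state: PORT_ACTIVE\n        LinkLayer: InfiniBand\n"
    w3 = "hca_id: mlx4_0\n    port: 1\n        state: PORT_ACTIVE\n"
    print("w2 ports missing link layer:", ports_missing_link_layer(w2))
    print("w3 ports missing link layer:", ports_missing_link_layer(w3))
```

Running the same check over every node's saved output would show which nodes lack the line. It does not explain why the line is missing.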