I've poked and prodded, and the 1.8.2 tarball seems to be handling this
situation just fine. I don't have access to a Torque machine, but I did set
everything to follow the same code path, added faux coprocessors, etc. - and it
ran just fine.
Can you try the 1.8.2 tarball and see if it solves t
On Thu, Jun 12, 2014 at 10:56 AM, Ralph Castain wrote:
> I've poked and prodded, and the 1.8.2 tarball seems to be handling this
> situation
Ralph,
That's still the development tarball, right? 1.8.2 remains unreleased?
Is there an ETA for 1.8.2? The end of this month?
Thanks, -- bennet
It isn't a development tarball - it's the current state of the release branch
and is therefore managed much more strictly than the developer trunk. We are
preparing it now for a release candidate. I have about a dozen CMRs waiting for
final review before moving across to 1.8.2, and then we'll beg
Unfortunately, the nightly tarball appears to be crashing in a similar
fashion. :-( I used the latest snapshot 1.8.2a1r31981.
Dan
On Thu, Jun 12, 2014 at 10:56 AM, Ralph Castain wrote:
> I've poked and prodded, and the 1.8.2 tarball seems to be handling this
> situation just fine. I don't have
Arggh - is there any way I can get access to this beast so I can debug this? I
can't figure out what in the world is going on, but it seems to be something
triggered by your specific setup.
On Jun 12, 2014, at 8:48 AM, Dan Dietz wrote:
> Unfortunately, the nightly tarball appears to be crashi
That shouldn't be a problem. Let me figure out the process and I'll
get back to you.
Dan
On Thu, Jun 12, 2014 at 11:50 AM, Ralph Castain wrote:
> Arggh - is there any way I can get access to this beast so I can debug this?
> I can't figure out what in the world is going on, but it seems to be
Kewl - thanks! I'm a Purdue alum, if that helps :-)
On Jun 12, 2014, at 9:04 AM, Dan Dietz wrote:
> That shouldn't be a problem. Let me figure out the process and I'll
> get back to you.
>
> Dan
>
> On Thu, Jun 12, 2014 at 11:50 AM, Ralph Castain wrote:
>> Arggh - is there any way I can get a
Aha ... looking at "ibv_devinfo -v" got me my first concrete hint of what's
going on. On a node that's working fine (w2), under port 1 there is a line:
LinkLayer: InfiniBand
On a node that is having trouble (w3), that line is not present. The
question is why this inconsistency occurs.
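To compare nodes like w2 and w3 without eyeballing the output, one option is a small parser that records, for each HCA port, whether a link-layer line appears at all. The sketch below is only an illustration under stated assumptions. The names `link_layers` and `local_link_layers` are hypothetical helpers, not part of any Open MPI or libibverbs tool. The line layout (`hca_id:`, `port:`, then indented attributes) is assumed from typical `ibv_devinfo -v` output. The sketch also assumes that other libibverbs versions may spell the key `link_layer:` instead of the `LinkLayer:` quoted above, so it accepts both. A port with no such line is reported as `None`, which is the w3 symptom.

```python
import re
import subprocess
from typing import Dict, Optional, Tuple

# Assumed layout of `ibv_devinfo -v` output: an "hca_id:" line per device,
# a "port:" line per port, followed by indented port attributes.
HCA_RE = re.compile(r"^\s*hca_id:\s*(\S+)")
PORT_RE = re.compile(r"^\s*port:\s*(\d+)\s*$")
# Accept "LinkLayer:" (as seen in this thread) and "link_layer:"
# (assumption: spelling used by other libibverbs versions).
LINK_RE = re.compile(r"^\s*(?:LinkLayer|link_layer):\s*(\S+)", re.IGNORECASE)


def link_layers(devinfo_text: str) -> Dict[Tuple[str, int], Optional[str]]:
    """Map (hca_id, port) to the reported link layer, or None if no line is present."""
    result: Dict[Tuple[str, int], Optional[str]] = {}
    hca = ""
    port: Optional[int] = None
    for line in devinfo_text.splitlines():
        m = HCA_RE.match(line)
        if m:
            hca, port = m.group(1), None  # new device: reset the current port
            continue
        m = PORT_RE.match(line)
        if m:
            port = int(m.group(1))
            result[(hca, port)] = None  # default until a link-layer line is seen
            continue
        m = LINK_RE.match(line)
        if m and port is not None:
            result[(hca, port)] = m.group(1)
    return result


def local_link_layers() -> Dict[Tuple[str, int], Optional[str]]:
    """Run `ibv_devinfo -v` on this node and parse its output."""
    out = subprocess.run(["ibv_devinfo", "-v"], capture_output=True,
                         text=True, check=True).stdout
    return link_layers(out)


if __name__ == "__main__":
    for (hca, port), layer in sorted(local_link_layers().items()):
        print(f"{hca} port {port}: {layer if layer else 'MISSING'}")
```

Running the script on each node (for example, through the batch system) and diffing the results would show which ports omit the link-layer line. It does not explain *why* the line is missing. It only makes the inconsistency easy to spot across many nodes.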
I don't se