Yevgeny,
The ibstat results:
CA 'mthca0'
    CA type: MT25208 (MT23108 compat mode)
    Number of ports: 2
    Firmware version: 4.7.600
    Hardware version: a0
    Node GUID: 0x0005ad0c21e0
    System image GUID: 0x0005ad000100d050
    Port 1:
        State
One system is actually an i5-2400; maybe it's throttling back on 2 cores to
save power?
The other (i7) shows consistent CPU MHz on all cores.
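
A quick way to compare per-core clocks on both machines is a sketch like the one below. It assumes a Linux /proc/cpuinfo that reports "cpu MHz" lines (not every platform does), and the helper names parse_cpu_mhz and mhz_spread are hypothetical, made up for this illustration. A large spread while the job is running would be consistent with some cores being clocked down.

```python
import os
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical core, in file order."""
    pattern = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)\s*$", re.MULTILINE)
    return [float(m.group(1)) for m in pattern.finditer(cpuinfo_text)]


def mhz_spread(values):
    """Difference between the fastest and slowest reported core clock."""
    return max(values) - min(values) if values else 0.0


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    # Assumption: Linux-style /proc/cpuinfo; other systems need another source.
    with open("/proc/cpuinfo") as f:
        mhz = parse_cpu_mhz(f.read())
    for core, value in enumerate(mhz):
        print(f"core {core}: {value:.0f} MHz")
    print(f"spread: {mhz_spread(mhz):.0f} MHz")
```

Running it a few times under load on each node gives a rough picture; it reads a snapshot only, so a single sample is not conclusive.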
From: Yevgeny Kliteynik
To: Randolph Pullen; OpenMPI Users
Sent: Thursday, 6 September 2012 6:03 PM
Subject: Re: [OMP
George,
I have made some modifications to the code; however, this is the first
part of my zmp_list:
!ZEUSMP2 CONFIGURATION FILE
&GEOMCONF LGEOM= 2,
LDIMEN = 2 /
&PHYSCONF LRAD = 0,
XHYDRO = .TRUE.,
XFORCE = .TRUE.,
XMHD
On Sep 5, 2012, at 3:59 AM, Andrea Negri wrote:
> I have tried with these flags (I use gcc 4.7 and Open MPI 1.6), but
> the program doesn't crash; a node goes down and the rest of them keep
> waiting for a signal (there is an ALLREDUCE in the code).
>
> Anyway, yesterday some processes died (without
On Sep 7, 2012, at 5:58 AM, Jeff Squyres wrote:
> Also look for hardware errors. Perhaps you have some bad RAM somewhere. Is
> it always the same node that crashes? And so on.
Another thought on hardware errors... I actually have seen bad RAM cause
spontaneous reboots with no Linux warnings
Hi,
are the following outputs helpful to find the error with
a rankfile on Solaris? I wrapped long lines so that they
are easier to read. Have you had time to look at the
segmentation fault with a rankfile which I reported in my
last email (see below)?
"tyr" is a two-processor, single-core machine
On Sep 7, 2012, at 5:41 AM, Siegmar Gross wrote:
> Hi,
>
> are the following outputs helpful to find the error with
> a rankfile on Solaris?
If you can't bind on the new Solaris machine, then the rankfile won't do you
any good. It looks like we are getting the incorrect number of cores on th
On 09/03/2012 04:39 PM, Andrea Negri wrote:
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
s
On 09/07/2012 08:02 AM, Jeff Squyres wrote:
> On Sep 7, 2012, at 5:58 AM, Jeff Squyres wrote:
>> Also look for hardware errors. Perhaps you have some bad RAM somewhere. Is it
>> always the same node that crashes? And so on.
> Another thought on hardware errors... I actually have seen bad RAM cause