date:20120907

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-07 Thread Randolph Pullen

Yevgeny, The ibstat results: CA 'mthca0' CA type: MT25208 (MT23108 compat mode) Number of ports: 2 Firmware version: 4.7.600 Hardware version: a0 Node GUID: 0x0005ad0c21e0 System image GUID: 0x0005ad000100d050 Port 1: State

Re: [OMPI users] Infiniband performance Problem and stalling

2012-09-07 Thread Randolph Pullen

One system is actually an i5-2400 - maybe it's throttling back on 2 cores to save power? The other (i7) shows consistent CPU MHz on all cores. From: Yevgeny Kliteynik To: Randolph Pullen ; OpenMPI Users Sent: Thursday, 6 September 2012 6:03 PM Subject: Re: [OMP

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-07 Thread Andrea Negri

George, I have made some modifications to the code; however, this is the first part of my zmp_list: !ZEUSMP2 CONFIGURATION FILE &GEOMCONF LGEOM= 2, LDIMEN = 2 / &PHYSCONF LRAD = 0, XHYDRO = .TRUE., XFORCE = .TRUE., XMHD

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-07 Thread Jeff Squyres

On Sep 5, 2012, at 3:59 AM, Andrea Negri wrote: > I have tried with these flags (I use gcc 4.7 and open mpi 1.6), but > the program doesn't crash; a node goes down and the rest of them remain > waiting for a signal (there is an ALLREDUCE in the code). > > Anyway, yesterday some processes died (without

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-07 Thread Jeff Squyres

On Sep 7, 2012, at 5:58 AM, Jeff Squyres wrote: > Also look for hardware errors. Perhaps you have some bad RAM somewhere. Is > it always the same node that crashes? And so on. Another thought on hardware errors... I actually have seen bad RAM cause spontaneous reboots with no Linux warnings

Re: [OMPI users] problem with rankfile

2012-09-07 Thread Siegmar Gross

Hi, are the following outputs helpful to find the error with a rankfile on Solaris? I wrapped long lines so that they are easier to read. Have you had time to look at the segmentation fault with a rankfile which I reported in my last email (see below)? "tyr" is a two processor single core machine

Re: [OMPI users] problem with rankfile

2012-09-07 Thread Ralph Castain

On Sep 7, 2012, at 5:41 AM, Siegmar Gross wrote: > Hi, > > are the following outputs helpful to find the error with > a rankfile on Solaris? If you can't bind on the new Solaris machine, then the rankfile won't do you any good. It looks like we are getting the incorrect number of cores on th

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-07 Thread Gus Correa

On 09/03/2012 04:39 PM, Andrea Negri wrote: max locked memory (kbytes, -l) 32 max memory size(kbytes, -m) unlimited open files (-n) 1024 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 s

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-07 Thread Gus Correa

On 09/07/2012 08:02 AM, Jeff Squyres wrote: On Sep 7, 2012, at 5:58 AM, Jeff Squyres wrote: Also look for hardware errors. Perhaps you have some bad RAM somewhere. Is it always the same node that crashes? And so on. Another thought on hardware errors... I actually have seen bad RAM cause


9 matches
