[OMPI users] Role of ethernet interfaces of startup of openmpi job using IB

2011-09-27 Thread Salvatore Podda
Dear all, We would like to know if the ethernet interfaces play any role in the startup phase of an Open MPI job using InfiniBand. If so, where can we find some literature on this topic? This interest arises from some observations of a substantial time overhead on the startup of our

Re: [OMPI users] Segfault on any MPI communication on head node

2011-09-27 Thread Jeff Squyres
Hmm. It's not immediately clear to me what's going wrong here. I hate to ask, but could you install a debugging version of Open MPI and capture a proper stack trace of the segv? Also, could you try the 1.4.4 rc and see if that magically fixes the problem? (I'm about to post a new 1.4.4 rc late

Re: [OMPI users] Role of ethernet interfaces of startup of openmpi job using IB

2011-09-27 Thread Jeff Squyres
On Sep 27, 2011, at 6:35 AM, Salvatore Podda wrote: > We would like to know if the ethernet interfaces play any role in the > startup phase of an Open MPI job using InfiniBand. > If so, where can we find some literature on this topic? Unfortunately, there's not a lot of docs about thi

Re: [OMPI users] Fault Tolerant with openib

2011-09-27 Thread Guilherme V
Do you know if there is another patch available so that my application can handle the failure of one node instead of MPI killing the job? This is very important for me: I have a big cluster and I can't stop my job every time I have a problem with just one node. Regards On Fri, Sep 23, 2011 at 4:34 PM, Ralph

Re: [OMPI users] Segfault on any MPI communication on head node

2011-09-27 Thread Henderson, Brent
Here is another possibly non-helpful suggestion. :) Change: char* name[20]; int maxlen = 20; To: char name[256]; int maxlen = 256; gethostname() is supposed to properly truncate the hostname it returns if the actual name is longer than the length provided, but since you h

[OMPI users] alternate PBS_NODEFILE

2011-09-27 Thread Wiegers, Bert
Hi, we have a cluster setup with all nodes slot=1 (although 12 cores are present). Now we would like to use an alternate machinefile for a specific user. I found this hint: http://www.open-mpi.org/faq/?category=tm Is this still valid? We have Open MPI v1.4.3 running. Trying to generate an own mac

Re: [OMPI users] Segfault on any MPI communication on head node

2011-09-27 Thread Phillip Vassenkov
Thanks, but my main concern is the segfault :P I changed it and, as I expected, it still segfaults. On 9/27/11 9:48 AM, Henderson, Brent wrote: Here is another possibly non-helpful suggestion. :) Change: char* name[20]; int maxlen = 20; To: char name[256]; int maxlen = 2

Re: [OMPI users] EXTERNAL: Re: Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-27 Thread Blosch, Edwin L
Yes, I've been copying around the source tree. That was the problem. If I am careful to preserve the original timestamps, there are no problems. Thanks -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres Sent: Monday, Septem

Re: [OMPI users] Segfault on any MPI communication on head node

2011-09-27 Thread Gus Correa
Any chance that the stacksize in the head node is too small, compared to the compute nodes? Small stacksize can cause segfaults. Check /etc/security/limits.conf (and man limits.conf). You could set it to unlimited (say, along with locked memory and perhaps number of open files): * - stack
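One way to compare the stack limit on the head node with the compute nodes is to print it from a small program run on each. The following is a minimal sketch, assuming a POSIX system; the function name `print_stack_limit` is hypothetical and is not part of Open MPI.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit value, handling the "unlimited" sentinel. */
static void print_limit(FILE *out, const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        fprintf(out, "%s: unlimited\n", label);
    else
        fprintf(out, "%s: %llu bytes\n", label, (unsigned long long)value);
}

/* Hypothetical helper: report the soft and hard stack size limits of
 * the calling process. Returns 0 on success, -1 if getrlimit fails. */
int print_stack_limit(FILE *out)
{
    struct rlimit rl;

    assert(out != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    print_limit(out, "stack soft limit", rl.rlim_cur);
    print_limit(out, "stack hard limit", rl.rlim_max);
    return 0;
}
```

Calling `print_stack_limit(stdout)` from `main` on the head node and on a compute node, ideally launched the same way the MPI job is launched, shows whether the two environments actually differ.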

Re: [OMPI users] EXTERNAL: Re: Trouble compiling 1.4.3 with PGI 10.9 compilers

2011-09-27 Thread Reuti
On 27.09.2011 at 01:16, Jeff Squyres wrote: On Sep 26, 2011, at 6:53 PM, Blosch, Edwin L wrote: Actually I can download OpenMPI 1.5.4, 1.4.4rc3 or 1.4.3 - and ALL of them build just fine. Apparently what isn't working is the version of 1.4.3 that I have downloaded and copied from place t

Re: [OMPI users] Segfault on any MPI communication on head node

2011-09-27 Thread German Hoecht
char* name[20]; yields 20 (uninitialized) pointers to char; I guess you mean char name[20]; So Brent's suggestion should work as well(?) To be safe I would also add: gethostname(name,maxlen); name[19] = '\0'; printf("Hello, world. I am %d of %d and host %s \n", rank, ... Cheers On 09/27/2011 07:40 P
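The buffer fix discussed in this thread (a plain `char` array plus explicit termination) can be sketched as follows. This is an illustration only: the wrapper name `get_host_name` is hypothetical, and it does not address whatever is causing the original segfault.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: fill buf (len bytes) with the host name and
 * guarantee NUL termination. Returns 0 on success, -1 on failure. */
int get_host_name(char *buf, size_t len)
{
    int rc;

    assert(buf != NULL && len > 0);
    rc = gethostname(buf, len);

    /* POSIX leaves it unspecified whether a truncated name is
     * terminated, so terminate the buffer explicitly. */
    buf[len - 1] = '\0';
    if (rc != 0) {
        buf[0] = '\0';   /* do not expose partial data on error */
        return -1;
    }
    return 0;
}
```

A caller would declare `char name[256];` (an array of characters, not `char *name[20]`, which is an array of pointers) and pass `sizeof name` as the length.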

Re: [OMPI users] alternate PBS_NODEFILE

2011-09-27 Thread Reuti
Hi, On 27.09.2011 at 16:00, Wiegers, Bert wrote: we have a cluster setup with all nodes slot=1 (although 12 cores are present). Now we would like to use an alternate machinefile for a specific user. I found this hint: http://www.open-mpi.org/faq/?category=tm Is this still valid? We have openM

Re: [OMPI users] maximum size for read buffer in MPI_File_read/write

2011-09-27 Thread Rob Latham
On Thu, Sep 22, 2011 at 11:37:10PM +0200, German Hoecht wrote: > Hello, > > MPI_File_read/write functions use an integer to specify the size of > the buffer, for instance: > int MPI_File_read(MPI_File fh, void *buf, int count, MPI_Datatype > datatype, MPI_Status *status) > with: > count Numb
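Because `count` is an `int`, a single call cannot request more than `INT_MAX` elements. One common workaround, shown here as an assumption rather than as the answer given in this thread, is to split a large transfer into several calls. The sketch below contains only the chunking arithmetic and no MPI calls; the names `next_chunk_count`, `chunk_fn` and `for_each_chunk` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of elements to request in the next call, given how many
 * elements remain; never exceeds INT_MAX. */
int next_chunk_count(size_t remaining)
{
    return remaining > (size_t)INT_MAX ? INT_MAX : (int)remaining;
}

/* Hypothetical callback: transfer `count` elements starting at element
 * `offset`. A real implementation might wrap MPI_File_read here. */
typedef int (*chunk_fn)(void *ctx, size_t offset, int count);

/* Walk over `total` elements in int-sized chunks. Stops at the first
 * nonzero return value from fn and passes that value back. */
int for_each_chunk(size_t total, chunk_fn fn, void *ctx)
{
    size_t done = 0;

    assert(fn != NULL);
    while (done < total) {
        int n = next_chunk_count(total - done);
        int rc = fn(ctx, done, n);
        if (rc != 0)
            return rc;
        done += (size_t)n;
    }
    return 0;
}
```

Another option that avoids looping is to describe a large block with a derived datatype so that `count` stays small; whether that suits a given application depends on the file layout.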

Re: [OMPI users] Role of ethernet interfaces of startup of openmpi job using IB

2011-09-27 Thread Prentice Bisbal
On 09/27/2011 07:50 AM, Jeff Squyres wrote: > On Sep 27, 2011, at 6:35 AM, Salvatore Podda wrote: > >> We would like to know if the ethernet interfaces play any role in the >> startup phase of an Open MPI job using InfiniBand. >> If so, where can we find some literature on this topic?

Re: [OMPI users] ompi-checkpoint problem on shared storage

2011-09-27 Thread Dave Schulz
Thanks Josh, Just yesterday I stumbled upon another interesting detail about this issue. While reconfiguring things, I accidentally ran as root, and the checkpointing all succeeded. I'm not sure though how to go about finding what file things are hanging up on. I've compared straces as roo

Re: [OMPI users] Role of ethernet interfaces of startup of openmpi job using IB

2011-09-27 Thread Jeff Squyres
On Sep 27, 2011, at 5:03 PM, Prentice Bisbal wrote: > To clarify, is IP/Ethernet required, or will IPoIB be used if it's > configured on the nodes? Would this make a difference? IPoIB is fine, although I've heard concerns about its stability at scale. The difference that it'll make is that it's