Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Douglas Guptill
On Wed, May 05, 2010 at 06:08:57PM -0400, Gus Correa wrote: > If anybody else has Open MPI working with hyperthreading and "sm" > on a Nehalem box, I would appreciate any information about the > Linux distro and kernel version being used. Debian 5 (lenny), Core i7 920, Asus P6T MoBo, 12GB RAM, Op

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Gus Correa
Hi Ralph Thank you. Yes, I will give the user/researcher that TCP solution for now, because he needs to start running his model with Open MPI. He bought a brand new super-duper machine, with two-way Nehalem, 48GB RAM, etc, and so far he couldn't do any work, which is frustrating. I googled aroun
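For reference, the "TCP solution" discussed in this thread is normally applied by selecting BTLs on the mpirun command line so that the sm (shared memory) transport is not used. A minimal sketch, assuming the standard Open MPI MCA syntax and a hypothetical executable and process count:

    mpirun --mca btl tcp,self -np 8 ./model

Equivalently, "--mca btl ^sm" excludes only shared memory and keeps any other available transports.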

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Ralph Castain
I saw similar issues in my former life when we encountered a Linux "glitch" in the way it handled proximity for shared memory - caused lockups under certain conditions. Turned out the problem was fixed in a later kernel version. Afraid I can't remember the versions involved any more, though

Re: [OMPI users] Problem with mpi_comm_spawn_multiple

2010-05-05 Thread Ralph Castain
Ah, missed that - afraid I no speakee fortran any more (thankfully got to remove that module from my brain 20+ years ago). On May 5, 2010, at 1:18 PM, Andrew J Marquis wrote: > Dear Ralph, > > thanks for that. I have done much the same (as I indicated in my original > post). In this case my C-


Re: [OMPI users] Problem with mpi_comm_spawn_multiple

2010-05-05 Thread Andrew J Marquis
Dear Ralph, thanks for that. I have done much the same (as I indicated in my original post). In this case my C-program correctly spawned the slaves and the slaves printed the correctly passed argument lists. On running this and my fortran slave I get: nsize, mytid: iargs 2

Re: [OMPI users] Problem with mpi_comm_spawn_multiple

2010-05-05 Thread Ralph Castain
I think OMPI is okay - here is a C sample program and the associated output: $ mpirun -np 3 ./spawn_multiple Parent [pid 98895] about to spawn! Parent [pid 98896] about to spawn! Parent [pid 98897] about to spawn! Parent done with spawn Parent sending message to children Parent done with spawn Par
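Ralph's full program is cut off in this preview. The sketch below is not his code, just a minimal illustration of the MPI_Comm_spawn_multiple call in C, with two hypothetical slave executables ./slave_a and ./slave_b and invented argument lists:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *cmds[2]   = { "./slave_a", "./slave_b" };   /* hypothetical slaves */
        char *args_a[]  = { "hello", NULL };              /* argv for slave_a (no argv[0]) */
        char *args_b[]  = { "world", NULL };              /* argv for slave_b */
        char **spawn_argv[2] = { args_a, args_b };
        int maxprocs[2] = { 1, 1 };                       /* one copy of each command */
        MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
        MPI_Comm children;
        int errcodes[2];

        MPI_Init(&argc, &argv);
        printf("Parent about to spawn!\n");
        MPI_Comm_spawn_multiple(2, cmds, spawn_argv, maxprocs, infos,
                                0, MPI_COMM_WORLD, &children, errcodes);
        printf("Parent done with spawn\n");
        MPI_Comm_free(&children);
        MPI_Finalize();
        return 0;
    }

Note that each per-command argv list is NULL-terminated and does not include the program name; MPI supplies argv[0] itself.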

[OMPI users] Problem with mpi_comm_spawn_multiple

2010-05-05 Thread Fred Marquis
Hi, I am using mpi_comm_spawn_multiple to spawn multiple commands with argument lists. I am trying to do this in fortran (77) using version openmpi-1.4.1 and the ifort compiler v9.0. The operating system is SuSE Linux 10.1 (x86-64). I have put together a simple controlling example program (te

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Gus Correa
Hi Jeff, Ralph, list. Sorry for the long email, and the delay in answering. I had to test MPI and reboot the machine several times to address the questions. Hopefully the answers to all your questions are inline below. Jeff Squyres wrote: I'd actually be a little surprised if HT was the problem. I run w

Re: [OMPI users] Fortran derived types

2010-05-05 Thread Cole, Derek E
In general, even in your serial fortran code, you're already taking a performance hit using a derived type. Is it really necessary? It might be easier for you to change your fortran code into more memory friendly structures and then the MPI part will be easier. The serial code will have the adde
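One way to read Derek's suggestion, in a C sketch (the names and layout are illustrative, not taken from the thread): instead of teaching MPI about the derived type, copy each record's field into one contiguous buffer and send it with a plain element count.

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    /* field[i] points to one record's npts-element array, allocated separately. */
    void send_fields(double **field, int nv, int npts, int dest, MPI_Comm comm)
    {
        double *buf = malloc((size_t)nv * npts * sizeof(double));
        for (int i = 0; i < nv; i++)                        /* pack into one block */
            memcpy(buf + (size_t)i * npts, field[i], npts * sizeof(double));
        MPI_Send(buf, nv * npts, MPI_DOUBLE, dest, 0, comm);
        free(buf);
    }

The receiver posts a matching MPI_Recv of nv*npts doubles and unpacks the same way.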

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Jeff Squyres
On May 5, 2010, at 9:48 AM, Gus Correa wrote: > Jeff: Should I wait for the 1.0 stable release? > Use the current stable v0.9.3? > John says one can go fearless with the 1.0 candidate release, > but I tend to chicken out at cutting edge stuff. 1.0rc4 is *really* close to the final. Samuel just m

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Prentice Bisbal
Douglas Guptill wrote: > On Tue, May 04, 2010 at 05:34:40PM -0600, Ralph Castain wrote: >> On May 4, 2010, at 4:51 PM, Gus Correa wrote: >> >>> Hi Ralph >>> >>> Ralph Castain wrote: One possibility is that the sm btl might not like that you have hyperthreading enabled. >>> I remember tha

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Prentice Bisbal
Gus Correa wrote: > Hi Ralph > > Ralph Castain wrote: >> One possibility is that the sm btl might not like that you have >> hyperthreading enabled. > > I remember that hyperthreading was discussed months ago, > in the previous incarnation of this problem/thread/discussion on > "Nehalem vs. Open M

Re: [OMPI users] Fortran derived types

2010-05-05 Thread Prentice Bisbal
Vedran Coralic wrote: > Hello, > > In my Fortran 90 code I use several custom defined derived types. > Amongst them is a vector of arrays, i.e. v(:)%f(:,:,:). I am wondering > what the proper way of sending this data structure from one processor to > another is. Is the best way to just restructure

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Gus Correa
Hi John and Jeff. John: Thank you very much for pointing this out. I would never have known about it without your tip. I have yet to understand what it does (I'm browsing the documentation now), and how it can help me understand, and perhaps monitor and allocate, CPU/core resources. Jeff: Should I wait f

Re: [OMPI users] Fortran derived types

2010-05-05 Thread Jeff Squyres
Yes, you can use derived datatypes in MPI -- but be sure to read the language chapter in the MPI-2.2 spec to be aware of a series of issues with Fortran. We're actively working on "better" Fortran MPI bindings that won't have issues with sending Fortran derived types (the current "medium" size
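The MPI-side mechanism Jeff refers to is the general derived-datatype machinery (MPI_Type_create_struct and friends). The C sketch below shows that machinery on an invented struct; it does not address the Fortran-specific issues Jeff warns about.

    #include <mpi.h>

    struct cell {             /* hypothetical record, for illustration only */
        double f[4];
        int    tag;
    };

    MPI_Datatype make_cell_type(void)
    {
        struct cell  probe;
        int          blocklens[2] = { 4, 1 };
        MPI_Aint     displs[2], base;
        MPI_Datatype types[2] = { MPI_DOUBLE, MPI_INT };
        MPI_Datatype newtype;

        /* Displacements of each member relative to the start of the struct */
        MPI_Get_address(&probe,     &base);
        MPI_Get_address(probe.f,    &displs[0]);
        MPI_Get_address(&probe.tag, &displs[1]);
        displs[0] -= base;
        displs[1] -= base;

        MPI_Type_create_struct(2, blocklens, displs, types, &newtype);
        MPI_Type_commit(&newtype);
        return newtype;       /* pass to MPI_Send/MPI_Recv with a count of structs */
    }

For an array of such structs it is also common to resize the type with MPI_Type_create_resized so its extent matches sizeof(struct cell).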

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread Jeff Squyres
On May 5, 2010, at 1:14 AM, John Hearns wrote: > Regarding hyperthreading, and finding out information about your CPUs > in detail, there is the excellent hwloc project from OpenMPI > > http://www.open-mpi.org/projects/hwloc/ > > I downloaded the 1.0 release candidate, and it compiled and ran fi

[OMPI users] MPI_Recv hang because readv failed at mca_btl_tcp_frag_recv()

2010-05-05 Thread Guanyinzhu
Hi! I'm using OpenMPI 1.3 on 30 nodes connected with Gigabit Ethernet on Redhat Linux x86_64. Our MPI job sometimes hangs and shows the following error log: [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection timed out (110) I ran a test like this: wri

Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-05 Thread John Hearns
Regarding hyperthreading, and finding out information about your CPUs in detail, there is the excellent hwloc project from OpenMPI http://www.open-mpi.org/projects/hwloc/ I downloaded the 1.0 release candidate, and it compiled and ran first time on Nehalem systems. Gives a superb and helpful view
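hwloc ships the lstopo tool for drawing the machine topology, and it also has a C API that can answer the hyperthreading question directly. A minimal sketch, assuming the hwloc 1.0 headers and library are installed (link with -lhwloc):

    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(&topo);

        /* Cores vs. hardware threads (PUs): more PUs than cores suggests HT is on */
        int cores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        int pus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
        printf("cores: %d, hardware threads: %d\n", cores, pus);

        hwloc_topology_destroy(topo);
        return 0;
    }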