Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-17 Thread Jeff Squyres
You might want to run some performance testing of your TCP stacks and the switch -- use a non-MPI application such as NetPIPE (or others -- google around) and see what kind of throughput you get. Try it between individual server peers and then try running it simultaneously between a bunch o

[OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-17 Thread Craig Plaisance
Hi - I have compiled vasp 4.6.34 using the Intel Fortran compiler 11.1 with openmpi 1.3.3 on a cluster of 104 nodes running Rocks 5.2, each with two quad-core Opterons, connected by Gbit Ethernet. Running in parallel on one node (8 cores) runs very well, faster than any other cluster I have run it

Re: [OMPI users] How to make a job abort when one host dies?

2009-08-17 Thread Scott Atchley
On Aug 17, 2009, at 2:43 PM, Jeff Squyres wrote: George / Myricom -- Does the MX MTL abort if it gets a "disconnected" error back from libmyriexpress? Short answer: yes. Long answer: The messages below indicate that these processes were all trying to send to cl120. It did not ack their

Re: [OMPI users] How to make a job abort when one host dies?

2009-08-17 Thread Jeff Squyres
George / Myricom -- Does the MX MTL abort if it gets a "disconnected" error back from libmyriexpress? On Aug 11, 2009, at 7:07 AM, Oskar Enoksson wrote: I searched the FAQ and google but couldn't come up with a solution to this problem. My problem is that when one MPI execution host dies

Re: [OMPI users] problem starting openmpi on core duo macosx5

2009-08-17 Thread Jeff Squyres
FWIW, if you use the right mpicc/mpif77/mpif90, you shouldn't need to specify any of the -L or -l options. Those will automatically be specified by the wrapper compiler. If you cannot use mpicc/mpif77/mpif90, then see this FAQ entry: http://www.open-mpi.org/faq/?category=mpi-apps#cant-u

Re: [OMPI users] Invalid Info object in MPI_Comm_spawn_multiple

2009-08-17 Thread Jeff Squyres
Are you initializing your MPI_Info object? Remember that -- at a minimum -- you need to call MPI_INFO_CREATE on an MPI_Info object (or pass MPI_INFO_NULL). On Aug 17, 2009, at 11:28 AM, Federico Golfrè Andreasi wrote: Hi! I have a little code that uses the MPI_Comm_spawn_multiple, I've us

Re: [OMPI users] PBS tm error returns

2009-08-17 Thread Ralph Castain
Hi David You are quite correct. IIRC, we didn't bother checking the local_err because we found it to be unreliable - all Torque checks is that the program exec's. It doesn't report back an error if it segfaults instantly, for example, or aborts because it fails to find a required library. So we ad

Re: [OMPI users] Help: How to accomplish processors affinity

2009-08-17 Thread Eugene Loh
Lee Amy wrote: I build a Kerrighed Clusters Like Lenny, I'm not familiar with such clusters, but... with 4 nodes so they look like a big SMP machine. every node has 1 processor with single core. 1) Can MPI programs run on such kinds of machine? If yes, could anyone show me som

Re: [OMPI users] Invalid Info object in MPI_Comm_spawn_multiple

2009-08-17 Thread Ralph Castain
We tried to make the most common info_keys the same, but there can be differences. What info keys are you trying to pass? 2009/8/17 Federico Golfrè Andreasi > Hi! > > I have a little code that uses the MPI_Comm_spawn_multiple, > I've used it without any problems with the MPICH2 and MVAPICH2 > i

[OMPI users] Invalid Info object in MPI_Comm_spawn_multiple

2009-08-17 Thread Federico Golfrè Andreasi
Hi! I have a little code that uses the MPI_Comm_spawn_multiple, I've used it without any problems with the MPICH2 and MVAPICH2 implementation of MPI-2, but with the Open MPI v1.3.3 it throws this error: *** An error occurred in MPI_Comm_spawn_multiple *** on communicator MPI_COMM_WORLD *** MPI_ER

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread Eugene Loh
jody wrote: But can you explain what the meaning of the max-slots entry is? I checked the FAQs http://www.open-mpi.org/faq/?category=running#simple-spmd-run http://www.open-mpi.org/faq/?category=running#mpirun-scheduling but i couldn't find any explanation. (furthermore, in the FAQ it says "ma

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread jody
Hi Lenny After removing the max-slots entries, i could do mpirun -np 4 -hostfile th_02 -rf rf_02 ./HelloMPI without any errors. But can you explain what the meaning of the max-slots entry is? I checked the FAQs http://www.open-mpi.org/faq/?category=running#simple-spmd-run http://www.open-mp

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread Lenny Verkhovsky
Can you try not specifying "max-slots" in the hostfile? If you are the only user of the nodes, there will be no oversubscribing of the processors. This one definitely looks like a bug, but as Ralph said, there is ongoing discussion and work on this component. Lenny. On Mon, Aug 17, 2009 at 2:37

Re: [OMPI users] Totalview and OpenMPI problem solved

2009-08-17 Thread Jeff Squyres
Added to the FAQ -- thanks! On Aug 12, 2009, at 11:55 AM, Gabriele Fatigati wrote: Dear OpenMPI developers, regarding the following problem: http://openmpi.igor.onlinedirect.bg/faq/?category=troubleshooting#parallel-debugger-attach Cristiano Calonaci and I have compiled openmpi 1.3.3 with in

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread Ralph Castain
Is there an explanation for this? I believe the word is "bug". :-) The rank_file mapper has been substantially revised lately - we are discussing now how much of that revision to bring to 1.3.4 versus the next major release. Ralph On Aug 17, 2009, at 4:45 AM, jody wrote: Hi Lenny I th

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread jody
Hi Lenny > I think it has something to do with your environment, /etc/hosts, IT setup, > hostname function return value, etc. > I am not sure if it has something to do with Open MPI at all. OK. I just thought this was Open MPI related because I was able to use the aliases of the hosts (i.e. pla

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread Lenny Verkhovsky
I think it has something to do with your environment, /etc/hosts, IT setup, hostname function return value, etc. I am not sure if it has something to do with Open MPI at all. Lenny. On Mon, Aug 17, 2009 at 12:59 PM, jody wrote: > Hi Lenny > > Thanks - using the full names makes it work! > Is the

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread jody
Hi Lenny Thanks - using the full names makes it work! Is there a reason why the rankfile option treats host names differently than the hostfile option? Thanks Jody On Mon, Aug 17, 2009 at 11:20 AM, Lenny Verkhovsky wrote: > Hi > This message means > that you are trying to use host "plankton"

Re: [OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread Lenny Verkhovsky
Hi This message means that you are trying to use host "plankton", which was not allocated via hostfile or hostlist. But according to the files and command line, everything seems fine. Can you try using the "plankton.uzh.ch" hostname instead of "plankton"? Thanks, Lenny. On Mon, Aug 17, 2009 at 10:36 AM,
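
The short-name versus full-name mismatch discussed in this thread can be checked offline before calling mpirun. The sketch below is only an illustration, not how Open MPI resolves host names: it compares the host names in a rankfile with those in a hostfile as literal strings. The helper names (hostfile_hosts, rankfile_hosts, unmatched_hosts) and the sample file contents are hypothetical; only the "rank N=host slot=S" line shape and the host names "plankton" and "plankton.uzh.ch" come from the messages.

```python
import re

# Matches rankfile lines of the form "rank 0=host1 slot=0".
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=(\S+)$")


def hostfile_hosts(text):
    """Return the set of host names listed in a hostfile (first token per line)."""
    hosts = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            hosts.add(line.split()[0])
    return hosts


def rankfile_hosts(text):
    """Return a {rank: host} mapping parsed from rankfile text."""
    mapping = {}
    for line in text.splitlines():
        match = RANK_LINE.match(line.strip())
        if match:
            mapping[int(match.group(1))] = match.group(2)
    return mapping


def unmatched_hosts(hostfile_text, rankfile_text):
    """Return the ranks whose host does not appear verbatim in the hostfile."""
    allocated = hostfile_hosts(hostfile_text)
    return {rank: host
            for rank, host in rankfile_hosts(rankfile_text).items()
            if host not in allocated}


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    hostfile = "plankton.uzh.ch slots=2\n"
    rankfile = "rank 0=plankton slot=0\nrank 1=plankton.uzh.ch slot=1\n"
    print(unmatched_hosts(hostfile, rankfile))
```

Any rank reported by unmatched_hosts uses a spelling of the host name that the hostfile does not contain, which is the kind of discrepancy that switching to the full name worked around here.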

[OMPI users] rank file error: Rankfile claimed...

2009-08-17 Thread jody
Hi When i use a rankfile, i get an error message which i don't understand: [jody@plankton tests]$ mpirun -np 3 -rf rankfile -hostfile testhosts ./HelloMPI -- Rankfile claimed host plankton that was not allocated or oversubscr

Re: [OMPI users] Help: How to accomplish processors affinity

2009-08-17 Thread Lenny Verkhovsky
Hi http://www.open-mpi.org/faq/?category=tuning#using-paffinity I am not familiar with this cluster, but in the FAQ (see link above) you can find an example of the rankfile. Another simple example is the following: $ cat rankfile rank 0=host1 slot=0 rank 1=host2 slot=0 rank 2=host3 slot=0 rank 3=h
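
A rankfile in the one-rank-per-host shape quoted above can also be generated rather than typed by hand. This is a minimal sketch: make_rankfile is a hypothetical helper, the host names are placeholders, and it assumes every rank is bound to the same slot number on its host.

```python
def make_rankfile(hosts, slot=0):
    """Build rankfile text with one rank per host, all bound to the same slot."""
    lines = [f"rank {rank}={host} slot={slot}" for rank, host in enumerate(hosts)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder host names; replace with the names used in your hostfile.
    print(make_rankfile(["host1", "host2", "host3"]), end="")
```

The resulting text can be written to a file and passed to mpirun with the -rf option shown elsewhere in this listing.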