Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-28 Thread Ralph Castain
Something doesn't add up - the default for ompi is to -not- bind. Check your default mca param file and your environment. Do you have any mca params set in them? On Jul 28, 2010, at 9:40 PM, David Akin wrote: > Here's the exact command I'm running when all threads *are* pinned to > a single co

Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-28 Thread David Akin
Here's the exact command I'm running when all threads *are* pinned to a single core: /usr/mpi/gcc/openmpi-1.4-qlc/bin/mpirun -host c005,c006 -np 2 OMP_NUM_THREADS=4 hybrid4.gcc Can anyone verify they have the same issue? On Wed, Jul 28, 2010 at 7:52 PM, Ralph Castain wrote: > How are you runnin
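(If the OMP_NUM_THREADS=4 on that command line is meant to set the environment seen by the launched processes, note that Open MPI's mpirun also has a -x flag for exporting environment variables to the remote ranks, e.g. mpirun -x OMP_NUM_THREADS=4 -host c005,c006 -np 2 hybrid4.gcc; whether that changes the pinning behaviour reported here is a separate question.)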

Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-28 Thread Ralph Castain
How are you running it when the threads are all on one core? If you are specifying --bind-to-core, then of course all the threads will be on one core since we bind the process (not the thread). If you are specifying -mca mpi_paffinity_alone 1, then the same behavior results. Generally, if you w

[OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-28 Thread David Akin
All, I'm trying to get the OpenMP portion of the code below to run multicore on a couple of 8 core nodes. Good news: multiple threads are being spawned on each node in the run. Bad news: each of the threads only runs on a single core, leaving 7 cores basically idle. Sorta good news: if I provide a
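For context, a minimal hybrid MPI+OpenMP diagnostic along these lines (an illustrative sketch, not the hybrid4.gcc source, and it assumes a Linux node whose glibc provides sched_getcpu) lets each thread report the core it is actually running on:

  program hybrid_where
    use mpi
    use omp_lib
    use iso_c_binding, only: c_int
    implicit none
    interface
      function sched_getcpu() bind(c, name="sched_getcpu")
        import :: c_int
        integer(c_int) :: sched_getcpu
      end function sched_getcpu
    end interface
    integer :: ierr, rank
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    !$omp parallel
    ! each OpenMP thread reports the CPU it is currently executing on
    print *, 'rank', rank, 'thread', omp_get_thread_num(), 'on cpu', sched_getcpu()
    !$omp end parallel
    call MPI_Finalize(ierr)
  end program hybrid_where

Compiled with something like mpif90 -fopenmp, if every thread of a given rank keeps printing the same cpu number, the process -- and therefore all of its threads -- has been bound to a single core, which matches the behaviour described above.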

[OMPI users] MPI broadcast test fails only when I run within a torque job

2010-07-28 Thread Rahul Nabar
I'm not sure if this is a torque issue or an MPI issue. If I log in to a compute-node and run the standard mpi broadcast test it returns no error but if I run it through PBS/Torque I get an error (see below) The nodes that return the error are fairly random. Even the same set of nodes will run a t

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Gus Correa
Cristobal Navarro wrote: Gus my kernel for all nodes is this one: Linux 2.6.32-22-server #36-Ubuntu SMP Thu Jun 3 20:38:33 UTC 2010 x86_64 GNU/Linux Kernel is not my league. However, it would be great if somebody clarified for good these issues with Nehalem/Westmere, HT, shared memory and w

[OMPI users] Alignment of Fortran variables with ifort

2010-07-28 Thread Martin Siegert
Hi, I am creating a new thread (was: MPI_Allreduce on local machine). On Wed, Jul 28, 2010 at 05:07:29PM -0400, Gus Correa wrote: > Still, the alignment under Intel may or may not be right. > And this may or may not explain the errors that Hugo has got. > > FYI, the ompi_info from my OpenMPI 1.3.

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
Gus my kernel for all nodes is this one: Linux 2.6.32-22-server #36-Ubuntu SMP Thu Jun 3 20:38:33 UTC 2010 x86_64 GNU/Linux at least for the moment I will use this configuration, at least for development/testing of the parallel programs. lag is minimum :) whenever i get another kernel update, i

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Gus Correa
Hi Cristobal Please, read my answer (way down the message) below. Cristobal Navarro wrote: On Wed, Jul 28, 2010 at 3:28 PM, Gus Correa > wrote: Hi Cristobal Cristobal Navarro wrote: On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa mailt

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Hugo Gagnon
I also get 8 from "call MPI_Type_size(MPI_DOUBLE_PRECISION, size, mpierr)", but really I don't think this is the issue anymore. I mean I checked on my school cluster where OpenMPI has also been compiled with the intel64 compilers and "Fort dbl prec size:" also returns 4 but unlike on my Mac the cod

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Gus Correa
Hi All Martin Siegert wrote: On Wed, Jul 28, 2010 at 01:05:52PM -0700, Martin Siegert wrote: On Wed, Jul 28, 2010 at 11:19:43AM -0400, Gus Correa wrote: Hugo Gagnon wrote: Hi Gus, Ompi_info --all lists its info regarding fortran right after C. In my case: Fort real size: 4

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Martin Siegert
On Wed, Jul 28, 2010 at 01:05:52PM -0700, Martin Siegert wrote: > On Wed, Jul 28, 2010 at 11:19:43AM -0400, Gus Correa wrote: > > Hugo Gagnon wrote: > >> Hi Gus, > >> Ompi_info --all lists its info regarding fortran right after C. In my > >> case: > >> Fort real size: 4 > >> Fort

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Martin Siegert
On Wed, Jul 28, 2010 at 11:19:43AM -0400, Gus Correa wrote: > Hugo Gagnon wrote: >> Hi Gus, >> Ompi_info --all lists its info regarding fortran right after C. In my >> case: >> Fort real size: 4 >> Fort real4 size: 4 >> Fort real8 size: 8 >> Fort real16 size: 16

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
On Wed, Jul 28, 2010 at 3:28 PM, Gus Correa wrote: > Hi Cristobal > > Cristobal Navarro wrote: > >> >> >> On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa > g...@ldeo.columbia.edu>> wrote: >> >>Hi Cristobal >> >>In case you are not using full path name for mpiexec/mpirun, >>what does "whi

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Gus Correa
Hi Cristobal Cristobal Navarro wrote: On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa > wrote: Hi Cristobal In case you are not using full path name for mpiexec/mpirun, what does "which mpirun" say? --> $which mpirun /opt/openmpi-1.4.2 O

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
to clear things up, I can still do a hello world on all 16 threads, but after a few more repetitions of the example the kernel crashes :( fcluster@agua:~$ mpirun --hostfile localhostfile -np 16 testMPI/hola Process 0 on agua out of 16 Process 2 on agua out of 16 Process 14 on agua out of 16 Process 8 o

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa wrote: > Hi Cristobal > > In case you are not using full path name for mpiexec/mpirun, > what does "which mpirun" say? > --> $which mpirun /opt/openmpi-1.4.2 > > Often times this is a source of confusion, old versions may > be first on the PATH

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-28 Thread Brian Smith
I've used the TCP btl and it works fine. It's only with the openib btl that I have issues. I have a set of nodes that uses qib and psm. This mtl works fine also. I'll try adjusting the rendezvous limit and message settings as well as the collective algorithm options and see if that helps. Many thanks

Re: [OMPI users] MPIRUN Error on Mac pro i7 laptop and linux desktop

2010-07-28 Thread christophe petit
Thanks for your answers, the execution of this parallel program works fine at my work, but we used MPICH2. I thought this would run with OPEN-MPI too. Here is the f90 source where MPI_CART_SHIFT is called : program heat !**

Re: [OMPI users] MPIRUN Error on Mac pro i7 laptop and linux desktop

2010-07-28 Thread Jeff Squyres
According to the error message (especially since it's consistent across 2 different platforms), it looks like you have an error in your application. Open MPI says that you're using an invalid communicator when calling MPI_Cart_shift. "Invalid" probably means that it's not a Cartesian communic
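In other words, MPI_Cart_shift must be given a communicator that carries a Cartesian topology, i.e. one returned by MPI_Cart_create; passing MPI_COMM_WORLD or an unset handle produces exactly this error. A minimal sketch of the usual pattern (a hypothetical 1-D decomposition, not taken from the heat program itself):

  program cart_demo
    use mpi
    implicit none
    integer :: ierr, nprocs, comm_cart, left, right
    integer :: dims(1)
    logical :: periods(1)

    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
    dims(1)    = 0              ! let MPI pick the decomposition
    periods(1) = .false.
    call MPI_Dims_create(nprocs, 1, dims, ierr)
    call MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, .true., comm_cart, ierr)
    ! MPI_Cart_shift needs the Cartesian communicator, not MPI_COMM_WORLD
    call MPI_Cart_shift(comm_cart, 0, 1, left, right, ierr)
    call MPI_Finalize(ierr)
  end program cart_demo

left and right then hold the neighbour ranks (MPI_PROC_NULL at non-periodic boundaries) for use in the halo exchange.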

Re: [OMPI users] MPIRUN Error on Mac pro i7 laptop and linux desktop

2010-07-28 Thread Ralph Castain
First thing is to ensure you are getting the version of OMPI that you expect. Both the Mac and Debian come with their own pre-installed versions, so you have to ensure that PATH and LD_LIBRARY_PATH are correctly pointing to the version you installed and compiled against. On Jul 28, 2010, at 10

[OMPI users] MPIRUN Error on Mac pro i7 laptop and linux desktop

2010-07-28 Thread christophe petit
hello, i have a problem concerning the execution of a f90 program (explicitPar) compiled with openmpi-1.4.2. I get nearly the same error on my debian desktop ( AMD Phenom(tm) 9550 Quad-Core Processor) and my mac pro i7 laptop : on mac pro i7 : $ mpiexec -np 2 explicitPar [macbook-pro-de-fab.live

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Åke Sandgren
On Wed, 2010-07-28 at 11:48 -0400, Gus Correa wrote: > Hi Hugo, Jeff, list > > Hugo: I think David Zhang's suggestion was to use > MPI_REAL8 not MPI_REAL, instead of MPI_DOUBLE_PRECISION in your > MPI_Allreduce call. > > Still, to me it looks like OpenMPI is making double precision 4-byte > long

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Hugo Gagnon
Here they are. -- Hugo Gagnon On Wed, 28 Jul 2010 12:01 -0400, "Jeff Squyres" wrote: > On Jul 28, 2010, at 11:55 AM, Gus Correa wrote: > > > I surely can send you the logs, but they're big. > > Off the list perhaps? > > If they're still big when compressed, sure, send them to me off list. >

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Jeff Squyres
On Jul 28, 2010, at 11:55 AM, Gus Correa wrote: > I surely can send you the logs, but they're big. > Off the list perhaps? If they're still big when compressed, sure, send them to me off list. But I think I'd be more interested to see Hugo's logs. :-) -- Jeff Squyres jsquy...@cisco.com For co

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Gus Correa
Hi Jeff I surely can send you the logs, but they're big. Off the list perhaps? Thanks, Gus Jeff Squyres wrote: On Jul 28, 2010, at 11:19 AM, Gus Correa wrote: Ompi_info --all lists its info regarding fortran right after C. In my Ummm right... I should know that. I wrote ompi_info, aft

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Gus Correa
Hi Hugo, Jeff, list Hugo: I think David Zhang's suggestion was to use MPI_REAL8 not MPI_REAL, instead of MPI_DOUBLE_PRECISION in your MPI_Allreduce call. Still, to me it looks like OpenMPI is making double precision 4 bytes long, which is shorter than I expected it to be (8 bytes), at least when look

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Jeff Squyres
On Jul 28, 2010, at 11:19 AM, Gus Correa wrote: > > Ompi_info --all lists its info regarding fortran right after C. In my Ummm right... I should know that. I wrote ompi_info, after all. :-) I ran "ompi_info -all | grep -i fortran" and didn't see the fortran info, and I forgot that I put

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Gus Correa
Hugo Gagnon wrote: Hi Gus, Ompi_info --all lists its info regarding fortran right after C. In my case: Fort real size: 4 Fort real4 size: 4 Fort real8 size: 8 Fort real16 size: 16 Fort dbl prec size: 4 Does it make any sense to you? Hi Hugo No, dbl pre

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Gus Correa
Hi Cristobal In case you are not using full path name for mpiexec/mpirun, what does "which mpirun" say? Often times this is a source of confusion, old versions may be first on the PATH. Gus Cristobal Navarro wrote: On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa >

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
yes, somehow after the second install, the installation is consistent. I'm only running into an issue, might be mpi, I'm not sure. these nodes each have 8 physical processors (2x Intel Xeon quad core), and 16 virtual ones; btw I have ubuntu server 64bit 10.04 installed on these nodes. the proble

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Hugo Gagnon
I mean to write: call mpi_allreduce(inside, outside, 5,mpi_real, mpi_double_precision, mpi_comm_world, ierr) -- Hugo Gagnon On Wed, 28 Jul 2010 09:33 -0400, "Hugo Gagnon" wrote: > And how do I know how big my data buffer is? I ran MPI_TYPE_EXTENT of > And how do I know how big my data buffer
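For comparison only (the reduction operation below, MPI_SUM, is an assumption about what the program intends): the datatype and the reduction operation are two separate arguments, so a correct all-reduce of the 5-element double precision array would look like:

  program allreduce_demo
    use mpi
    implicit none
    integer :: ierr
    double precision :: inside(5), outside(5)

    call MPI_Init(ierr)
    inside = 1.0d0
    ! count, datatype, then the reduction operation, then the communicator
    call MPI_Allreduce(inside, outside, 5, MPI_DOUBLE_PRECISION, MPI_SUM, &
                       MPI_COMM_WORLD, ierr)
    call MPI_Finalize(ierr)
  end program allreduce_demo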

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Hugo Gagnon
I installed with: ./configure --prefix=/opt/openmpi CC=icc CXX=icpc F77=ifort FC=ifort make all install I would gladly give you a corefile but I have no idea how to produce one, I'm just an end user... -- Hugo Gagnon On Wed, 28 Jul 2010 08:57 -0400, "Jeff Squyres" wrote: > I don't have the i
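(On producing a corefile: core dumps are typically disabled by default; running ulimit -c unlimited in the shell before launching usually allows one to be written when the program aborts, and gdb ./your_program core -- or the platform's equivalent on OS X -- then shows the backtrace asked for above.)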

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Hugo Gagnon
And how do I know how big my data buffer is? I ran MPI_TYPE_EXTENT of MPI_DOUBLE_PRECISION and the result was 8. So I changed my program to: 1 program test 2 3 use mpi 4 5 implicit none 6 7

Re: [OMPI users] OpenMPI providing rank?

2010-07-28 Thread Yves Caniou
On Wednesday 28 July 2010 15:05:28, you wrote: > I am confused. I thought all you wanted to do is report out the binding of > the process - yes? Are you trying to set the affinity bindings yourself? > > If the latter, then your script doesn't do anything that mpirun wouldn't > do, and doesn'

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Jeff Squyres
I don't have the intel compilers on my Mac, but I'm unable to replicate this issue on Linux with the intel compilers v11.0. Can you get a corefile to see a backtrace where it died in Open MPI's allreduce? How exactly did you configure your Open MPI, and how exactly did you compile / run your sa

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-28 Thread Edgar Gabriel
hm, this actually looks correct. The question now is basically why the intermediate hand-shake by the processes with rank 0 on the inter-communicator is not finishing. I am wondering whether this could be related to a problem reported in another thread (Processes stuck after MPI_Waitall() in 1.4.1

Re: [OMPI users] OpenMPI providing rank?

2010-07-28 Thread Yves Caniou
On Wednesday 28 July 2010 11:34:13, Ralph Castain wrote: > On Jul 27, 2010, at 11:18 PM, Yves Caniou wrote: > > On Wednesday 28 July 2010 06:03:21, Nysal Jan wrote: > >> OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other > >> environment variables - > >> http://ww

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Jeff Squyres
On Jul 27, 2010, at 4:19 PM, Gus Correa wrote: > Is there a simple way to check the number of bytes associated to each > MPI basic type of OpenMPI on a specific machine (or machine+compiler)? > > Something that would come out easily, say, from ompi_info? Not via ompi_info, but the MPI function M
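The function presumably being referred to here is MPI_Type_size. A small sketch of querying what the installed Open MPI reports for the Fortran types in question:

  program type_sizes
    use mpi
    implicit none
    integer :: ierr, sz

    call MPI_Init(ierr)
    call MPI_Type_size(MPI_DOUBLE_PRECISION, sz, ierr)
    print *, 'MPI_DOUBLE_PRECISION is', sz, 'bytes'
    call MPI_Type_size(MPI_REAL, sz, ierr)
    print *, 'MPI_REAL is', sz, 'bytes'
    call MPI_Finalize(ierr)
  end program type_sizes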

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Jeff Squyres
On Jul 27, 2010, at 11:21 AM, Hugo Gagnon wrote: > I appreciate your replies but my question has to do with the function > MPI_Allreduce of OpenMPI built on a Mac OSX 10.6 with ifort (intel > fortran compiler). The implication I was going for was that if you were using MPI_DOUBLE_PRECISION with

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Jeff Squyres
This issue is usually caused by installing one version of Open MPI over an older version: http://www.open-mpi.org/faq/?category=building#install-overwrite On Jul 27, 2010, at 10:35 PM, Cristobal Navarro wrote: > > On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa wrote: > Hi Cristobal > > Doe

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-28 Thread Terry Dontje
Here are a couple other suggestions: 1. Have you tried your code with using the TCP btl just to make sure this might not be a general algorithm issue with the collective? 2. While using the openib btl you may want to try things with rdma turned off by using the following parameters to mpiru

Re: [OMPI users] OpenMPI providing rank?

2010-07-28 Thread Ralph Castain
On Jul 27, 2010, at 11:18 PM, Yves Caniou wrote: > On Wednesday 28 July 2010 06:03:21, Nysal Jan wrote: >> OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other environment >> variables - >> http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables > > Are pr

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-28 Thread Grzegorz Maj
I've attached gdb to the client which has just connected to the grid. Its bt is almost exactly the same as the server's one: #0 0x428066d7 in sched_yield () from /lib/libc.so.6 #1 0x00933cbf in opal_progress () at ../../opal/runtime/opal_progress.c:220 #2 0x00d460b8 in opal_condition_wait (c=0xd

Re: [OMPI users] OpenMPI providing rank?

2010-07-28 Thread Yves Caniou
On Wednesday 28 July 2010 06:03:21, Nysal Jan wrote: > OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other environment > variables - > http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables Are processes assigned to nodes sequentially, so that I can get th

Re: [OMPI users] OpenMPI providing rank?

2010-07-28 Thread Nysal Jan
OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other environment variables - http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables For processor affinity see this FAQ entry - http://www.open-mpi.org/faq/?category=all#using-paffinity --Nysal On Wed, Jul 28, 2010 at 9
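As an illustration (OMPI_COMM_WORLD_RANK is specific to Open MPI's launcher, not part of the MPI standard), the variable can be read even before MPI_Init, e.g. from Fortran:

  program env_rank
    implicit none
    character(len=16) :: val
    integer :: stat

    ! OMPI_COMM_WORLD_RANK is placed in each process's environment by Open MPI's mpirun
    call get_environment_variable('OMPI_COMM_WORLD_RANK', val, status=stat)
    if (stat == 0) then
      print *, 'my rank according to the environment is ', trim(val)
    else
      print *, 'OMPI_COMM_WORLD_RANK is not set (not launched by mpirun?)'
    end if
  end program env_rank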