date:20100727

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Terry Frankcombe

On Tue, 2010-07-27 at 16:19 -0400, Gus Correa wrote: > Hi Hugo, David, Jeff, Terry, Anton, list > > I suppose maybe we're guessing that somehow on Hugo's iMac > MPI_DOUBLE_PRECISION may not have as many bytes as dp = kind(1.d0), > hence the segmentation fault on MPI_Allreduce. > > Question: > >

[OMPI users] OpenMPI providing rank?

2010-07-27 Thread Yves Caniou

Hi, I have some performance issue on a parallel machine composed of nodes of 16 procs each. The application is launched on multiple of 16 procs for given numbers of nodes. I was told by people using MX MPI with this machine to attach a script to mpiexec, which 'numactl' things, in order to make
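
A wrapper of the kind described here could look roughly like the sketch below. It assumes that mpiexec exports the node-local rank in the environment variable OMPI_COMM_WORLD_LOCAL_RANK, and that ranks are packed four per NUMA node on a 16-proc node. Both are assumptions, not facts stated in the post. The helper names numa_node_for_rank and numactl_command, and the program name ./my_app, are hypothetical.

```shell
# Hypothetical helper: map a node-local rank to a NUMA node index,
# assuming ranks are packed onto NUMA nodes in order.
numa_node_for_rank() {
    local_rank=$1
    ranks_per_numa_node=$2
    echo $(( local_rank / ranks_per_numa_node ))
}

# Build (but do not run) the numactl command line for one process.
# Arguments: local rank, ranks per NUMA node, then the program and its args.
numactl_command() {
    node=$(numa_node_for_rank "$1" "$2")
    shift 2
    echo "numactl --cpunodebind=$node --membind=$node $*"
}

# OMPI_COMM_WORLD_LOCAL_RANK is assumed to be set by mpiexec; default to 0.
# Four ranks per NUMA node is an assumed layout, not taken from the post.
numactl_command "${OMPI_COMM_WORLD_LOCAL_RANK:-0}" 4 ./my_app
```

The sketch only prints the command. In a real wrapper script passed to mpiexec, the printed command would be run with exec instead, so that each rank is bound before the application starts.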

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro

On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa wrote: > Hi Cristobal > > Does it run only on the head node alone? > (Fuego? Agua? Acatenango?) > Try to put only the head node on the hostfile and execute with mpiexec. > --> i will try only with the head node, and post results back > This may help so

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Hugo Gagnon

I did and it runs now, but the result is wrong: outside is still 1.d0, 2.d0, 3.d0, 4.d0, 5.d0 How can I make sure to compile OpenMPI so that datatypes such as mpi_double_precision behave as they "should"? Are there flags during the OpenMPI building process or something? Thanks, -- Hugo Gagnon

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Gus Correa

Hi Cristobal Does it run only on the head node alone? (Fuego? Agua? Acatenango?) Try to put only the head node on the hostfile and execute with mpiexec. This may help sort out what is going on. Hopefully it will run on the head node. Also, do you have InfiniBand connecting the nodes? The error me

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro

i compiled with absolute path in case: fcluster@agua:~$ /opt/openmpi-1.4.2/bin/mpicc testMPI/hello.c -o testMPI/hola fcluster@agua:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola [agua:03547] mca: base: component_find: unable to open /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro

Thanks Gus, but i already had the paths fcluster@agua:~$ echo $PATH /opt/openmpi-1.4.2/bin:/opt/cfc/sge/bin/lx24-amd64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games fcluster@agua:~$ echo $LD_LIBRARY_PATH /opt/openmpi-1.4.2/lib: fcluster@agua:~$ even weird, errors come s

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Gus Correa

Hi Cristobal Try using the --prefix option of mpiexec. "man mpiexec" is your friend! Alternatively, append the OpenMPI directories to your PATH *and* LD_LIBRARY_PATH in your .bashrc/.cshrc file. See this FAQ: http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path I hope it helps, Gus
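
The PATH and LD_LIBRARY_PATH suggestion can be sketched for a Bourne-style shell as follows. The install prefix /opt/openmpi-1.4.2 is taken from the paths quoted elsewhere in this thread. The helper prepend_path is a hypothetical name, and it prepends rather than appends so that this installation is found before any other MPI on the system, which is a design choice and not part of the original advice.

```shell
# Hypothetical helper: prepend a directory to a colon-separated path list
# unless it is already present. Prints the resulting list.
prepend_path() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *) printf '%s\n' "${dir}${list:+:$list}" ;;
    esac
}

# Install prefix as quoted in this thread; adjust for other installations.
OMPI_PREFIX=/opt/openmpi-1.4.2

PATH=$(prepend_path "$OMPI_PREFIX/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$OMPI_PREFIX/lib" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

These lines would go in .bashrc on every node, since remote non-interactive shells need the same settings. Passing --prefix with the same directory to mpiexec is the alternative mentioned in the message.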

[OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro

Hi, Even when executing a hello world openmpi, i get this error, which is then ignored. fcluster@fuego:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola [agua:02357] mca: base: component_find: unable to open /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or compiled for a

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje

With this earlier failure do you know how many messages may have been transferred between the two processes? Is there a way to narrow this down to a small piece of code? Do you have totalview or ddt at your disposal? --td Brian Smith wrote: Also, the application I'm having trouble with appe

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Gus Correa

Hi Hugo, David, Jeff, Terry, Anton, list I suppose maybe we're guessing that somehow on Hugo's iMac MPI_DOUBLE_PRECISION may not have as many bytes as dp = kind(1.d0), hence the segmentation fault on MPI_Allreduce. Question: Is there a simple way to check the number of bytes associated to eac

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Brian Smith

Also, the application I'm having trouble with appears to work fine with MVAPICH2 1.4.1, if that is any help. -Brian On Tue, 2010-07-27 at 10:48 -0400, Terry Dontje wrote: > Can you try a simple point-to-point program? > > --td > > Brian Smith wrote: > > After running on two processors across t

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-27 Thread Edgar Gabriel

based on your output shown here, there is absolutely nothing wrong (yet). Both processes are in the same function and do what they are supposed to do. However, I am fairly sure that the client process bt that you show is already part of current_intracomm. Could you try to create a bt of the proces

Re: [OMPI users] Do MPI calls ever sleep?

2010-07-27 Thread Barrett, Brian W

No, we really shouldn't. Having just fought with a program using usleep(1) which was behaving even worse, working around this particular inability of the Linux kernel development team to do something sane will only lead to more pain. There are no good options, so the best option is to not try

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-27 Thread Ralph Castain

This slides outside of my purview - I would suggest you post this question with a different subject line specifically mentioning failure of intercomm_merge to work so it attracts the attention of those with knowledge of that area. On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote: > So now I hav

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread David Zhang

Try mpi_real8 for the type in allreduce On 7/26/10, Hugo Gagnon wrote: > Hello, > > When I compile and run this code snippet: > > 1 program test > 2 > 3 use mpi > 4 > 5 implicit none > 6 > 7 integer :: ierr, nproc, myrank > 8 integer, parameter :: d

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-27 Thread Grzegorz Maj

So now I have a new question. When I run my server and a lot of clients on the same machine, everything looks fine. But when I try to run the clients on several machines the most frequent scenario is: * server is started on machine A * X (= 1, 4, 10, ..) clients are started on machine B and they co

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Brian Smith

Hi, Terry, I just ran through the entire gamut of OSU/OMB tests -- osu_bibw osu_latency osu_multi_lat osu_bw osu_alltoall osu_mbw_mr osu_bcast -- on various nodes on one of our clusters (at least two nodes per job) w/ version 1.4.2 and OFED 1.5 (executables and mpi compiled w/ gcc 4.4.2) and haven

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Hugo Gagnon

I appreciate your replies but my question has to do with the function MPI_Allreduce of OpenMPI built on a Mac OSX 10.6 with ifort (intel fortran compiler). -- Hugo Gagnon On Tue, 27 Jul 2010 13:23 +0100, "Anton Shterenlikht" wrote: > On Tue, Jul 27, 2010 at 08:11:39AM -0400, Jeff Squyres wrot

[OMPI users] AUTO: Jeffrey M Ceason is out of the office. (returning 08/02/2010)

2010-07-27 Thread Jeffrey M Ceason

I am out of the office until 08/02/2010. I will respond to your message when I return. Note: This is an automated response to your message "users Digest, Vol 1642, Issue 1" sent on 7/27/10 9:32:11 AM. This is the only notification you will receive while this person is away.

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje

Can you try a simple point-to-point program? --td Brian Smith wrote: After running on two processors across two nodes, the problem occurs much earlier during execution: (gdb) bt #0 opal_sys_timer_get_cycles () at ../opal/include/opal/sys/amd64/timer.h:46 #1 opal_timer_base_get_cycles () at .

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Brian Smith

After running on two processors across two nodes, the problem occurs much earlier during execution: (gdb) bt #0 opal_sys_timer_get_cycles () at ../opal/include/opal/sys/amd64/timer.h:46 #1 opal_timer_base_get_cycles () at ../opal/mca/timer/linux/timer_linux.h:31 #2 opal_progress () at runtime/o

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Brian Smith

Both 1.4.1 and 1.4.2 exhibit the same behaviors w/ OFED 1.5. It wasn't OFED 1.4 after all (after some more digging around through our update logs). All of the ibv_*_pingpong tests appear to work correctly. I'll try running a few more tests (np=2 over two nodes, some of the OSU benchmarks, etc.)

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Anton Shterenlikht

On Tue, Jul 27, 2010 at 08:11:39AM -0400, Jeff Squyres wrote: > On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote: > > > 8 integer, parameter :: dp = kind(1.d0) > > 9 real(kind=dp) :: inside(5), outside(5) > > I'm not a fortran expert -- is kind(1.d0) really double precision? A

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Terry Frankcombe

On Tue, 2010-07-27 at 08:11 -0400, Jeff Squyres wrote: > On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote: > > > 8 integer, parameter :: dp = kind(1.d0) > > 9 real(kind=dp) :: inside(5), outside(5) > > I'm not a fortran expert -- is kind(1.d0) really double precision? Accordin

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Jeff Squyres

On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote: > 8 integer, parameter :: dp = kind(1.d0) > 9 real(kind=dp) :: inside(5), outside(5) I'm not a fortran expert -- is kind(1.d0) really double precision? According to http://gcc.gnu.org/onlinedocs/gcc-3.4.6/g77/Kind-Notation.htm

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje

A clarification from your previous email, you had your code working with OMPI 1.4.1 but an older version of OFED? Then you upgraded to OFED 1.4 and things stopped working? Sounds like your current system is set up with OMPI 1.4.2 and OFED 1.5. Anyways, I am a little confused as to when thing

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Ralph Castain

Use what hostname returns - don't worry about IP addresses as we'll discover them. On Jul 26, 2010, at 10:45 PM, Philippe wrote: > Thanks a lot! > > now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our > nodes have a short/long name (it's rhel 5.x, so the command hostname > returns

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Philippe

Thanks a lot! now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our nodes have a short/long name (it's rhel 5.x, so the command hostname returns the long name) and at least 2 IP addresses. p. On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain wrote: > Okay, fixed in r23499. Thanks agai

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Ralph Castain

Okay, fixed in r23499. Thanks again... On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: > Doh - yes it should! I'll fix it right now. > > Thanks! > > On Jul 26, 2010, at 9:28 PM, Philippe wrote: > >> Ralph, >> >> i was able to test the generic module and it seems to be working. >> >> one q

30 matches
