On Tue, 2010-07-27 at 16:19 -0400, Gus Correa wrote:
> Hi Hugo, David, Jeff, Terry, Anton, list
>
> I suppose maybe we're guessing that somehow on Hugo's iMac
> MPI_DOUBLE_PRECISION may not have as many bytes as dp = kind(1.d0),
> hence the segmentation fault on MPI_Allreduce.
>
> Question:
>
>
Hi,
I have a performance issue on a parallel machine composed of nodes with 16
procs each. The application is launched on multiples of 16 procs for a given
number of nodes.
I was told by people using MX MPI on this machine to attach a script to
mpiexec that wraps things in 'numactl', in order to make
On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa wrote:
> Hi Cristobal
>
> Does it run on the head node alone?
> (Fuego? Agua? Acatenango?)
> Try to put only the head node on the hostfile and execute with mpiexec.
>
--> I will try with only the head node and post the results back
> This may help so
I did, and it runs now, but the result is wrong: outside is still 1.d0,
2.d0, 3.d0, 4.d0, 5.d0.
How can I make sure OpenMPI is compiled so that datatypes such as
mpi_double_precision behave as they "should"?
Are there flags to set when building OpenMPI, or something similar?
Thanks,
--
Hugo Gagnon
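One way to see how many bytes the installed Open MPI assumes for Fortran types such as DOUBLE PRECISION (and therefore what MPI_DOUBLE_PRECISION describes) is to ask ompi_info, which reports the type sizes detected when the library was configured. This is a minimal sketch, assuming the ompi_info found first in PATH belongs to the same installation used to build and run the program; the helper name show_fortran_type_sizes is hypothetical, and the exact label text in the output may differ between Open MPI versions.

```shell
# Hypothetical helper: print the Fortran type sizes recorded by this Open MPI build.
# Assumption: the ompi_info first in PATH comes from the same install as mpif90/mpirun.
show_fortran_type_sizes() {
  if ! command -v ompi_info >/dev/null 2>&1; then
    echo "ompi_info not found in PATH" >&2
    return 2
  fi
  # "ompi_info --all" lists many build parameters; keep only the Fortran size lines.
  ompi_info --all | grep -i 'fort' | grep -i 'size'
}

# Usage:
#   show_fortran_type_sizes
```

Comparing those sizes with what the application's compiler (with the same flags) uses for kind(1.d0) is one way to check for the size mismatch suspected in this thread.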
Hi Cristobal
Does it run on the head node alone?
(Fuego? Agua? Acatenango?)
Try to put only the head node on the hostfile and execute with mpiexec.
This may help sort out what is going on.
Hopefully it will run on the head node.
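For the head-node-only test, a one-line hostfile is enough, since an Open MPI hostfile can simply list one host name per line. A minimal sketch, assuming it is run on the head node itself; the file name head_only_hostfile is illustrative.

```shell
# Write a hostfile that lists only the node this is run on (assumed: the head node).
uname -n > head_only_hostfile

# Show what mpiexec will see.
cat head_only_hostfile

# Then launch exactly as before, only with the new hostfile, e.g.:
#   mpiexec --hostfile head_only_hostfile -np 5 testMPI/hola
```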
Also, do you have InfiniBand connecting the nodes?
The error me
I compiled with the absolute path, just in case:
fcluster@agua:~$ /opt/openmpi-1.4.2/bin/mpicc testMPI/hello.c -o
testMPI/hola
fcluster@agua:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola
[agua:03547] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing
Thanks Gus,
but I already had the paths set:
fcluster@agua:~$ echo $PATH
/opt/openmpi-1.4.2/bin:/opt/cfc/sge/bin/lx24-amd64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
fcluster@agua:~$ echo $LD_LIBRARY_PATH
/opt/openmpi-1.4.2/lib:
fcluster@agua:~$
Even weirder, errors come s
Hi Cristobal
Try using the --prefix option of mpiexec.
"man mpiexec" is your friend!
Alternatively, append the OpenMPI directories to your
PATH *and* LD_LIBRARY_PATH in your .bashrc/.cshrc file.
See this FAQ:
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
I hope it helps,
Gus
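The dot-file approach Gus describes can look like the following ~/.bashrc fragment (bash syntax; a ~/.cshrc would use setenv instead). It is a sketch assuming Open MPI lives in /opt/openmpi-1.4.2 on every node, as in the paths shown in this thread, and that the remote shell started by mpirun reads ~/.bashrc. The mpirun --prefix alternative is shown as a comment.

```shell
# ~/.bashrc fragment. Assumption: Open MPI is installed in /opt/openmpi-1.4.2 on all nodes.
OMPI_HOME=/opt/openmpi-1.4.2

# Put the Open MPI binaries (mpicc, mpirun, ...) ahead of any other MPI in PATH.
export PATH="$OMPI_HOME/bin:$PATH"

# Add the Open MPI libraries; the colon is only added if the variable already had a value.
export LD_LIBRARY_PATH="$OMPI_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Alternative without editing dot files: let mpirun set these on the remote nodes, e.g.
#   mpirun --prefix /opt/openmpi-1.4.2 --hostfile myhostfile -np 5 testMPI/hola
```

The ${LD_LIBRARY_PATH:+...} form avoids the trailing ':' seen in the echo output earlier in the thread; an empty entry in LD_LIBRARY_PATH is treated by the dynamic loader as the current directory.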
Hi,
Even when executing an OpenMPI hello world, I get this error, which is then
ignored:
fcluster@fuego:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola
[agua:02357] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a
With this earlier failure, do you know how many messages may have been
transferred between the two processes? Is there a way to narrow this
down to a small piece of code? Do you have totalview or ddt at your
disposal?
--td
Brian Smith wrote:
Also, the application I'm having trouble with appe
Hi Hugo, David, Jeff, Terry, Anton, list
I suppose maybe we're guessing that somehow on Hugo's iMac
MPI_DOUBLE_PRECISION may not have as many bytes as dp = kind(1.d0),
hence the segmentation fault on MPI_Allreduce.
Question:
Is there a simple way to check the number of bytes associated to eac
Also, the application I'm having trouble with appears to work fine with
MVAPICH2 1.4.1, if that is any help.
-Brian
On Tue, 2010-07-27 at 10:48 -0400, Terry Dontje wrote:
> Can you try a simple point-to-point program?
>
> --td
>
> Brian Smith wrote:
> > After running on two processors across t
Based on the output you show here, there is absolutely nothing wrong
(yet). Both processes are in the same function and do what they are
supposed to do.
However, I am fairly sure that the client process bt that you show is
already part of current_intracomm. Could you try to create a bt of the
proces
No, we really shouldn't. Having just fought with a program using usleep(1)
which was behaving even worse, working around this particular inability of the
Linux kernel development team to do something sane will only lead to more pain.
There are no good options, so the best option is to not try
This slides outside of my purview - I would suggest you post this question with
a different subject line specifically mentioning failure of intercomm_merge to
work so it attracts the attention of those with knowledge of that area.
On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote:
> So now I hav
Try mpi_real8 for the type in the allreduce.
On 7/26/10, Hugo Gagnon wrote:
> Hello,
>
> When I compile and run this code snippet:
>
> 1 program test
> 2
> 3 use mpi
> 4
> 5 implicit none
> 6
> 7 integer :: ierr, nproc, myrank
> 8 integer, parameter :: d
So now I have a new question.
When I run my server and a lot of clients on the same machine,
everything looks fine.
But when I try to run the clients on several machines the most
frequent scenario is:
* server is started on machine A
* X (= 1, 4, 10, ..) clients are started on machine B and they co
Hi, Terry,
I just ran through the entire gamut of OSU/OMB tests -- osu_bibw
osu_latency osu_multi_lat osu_bw osu_alltoall osu_mbw_mr osu_bcast -- on
various nodes on one of our clusters (at least two nodes per job) w/
version 1.4.2 and OFED 1.5 (executables and mpi compiled w/ gcc 4.4.2)
and haven
I appreciate your replies, but my question concerns the MPI_Allreduce
function of OpenMPI built on Mac OS X 10.6 with ifort (the Intel
Fortran compiler).
--
Hugo Gagnon
On Tue, 27 Jul 2010 13:23 +0100, "Anton Shterenlikht"
wrote:
> On Tue, Jul 27, 2010 at 08:11:39AM -0400, Jeff Squyres wrot
Can you try a simple point-to-point program?
--td
Brian Smith wrote:
After running on two processors across two nodes, the problem occurs
much earlier during execution:
(gdb) bt
#0 opal_sys_timer_get_cycles ()
at ../opal/include/opal/sys/amd64/timer.h:46
#1 opal_timer_base_get_cycles ()
at .
After running on two processors across two nodes, the problem occurs
much earlier during execution:
(gdb) bt
#0 opal_sys_timer_get_cycles ()
at ../opal/include/opal/sys/amd64/timer.h:46
#1 opal_timer_base_get_cycles ()
at ../opal/mca/timer/linux/timer_linux.h:31
#2 opal_progress () at runtime/o
Both 1.4.1 and 1.4.2 exhibit the same behaviors w/ OFED 1.5. It wasn't
OFED 1.4 after all (after some more digging around through our update
logs).
All of the ibv_*_pingpong tests appear to work correctly. I'll try
running a few more tests (np=2 over two nodes, some of the OSU
benchmarks, etc.)
On Tue, Jul 27, 2010 at 08:11:39AM -0400, Jeff Squyres wrote:
> On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote:
>
> > 8 integer, parameter :: dp = kind(1.d0)
> > 9 real(kind=dp) :: inside(5), outside(5)
>
> I'm not a fortran expert -- is kind(1.d0) really double precision? A
On Tue, 2010-07-27 at 08:11 -0400, Jeff Squyres wrote:
> On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote:
>
> > 8 integer, parameter :: dp = kind(1.d0)
> > 9 real(kind=dp) :: inside(5), outside(5)
>
> I'm not a fortran expert -- is kind(1.d0) really double precision? Accordin
On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote:
> 8 integer, parameter :: dp = kind(1.d0)
> 9 real(kind=dp) :: inside(5), outside(5)
I'm not a fortran expert -- is kind(1.d0) really double precision? According
to http://gcc.gnu.org/onlinedocs/gcc-3.4.6/g77/Kind-Notation.htm
A clarification about your previous email: you had your code working with
OMPI 1.4.1 but an older version of OFED? Then you upgraded to OFED 1.4
and things stopped working? It sounds like your current system is set up
with OMPI 1.4.2 and OFED 1.5. Anyway, I am a little confused as to
when thing
Use what hostname returns - don't worry about IP addresses as we'll discover
them.
On Jul 26, 2010, at 10:45 PM, Philippe wrote:
> Thanks a lot!
>
> Now, for the env var "OMPI_MCA_orte_nodes", what do I put exactly? Our
> nodes have a short and a long name (it's RHEL 5.x, so the command hostname
> returns
Thanks a lot!
Now, for the env var "OMPI_MCA_orte_nodes", what do I put exactly? Our
nodes have a short and a long name (it's RHEL 5.x, so the command hostname
returns the long name) and at least 2 IP addresses.
p.
On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain wrote:
> Okay, fixed in r23499. Thanks agai
Okay, fixed in r23499. Thanks again...
On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote:
> Doh - yes it should! I'll fix it right now.
>
> Thanks!
>
> On Jul 26, 2010, at 9:28 PM, Philippe wrote:
>
>> Ralph,
>>
>> I was able to test the generic module and it seems to be working.
>>
>> one q