Hi Yuping,
Maybe using multiple threads inside a socket, and MPI among sockets, is a
better choice for such a NUMA platform.
Multiple threads can exploit the benefits of shared memory, and MPI can
alleviate the cost of non-uniform memory access.
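For illustration, here is a minimal hybrid MPI + OpenMP sketch of that split
(the loop and variable names are invented for the example, not taken from
FUN3D):

  /* One MPI rank per socket; OpenMP threads fill the cores of that socket. */
  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided, rank;
      /* FUNNELED is enough when only the master thread makes MPI calls. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      double local = 0.0;
      /* Threads share memory inside the socket... */
      #pragma omp parallel for reduction(+:local)
      for (int i = 0; i < 1000000; i++)
          local += (double)i;

      /* ...and MPI moves data between sockets/nodes. */
      double global = 0.0;
      MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      if (rank == 0)
          printf("sum = %f\n", global);

      MPI_Finalize();
      return 0;
  }

With Open MPI 1.6 this would be launched along the lines of

  OMP_NUM_THREADS=8 mpirun -np 8 --npersocket 1 --bind-to-socket ./a.out

where the rank and thread counts are assumptions; adjust them to the actual
socket/core layout of the machine.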
regards,
Zehan
On Tue, Jun 17, 2014 at 6:19 AM, Yuping Sun
Hi Ralph:
Does the following command look correct to you:
mpirun -np 32 --bysocket --bycore
~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
--time_timestep_loop --animation_freq -1
I ran the above command, but performance still does not improve. Would you
give me a detailed command with options?
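For reference, a plausible corrected invocation on the 1.6 series would be as
follows (assuming the intent is to map ranks by socket and pin each rank to a
core; --bysocket and --bycore are both mapping policies, so only one of them
applies, and binding needs its own flag such as --bind-to-core):

  mpirun -np 32 --bysocket --bind-to-core \
      ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi \
      --time_timestep_loop --animation_freq -1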
Thank you.
Be
Ok that works
Thanks!!
> Do you need the VampirTrace support in your build? If not, you could add
> this to configure:
> --disable-vt
Well, for one, there is never any guarantee of linear scaling with the number
of procs - that is very application dependent. You can actually see performance
decrease as the number of procs grows if the application doesn't know how to
exploit them.
One thing that stands out is your mapping and binding
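One quick way to see what the launcher actually did is Open MPI's
--report-bindings option (available in the 1.6 series); for example, with the
binary path shortened to a placeholder:

  mpirun -np 32 --bysocket --bind-to-core --report-bindings ./nodet_mpi ...

Each rank then reports its socket/core placement at startup, so mapping or
binding mistakes show up immediately.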
Dear All:
I bought a 64-core workstation and installed NASA FUN3D with Open MPI 1.6.5.
Then I started test runs of FUN3D using 16, 32, and 48 cores. However, the
performance of the FUN3D runs is poor. I got the data below:
The run command is (for 32 cores, as an example):
mpiexec -np 32 --bysocket --bi
Do you need the VampirTrace support in your build? If not, you could add this
to configure:
--disable-vt
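For example (a minimal sketch; the prefix path is just a placeholder, and any
other configure options you already use stay as they are):

  ./configure --prefix=/opt/openmpi --disable-vt
  make all install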
Hi all:
I'm having trouble compiling OMPI from SVN trunk with the new NVIDIA CUDA 6.0
SDK because of the deprecated cuptiActivityEnqueueBuffer.
This is the problem:
CC libvt_la-vt_cupti_activity.lo
CC libvt_la-vt_iowrap_helper.lo
CC libvt_la-vt_libwrap.lo
CC libvt_la-vt_mallo
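For background: CUPTI in CUDA 6.0 moved from the enqueue-style activity API to
registered buffer callbacks, which is why code using
cuptiActivityEnqueueBuffer no longer compiles. A minimal sketch of the
replacement pattern (assuming CUDA 6.0's CUPTI headers; this is illustrative,
not the VampirTrace patch itself):

  #include <cupti.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* CUPTI calls this when it needs a buffer for activity records. */
  static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                       size_t *maxNumRecords)
  {
      *size = 32 * 1024;
      *buffer = (uint8_t *)malloc(*size);
      *maxNumRecords = 0;               /* 0 = fit as many records as possible */
  }

  /* CUPTI calls this when a buffer is full and ready to be read. */
  static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                       uint8_t *buffer, size_t size,
                                       size_t validSize)
  {
      /* ...walk the records with cuptiActivityGetNextRecord()... */
      free(buffer);
  }

  void init_cupti_activity(void)
  {
      /* Replaces the removed cuptiActivityEnqueueBuffer() flow. */
      cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);
      cuptiActivityEnable(CUPTI_ACTIVITY_KIND_KERNEL);
  }

The --disable-vt workaround suggested above sidesteps this entirely.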
Just to wrap this up for the user list: this has now been fixed and added to
1.8.2 in the nightly tarball. The problem proved to be an edge case in which
partial allocations were combined with the presence of a coprocessor (hitting
a slightly different code path).
On Jun 12, 2014, at 9:04 AM, Dan Dietz wrote