I think this issue is now resolved and thanks everybody for your help. I
certainly learnt a lot!
For the first case you describe, as OPENMPI is now, the call sequence
from fortran is
mpi_comm_rank -> MPI_Comm_rank -> PMPI_Comm_rank
For the second case, as MPICH is now, it's
mpi_comm_rank ->
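For what it's worth, here is a rough sketch (mine, not from the thread) of the C-level profiling wrapper this difference matters for: with the first layering it also catches Fortran callers, with the second it only catches C callers.

#include <stdio.h>
#include <mpi.h>

/* PMPI interposition at the C layer.  Under
 *   mpi_comm_rank -> MPI_Comm_rank -> PMPI_Comm_rank
 * Fortran calls also land here; under
 *   mpi_comm_rank -> PMPI_Comm_rank
 * they bypass this wrapper entirely. */
int MPI_Comm_rank(MPI_Comm comm, int *rank)
{
    printf("MPI_Comm_rank intercepted at the C layer\n");
    return PMPI_Comm_rank(comm, rank);
}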
Hi George,
- "George Bosilca" wrote:
> On Dec 5, 2008, at 03:16 , Anthony Chan wrote:
>
> > void mpi_comm_rank_(MPI_Comm *comm, int *rank, int *info) {
> > printf("mpi_comm_rank call successfully intercepted\n");
> > *info = PMPI_Comm_rank(comm,rank);
> > }
>
> Unfortunately this exa
Hi Nick,
- "Nick Wright" wrote:
> For the first case you describe, as OPENMPI is now, the call sequence
>
> from fortran is
>
> mpi_comm_rank -> MPI_Comm_rank -> PMPI_Comm_rank
>
> For the second case, as MPICH is now, it's
>
> mpi_comm_rank -> PMPI_Comm_rank
>
AFAIK, all known/popular
Hi Nick,
- "Nick Wright" wrote:
> Hi Antony
>
> That will work, yes, but it's not portable to other MPIs that do
> implement the profiling layer correctly, unfortunately.
I guess I must have missed something here. What is not portable?
>
> I guess we will just need to detect that we are
After spending a few hours pondering this problem, we came to the
conclusion that the best approach is to keep what we had before (i.e.
the original approach). This means I'll undo my patch in the trunk,
and not change the behavior on the next releases (1.3 and 1.2.9). This
approach, wh
The reason I'd like to disable these eager buffers is to help detect the
deadlock better. I would not run with this for a normal run but it
would be useful for debugging. If the deadlock is indeed due to our
code then disabling any shared buffers or eager sends would make that
deadlock reprod
OpenMPI has different eager limits for all the network types; on your
system run:
ompi_info --param btl all
and look for the eager_limits
You can set these values to 0 using the syntax I showed you before.
That would disable eager messages.
There might be a better way to disable eager messag
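For example (an illustration only; parameter names vary by BTL and release, so check the ompi_info output on your system, and ./your_app is a placeholder):
mpirun --mca btl_sm_eager_limit 0 --mca btl_tcp_eager_limit 0 -np 4 ./your_app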
On Dec 5, 2008, at 03:16 , Anthony Chan wrote:
void mpi_comm_rank_(MPI_Comm *comm, int *rank, int *info) {
printf("mpi_comm_rank call successfully intercepted\n");
*info = PMPI_Comm_rank(comm,rank);
}
Unfortunately this example is not correct. The real Fortran prototype
for the MPI_Com
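Presumably the point being made is that the Fortran symbol's real C prototype takes MPI_Fint handles rather than MPI_Comm, and that the name mangling is compiler dependent; a corrected sketch along those lines (mine, not George's) might look like:

#include <stdio.h>
#include <mpi.h>

/* The Fortran binding passes handles and the return code as MPI_Fint,
 * so they must be converted before calling the C PMPI routine.  The
 * trailing underscore in the symbol name is compiler dependent. */
void mpi_comm_rank_(MPI_Fint *comm, MPI_Fint *rank, MPI_Fint *ierr)
{
    MPI_Comm c_comm = MPI_Comm_f2c(*comm);
    int c_rank;
    printf("mpi_comm_rank call successfully intercepted\n");
    *ierr = (MPI_Fint) PMPI_Comm_rank(c_comm, &c_rank);
    *rank = (MPI_Fint) c_rank;
}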
Thank you for this info. I should add that our code tends to post a lot
of sends prior to the other side posting receives. This causes a lot of
unexpected messages to exist. Our code explicitly matches up all tags
and processors (that is, we do not use MPI wildcards). If we had a
deadlock
Whenever this happens we found the code to have a deadlock. Users
never saw it until they crossed the eager->rendezvous threshold.
Yes you can disable shared memory with:
mpirun --mca btl ^sm
Or you can try increasing the eager limit.
ompi_info --param btl sm
MCA btl: parameter "btl_sm_eage
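For instance, raising the shared-memory eager limit would look something like this (the value is only an illustration, and ./your_app is a placeholder):
mpirun --mca btl_sm_eager_limit 65536 -np 4 ./your_app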
On Dec 5, 2008, at 12:22 PM, Justin wrote:
Does OpenMPI have any known deadlocks that might be causing our
deadlocks?
Known deadlocks, no. We are assisting a customer, however, with a
deadlock that occurs in IMB Alltoall (and some other IMB tests) when
using 128 hosts and the MX BTL. We h
Brian
Sorry I picked the wrong word there. I guess this is more complicated
than I thought it was.
For the first case you describe, as OPENMPI is now, the call sequence
from fortran is
mpi_comm_rank -> MPI_Comm_rank -> PMPI_Comm_rank
For the second case, as MPICH is now, it's
mpi_comm_rank
Terry Frankcombe wrote:
> Isn't it up to the OS scheduler what gets run where?
I was under the impression that the processor affinity API was designed
to let the OS (at least Linux) know how a given task preferred to be
bound in terms of the system topology.
--
V. Ram
v_r_...@fastmail.fm
--
Nick -
I think you have an incorrect definition of "correctly" :). According to
the MPI standard, an MPI implementation is free to either layer language
bindings (and only allow profiling at the lowest layer) or not layer the
language bindings (and require profiling libraries intercept each
Ralph Castain wrote:
> Thanks - yes, that helps. Can you add --display-map to your cmd
> line? That will tell us what mpirun thinks it is doing.
The output from display map is below. Note that I've sanitized a few
items, but nothing relevant to this:
[granite:29685] Map for job: 1 Gener
I hope you are aware that *many* tools and applications actually profile
the Fortran MPI layer by intercepting the C function calls. This allows
them to not have to deal with f2c translation of MPI objects and not
worry about the name mangling issue. Would there be a way to have both
options e
Actually, I am wondering whether my previous statement was correct. If
you do not intercept the Fortran MPI call, then it still goes to the C
MPI call, which you can intercept. Only if you intercept the Fortran MPI
call do we not call the C MPI call but the C PMPI call, correct? So in
theory, it coul
On Dec 5, 2008, at 12:22 PM, Edgar Gabriel wrote:
I hope you are aware that *many* tools and applications actually
profile the Fortran MPI layer by intercepting the C function calls.
This allows them to not have to deal with f2c translation of MPI
objects and not worry about the name mangli
On Dec 5, 2008, at 11:29 AM, Nick Wright wrote:
I think we can just look at OPEN_MPI as you say and then
OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION & OMPI_RELEASE_VERSION
from mpi.h, and if the version is less than 1.2.9 implement a workaround
as Antony suggested. It's not the most elegant solution b
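A rough sketch of such a check (the version macros come from Open MPI's mpi.h; the exact comparison and the NEED_FORTRAN_PMPI_WORKAROUND name are my own illustration):

#include <mpi.h>

/* True only for Open MPI releases older than 1.2.9. */
#if defined(OPEN_MPI) && (OMPI_MAJOR_VERSION == 1) && \
    ((OMPI_MINOR_VERSION < 2) || \
     (OMPI_MINOR_VERSION == 2 && OMPI_RELEASE_VERSION < 9))
#  define NEED_FORTRAN_PMPI_WORKAROUND 1
#endif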
Hi,
We are currently using OpenMPI 1.3 on Ranger for large processor jobs
(8K+). Our code appears to be occasionally deadlocking at random within
point-to-point communication (see stack trace below). This code has been
tested on many different MPI versions and as far as we know it does not
c
George,
I hope you are aware that *many* tools and applications actually profile
the Fortran MPI layer by intercepting the C function calls. This allows
them to not have to deal with f2c translation of MPI objects and not
worry about the name mangling issue. Would there be a way to have both
I think we can just look at OPEN_MPI as you say and then
OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION & OMPI_RELEASE_VERSION
from mpi.h, and if the version is less than 1.2.9 implement a workaround as
Antony suggested. It's not the most elegant solution but it will work, I
think?
Nick.
Jeff Squyres wro
On Dec 5, 2008, at 10:55 AM, David Skinner wrote:
FWIW, if that one-liner fix works (George and I just chatted about
this
on the phone), we can probably also push it into v1.2.9.
great! thanks.
It occurs to me that this is likely not going to be enough for you,
though. :-\
Like it or
FWIW, if that one-liner fix works (George and I just chatted about
this on the phone), we can probably also push it into v1.2.9.
On Dec 5, 2008, at 10:49 AM, George Bosilca wrote:
Nick,
Thanks for noticing this. It's unbelievable that nobody noticed that
over the last 5 years. Anyway, I t
Nifty -- good to know. Thanks for looking into this!
Do any kernel-hacker types on this list know roughly in which version
thread affinity was brought into the Linux kernel?
FWIW: all the same concepts here (using pid==0) should also work for
PLPA, so you can set via socket/core, etc.
On
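To make the pid == 0 point concrete, here is a minimal Linux sketch of my own (binding the caller to core 0, which is an arbitrary choice):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                  /* core 0, just as an example */

    /* pid == 0 means "the calling process/thread" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}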
Nick,
Thanks for noticing this. It's unbelievable that nobody noticed that
over the last 5 years. Anyway, I think we have a one-line fix for this
problem. I'll test it ASAP, and then push it into 1.3.
Thanks,
george.
On Dec 5, 2008, at 10:14 , Nick Wright wrote:
Hi Antony
That w
On Dec 5, 2008, at 10:33 AM, Jens wrote:
thanks a lot. This fixed a bug in my code.
I already like open-mpi for this :)
LOL! Glad to help. :-)
FWIW, we're working on new Fortran bindings for MPI-3 that fix some of
the shortcomings of the F90 bindings.
--
Jeff Squyres
Cisco Systems
OK, so I dug a little deeper, and have some good news. Let me start
with a set of routines that we didn't even discuss yet, but which work
for setting thread affinity, and then discuss libnuma and
sched_setaffinity() again.
---
On linux systems, the pthread library has a set of ro
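Presumably the routines meant here are pthread_setaffinity_np() and friends; a minimal sketch of my own, pinning the calling thread to core 0 (compile with -pthread):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);   /* core 0, just as an example */

    /* Bind the calling thread (not the whole process) to the set. */
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set);
    if (rc != 0) {
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
        return 1;
    }
    return 0;
}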
Hi Jeff,
thanks a lot. This fixed a bug in my code.
I already like open-mpi for this :)
Greetings,
Jens
Jeff Squyres schrieb:
> These functions do exist in Open MPI, but your code is not quite
> correct. Here's a new version that is correct:
>
> -
> program main
> use mpi
> implicit none
> i
Hi Antony
That will work, yes, but it's not portable to other MPIs that do
implement the profiling layer correctly, unfortunately.
I guess we will just need to detect that we are using Open MPI when our
tool is configured and add some macros to deal with that accordingly. Is
there an easy way t
These functions do exist in Open MPI, but your code is not quite
correct. Here's a new version that is correct:
-
program main
use mpi
implicit none
integer :: ierr, rank, size
integer :: mpi1_val
integer(kind = MPI_ADDRESS_KIND) :: mpi2_val
logical :: attr_flag
call MPI_INIT(ierr)
call M
Thank you for your response; these are the details of my problem:
I have installed pwscf and then tried to run SCF calculations, but
before getting any output I got this warning message:
WARNING: There are more than one active ports on host 'stallo-2.local', but the
default subnet
Hi,
I just switched from MPICH2 to Open MPI because of SGE support, but I am
missing some MPI functions for Fortran 90.
Does anyone know why
MPI_COMM_GET_ATTR()
MPI_ATTR_GET()
are not available? They work fine with MPICH2.
I compiled openmpi 1.2.8/1.3rc on a clean CentOS 5.2 with GNU-compilers
Hope I didn't misunderstand your question. If you implement
your profiling library in C, where you do your real instrumentation,
you don't need to implement the Fortran layer; you can simply link
with the Fortran-to-C MPI wrapper library -lmpi_f77, i.e.
/bin/mpif77 -o foo foo.f -L/lib -lmpi_f77 -lYou