Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Nick Wright
I think this issue is now resolved, and thanks everybody for your help. I certainly learnt a lot! For the first case you describe, as OPENMPI is now, the call sequence from fortran is mpi_comm_rank -> MPI_Comm_rank -> PMPI_Comm_rank. For the second case, as MPICH is now, it's mpi_comm_rank -> PMPI_Comm_rank.
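For illustration (a sketch, not from the thread): the practical consequence is that a C-level wrapper like the one below sees Fortran callers under the first call sequence, where the Fortran binding goes through the C profiling layer, but never under the second.

    #include <stdio.h>
    #include <mpi.h>

    /* C-level PMPI wrapper. Under the first layering (fortran ->
     * MPI_Comm_rank -> PMPI_Comm_rank) this also intercepts Fortran
     * callers; under the second (fortran -> PMPI_Comm_rank) it is
     * bypassed entirely for Fortran code. */
    int MPI_Comm_rank(MPI_Comm comm, int *rank)
    {
        printf("MPI_Comm_rank intercepted\n");
        return PMPI_Comm_rank(comm, rank);
    }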

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Anthony Chan
Hi George, - "George Bosilca" wrote: > On Dec 5, 2008, at 03:16 , Anthony Chan wrote: > > > void mpi_comm_rank_(MPI_Comm *comm, int *rank, int *info) { > >printf("mpi_comm_rank call successfully intercepted\n"); > >*info = PMPI_Comm_rank(comm,rank); > > } > > Unfortunately this exa

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Anthony Chan
Hi Nick, - "Nick Wright" wrote: > For the first case you describe, as OPENMPI is now, the call sequence > > from fortran is > > mpi_comm_rank -> MPI_Comm_rank -> PMPI_Comm_rank > > For the second case, as MPICH is now, its > > mpi_comm_rank -> PMPI_Comm_rank > AFAIK, all known/popular

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Anthony Chan
Hi Nick, - "Nick Wright" wrote: > Hi Antony > > That will work yes, but its not portable to other MPI's that do > implement the profiling layer correctly unfortunately. I guess I must have missed something here. What is not portable ? > > I guess we will just need to detect that we are

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread George Bosilca
After spending a few hours pondering this problem, we came to the conclusion that the best approach is to keep what we had before (i.e. the original approach). This means I'll undo my patch in the trunk, and not change the behavior on the next releases (1.3 and 1.2.9). This approach, wh

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
The reason I'd like to disable these eager buffers is to help detect the deadlock better. I would not run with this for a normal run, but it would be useful for debugging. If the deadlock is indeed due to our code then disabling any shared buffers or eager sends would make that deadlock reproducible.
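An aside, not from the thread: a portable way to get a similar debugging effect without touching MCA parameters is to force synchronous sends in a debug build, since MPI_Ssend/MPI_Issend complete only after the receiver has posted a matching receive. A minimal sketch (the DEBUG_NO_EAGER guard is my own name):

    #include <mpi.h>

    /* Debug builds only: map standard sends onto synchronous sends so no
     * message can complete via eager buffering. A real send/recv mismatch
     * then deadlocks deterministically instead of intermittently. */
    #ifdef DEBUG_NO_EAGER
    #define MPI_Send  MPI_Ssend
    #define MPI_Isend MPI_Issend
    #endif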

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Brock Palen
OpenMPI has different eager limits for all the network types. On your system run: ompi_info --param btl all and look for the eager_limits. You can set these values to 0 using the syntax I showed you before. That would disable eager messages. There might be a better way to disable eager messages
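A sketch of that syntax, assuming the sm and tcp BTLs; verify the exact parameter names in the ompi_info output on your system:

    mpirun --mca btl_sm_eager_limit 0 --mca btl_tcp_eager_limit 0 -np 16 ./a.out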

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread George Bosilca
On Dec 5, 2008, at 03:16, Anthony Chan wrote: void mpi_comm_rank_(MPI_Comm *comm, int *rank, int *info) { printf("mpi_comm_rank call successfully intercepted\n"); *info = PMPI_Comm_rank(comm,rank); } Unfortunately this example is not correct. The real Fortran prototype for the MPI_Comm_rank
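For reference, a corrected wrapper along the lines George describes would take MPI_Fint arguments and convert the communicator handle; a sketch, assuming the common trailing-underscore Fortran name mangling:

    #include <stdio.h>
    #include <mpi.h>

    /* Fortran entry point: all arguments arrive as pointers to MPI_Fint.
     * The Fortran communicator handle must be converted with MPI_Comm_f2c
     * before calling the C PMPI routine. */
    void mpi_comm_rank_(MPI_Fint *comm, MPI_Fint *rank, MPI_Fint *ierr)
    {
        int c_rank;
        MPI_Comm c_comm = MPI_Comm_f2c(*comm);
        printf("mpi_comm_rank intercepted\n");
        *ierr = (MPI_Fint) PMPI_Comm_rank(c_comm, &c_rank);
        *rank = (MPI_Fint) c_rank;
    }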

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
Thank you for this info. I should add that our code tends to post a lot of sends prior to the other side posting receives. This causes a lot of unexpected messages to exist. Our code explicitly matches up all tags and processors (that is, we do not use MPI wildcards). If we had a deadlock

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Brock Palen
Whenever this happens we found the code to have a deadlock. Users never saw it until they crossed the eager->rendezvous threshold. Yes, you can disable shared memory with: mpirun --mca btl ^sm Or you can try increasing the eager limit. ompi_info --param btl sm MCA btl: parameter "btl_sm_eager_limit"
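A sketch of both options (the default eager limit and a sensible larger value vary by release):

    # disable the shared-memory BTL entirely
    mpirun --mca btl ^sm -np 16 ./a.out
    # or raise the sm eager limit (value in bytes; 65536 is just an example)
    mpirun --mca btl_sm_eager_limit 65536 -np 16 ./a.out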

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Scott Atchley
On Dec 5, 2008, at 12:22 PM, Justin wrote: Does OpenMPI have any known deadlocks that might be causing our deadlocks? Known deadlocks, no. We are assisting a customer, however, with a deadlock that occurs in IMB Alltoall (and some other IMB tests) when using 128 hosts and the MX BTL. We h

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Nick Wright
Brian - Sorry, I picked the wrong word there. I guess this is more complicated than I thought it was. For the first case you describe, as OPENMPI is now, the call sequence from fortran is mpi_comm_rank -> MPI_Comm_rank -> PMPI_Comm_rank. For the second case, as MPICH is now, it's mpi_comm_rank -> PMPI_Comm_rank

Re: [OMPI users] Processor/core selection/affinity for large shared memory systems

2008-12-05 Thread V. Ram
Terry Frankcombe wrote: > Isn't it up to the OS scheduler what gets run where? I was under the impression that the processor affinity API was designed to let the OS (at least Linux) know how a given task preferred to be bound in terms of the system topology. -- V. Ram v_r_...@fastmail.fm --

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Brian W. Barrett
Nick - I think you have an incorrect definition of "correctly" :). According to the MPI standard, an MPI implementation is free to either layer language bindings (and only allow profiling at the lowest layer) or not layer the language bindings (and require profiling libraries to intercept each
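Not from the thread: the portable consequence is that a tool has to ship wrappers for both layers and guard against counting one application call twice on implementations that layer the bindings. A minimal, non-thread-safe sketch; the guard flag and counters are my own invention:

    #include <mpi.h>

    static int c_calls = 0, f77_calls = 0;
    static int in_f77 = 0;  /* guards against double counting */

    int MPI_Comm_rank(MPI_Comm comm, int *rank)
    {
        if (!in_f77) c_calls++;   /* count only direct C-level calls */
        return PMPI_Comm_rank(comm, rank);
    }

    void mpi_comm_rank_(MPI_Fint *comm, MPI_Fint *rank, MPI_Fint *ierr)
    {
        int c_rank;
        f77_calls++;
        in_f77 = 1;
        *ierr = (MPI_Fint) MPI_Comm_rank(MPI_Comm_f2c(*comm), &c_rank);
        in_f77 = 0;
        *rank = (MPI_Fint) c_rank;
    }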

Re: [OMPI users] Processor/core selection/affinity for large shared memory systems

2008-12-05 Thread V. Ram
Ralph Castain wrote: > Thanks - yes, that helps. Can you add --display-map to your cmd > line? That will tell us what mpirun thinks it is doing. The output from --display-map is below. Note that I've sanitized a few items, but nothing relevant to this: [granite:29685] Map for job: 1 Gener

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Nick Wright
I hope you are aware that *many* tools and applications actually profile the fortran MPI layer by intercepting the C function calls. This allows them to not have to deal with f2c translation of MPI objects and to not worry about the name mangling issue. Would there be a way to have both options e

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Edgar Gabriel
Actually, I am wondering whether my previous statement was correct. If you do not intercept the fortran MPI call, then it still goes to the C MPI call, which you can intercept. Only if you intercept the fortran MPI call do we not call the C MPI but the C PMPI call, correct? So in theory, it could

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Jeff Squyres
On Dec 5, 2008, at 12:22 PM, Edgar Gabriel wrote: I hope you are aware that *many* tools and applications actually profile the fortran MPI layer by intercepting the C function calls. This allows them to not have to deal with f2c translation of MPI objects and to not worry about the name mangling issue

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Jeff Squyres
On Dec 5, 2008, at 11:29 AM, Nick Wright wrote: I think we can just look at OPEN_MPI as you say and then OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION & OMPI_RELEASE_VERSION from mpi.h, and if the version is less than 1.2.9 implement a workaround as Antony suggested. It's not the most elegant solution but it will work, I think?

[OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
Hi, We are currently using OpenMPI 1.3 on Ranger for large processor jobs (8K+). Our code appears to be occasionally deadlocking at random within point-to-point communication (see stacktrace below). This code has been tested on many different MPI versions and as far as we know it does not c
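A generic illustration, not Justin's code: the standard way to keep a many-sends-before-receives pattern from flooding the unexpected-message queue is to pre-post all receives first and complete everything with one Waitall.

    #include <stdlib.h>
    #include <mpi.h>

    /* Pre-post every receive before any send, so incoming messages land in
     * posted buffers instead of the unexpected queue, then wait on all. */
    void exchange(MPI_Comm comm, int n, const int *peers,
                  double **sendbuf, double **recvbuf, int count)
    {
        MPI_Request *reqs = malloc(2 * n * sizeof(MPI_Request));
        for (int i = 0; i < n; i++)
            MPI_Irecv(recvbuf[i], count, MPI_DOUBLE, peers[i], 0, comm, &reqs[i]);
        for (int i = 0; i < n; i++)
            MPI_Isend(sendbuf[i], count, MPI_DOUBLE, peers[i], 0, comm, &reqs[n + i]);
        MPI_Waitall(2 * n, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
    }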

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Edgar Gabriel
George, I hope you are aware that *many* tools and applications actually profile the fortran MPI layer by intercepting the C function calls. This allows them to not have to deal with f2c translation of MPI objects and to not worry about the name mangling issue. Would there be a way to have both

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Nick Wright
I think we can just look at OPEN_MPI as you say and then OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION & OMPI_RELEASE_VERSION from mpi.h, and if the version is less than 1.2.9 implement a workaround as Antony suggested. It's not the most elegant solution but it will work, I think? Nick. Jeff Squyres wrote:
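A sketch of that check, using exactly the macros Nick names; the 1.2.9 threshold is the one under discussion, and the resulting macro name is hypothetical:

    #include <mpi.h>

    /* Enable the workaround for Open MPI releases older than 1.2.9. */
    #if defined(OPEN_MPI) && \
        (OMPI_MAJOR_VERSION == 1 && \
         (OMPI_MINOR_VERSION < 2 || \
          (OMPI_MINOR_VERSION == 2 && OMPI_RELEASE_VERSION < 9)))
    #define NEED_OMPI_PROFILING_WORKAROUND 1
    #endif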

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Jeff Squyres
On Dec 5, 2008, at 10:55 AM, David Skinner wrote: FWIW, if that one-liner fix works (George and I just chatted about this on the phone), we can probably also push it into v1.2.9. great! thanks. It occurs to me that this is likely not going to be enough for you, though. :-\ Like it or

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Jeff Squyres
FWIW, if that one-liner fix works (George and I just chatted about this on the phone), we can probably also push it into v1.2.9. On Dec 5, 2008, at 10:49 AM, George Bosilca wrote: Nick, Thanks for noticing this. It's unbelievable that nobody noticed that over the last 5 years. Anyway, I think we have a one-line fix for this problem

Re: [OMPI users] Hybrid program

2008-12-05 Thread Jeff Squyres
Nifty -- good to know. Thanks for looking into this! Do any kernel-hacker types on this list know roughly in which version thread affinity was brought into the Linux kernel? FWIW: all the same concepts here (using pid==0) should also work for PLPA, so you can set via socket/core, etc. On
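For reference, the pid==0 idiom with the raw Linux call looks like this; Jeff's note is that the same concept carries over to PLPA, which wraps this interface.

    #define _GNU_SOURCE
    #include <sched.h>

    /* pid 0 means "the calling thread" for sched_setaffinity, which is
     * what makes per-thread binding from inside a threaded region work. */
    int bind_self_to_core(int core)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(core, &mask);
        return sched_setaffinity(0, sizeof(mask), &mask);
    }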

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread George Bosilca
Nick, Thanks for noticing this. It's unbelievable that nobody noticed that over the last 5 years. Anyway, I think we have a one-line fix for this problem. I'll test it asap, and then push it into the 1.3. Thanks, george. On Dec 5, 2008, at 10:14, Nick Wright wrote: Hi Antony That will work yes

Re: [OMPI users] Fortran90 functions missing: MPI_COMM_GET_ATTR / MPI_ATTR_GET()

2008-12-05 Thread Jeff Squyres
On Dec 5, 2008, at 10:33 AM, Jens wrote: thanks a lot. This fixed a bug in my code. I already like open-mpi for this :) LOL! Glad to help. :-) FWIW, we're working on new Fortran bindings for MPI-3 that fix some of the shortcomings of the F90 bindings. -- Jeff Squyres Cisco Systems

Re: [OMPI users] Hybrid program

2008-12-05 Thread Edgar Gabriel
OK, so I dug a little deeper, and have some good news. Let me start with a set of routines that we didn't even discuss yet, but which work for setting thread affinity, and then discuss libnuma and sched_setaffinity() again. --- On linux systems, the pthread library has a set of routines
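Presumably the routines meant here are the glibc nonportable pthread affinity calls; a minimal sketch under that assumption:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling pthread to a single core. */
    int pin_self(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }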

Re: [OMPI users] Fortran90 functions missing: MPI_COMM_GET_ATTR / MPI_ATTR_GET()

2008-12-05 Thread Jens
Hi Jeff, thanks a lot. This fixed a bug in my code. I already like open-mpi for this :) Greetings, Jens Jeff Squyres schrieb: > These functions do exist in Open MPI, but your code is not quite > correct. Here's a new version that is correct: > > - > program main > use mpi > implicit none > i

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Nick Wright
Hi Antony That will work yes, but it's not portable to other MPI's that do implement the profiling layer correctly, unfortunately. I guess we will just need to detect that we are using openmpi when our tool is configured and add some macros to deal with that accordingly. Is there an easy way to

Re: [OMPI users] Fortran90 functions missing: MPI_COMM_GET_ATTR / MPI_ATTR_GET()

2008-12-05 Thread Jeff Squyres
These functions do exist in Open MPI, but your code is not quite correct. Here's a new version that is correct:
-
program main
use mpi
implicit none
integer :: ierr, rank, size
integer :: mpi1_val
integer(kind = MPI_ADDRESS_KIND) :: mpi2_val
logical :: attr_flag
call MPI_INIT(ierr)
call M

[OMPI users] MCA parameter

2008-12-05 Thread Yasmine Yacoub
Thank you for your response, and these are the details for my problem: I have installed pwscf and then I have tried to run scf calculations, but before having the output I got this warning message: WARNING: There are more than one active ports on host 'stallo-2.local', but the default subnet

[OMPI users] Fortran90 functions missing: MPI_COMM_GET_ATTR / MPI_ATTR_GET()

2008-12-05 Thread Jens
Hi, I just switched from MPICH2 to openmpi because of the sge-support, but I am missing some mpi-functions for fortran 90. Does anyone know why MPI_COMM_GET_ATTR() / MPI_ATTR_GET() are not available? They work fine with MPICH2. I compiled openmpi 1.2.8/1.3rc on a clean CentOS 5.2 with GNU-compilers

Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Anthony Chan
Hope I didn't misunderstand your question. If you implement your profiling library in C, where you do your real instrumentation, you don't need to implement the fortran layer; you can simply link with the Fortran-to-C MPI wrapper library -lmpi_f77, i.e. /bin/mpif77 -o foo foo.f -L/lib -lmpi_f77 -lYou
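Concretely (file names hypothetical, installation paths elided as in the original): the profiling wrappers then live entirely in C, and putting them before -lmpi_f77 on the link line makes the Fortran-to-C wrappers resolve the MPI_* symbols to the profiled versions.

    mpicc -c prof_wrappers.c                       # C-layer MPI_* -> PMPI_* wrappers
    mpif77 -o foo foo.f prof_wrappers.o -lmpi_f77  # wrappers before -lmpi_f77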