[OMPI users] OpenMPI deadlocks and race conditions ?

2009-05-13 Thread François PELLEGRINI

Hello all,

I sometimes run into deadlocks in OpenMPI (1.3.3a1r21206) when
running my MPI+threaded PT-Scotch software. Luckily, the case
is very small, with 4 procs only, so I have been able to investigate
it a bit. It seems that matches between communications are not done
properly on cloned communicators. In the end, I run into a case where
an MPI_Waitall completes an MPI_Barrier on another proc. The bug is
erratic but, luckily, quite easy to reproduce.

To be sure, I ran my code under valgrind using helgrind, its
race condition detection tool. It produced a lot of output, most
of which seems to be innocuous, yet I have some concerns about
messages such as the following ones. The ==12**== lines were generated
when running on 4 procs, while the ==83**== lines were generated
when running on 2 procs:

==8329== Possible data race during write of size 4 at 0x8882200
==8329==at 0x508B315: sm_fifo_write (btl_sm.h:254)
==8329==by 0x508B401: mca_btl_sm_send (btl_sm.c:811)
==8329==by 0x5070A0C: mca_bml_base_send_status (bml.h:288)
==8329==by 0x50708E6: mca_pml_ob1_send_request_start_copy
(pml_ob1_sendreq.c:567)
==8329==by 0x5064C30: mca_pml_ob1_send_request_start_btl
(pml_ob1_sendreq.h:363)
==8329==by 0x5064A19: mca_pml_ob1_send_request_start (pml_ob1_sendreq.h:429)
==8329==by 0x5064856: mca_pml_ob1_isend (pml_ob1_isend.c:87)
==8329==by 0x5142C46: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:51)
==8329==by 0x514F379: ompi_coll_tuned_barrier_intra_two_procs
(coll_tuned_barrier.c:258)
==8329==by 0x5143252: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:192)
==8329==by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==8329==by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==8329==   Old state: shared-readonly by threads #1, #7
==8329==   New state: shared-modified by threads #1, #7
==8329==   Reason:this thread, #1, holds no consistent locks
==8329==   Location 0x8882200 has never been protected by any lock

==1220== Possible data race during write of size 4 at 0x88CEF88
==1220==at 0x508CD84: sm_fifo_read (btl_sm.h:272)
==1220==by 0x508C864: mca_btl_sm_component_progress (btl_sm_component.c:391)
==1220==by 0x41F72DF: opal_progress (opal_progress.c:207)
==1220==by 0x40BD67D: opal_condition_wait (condition.h:85)
==1220==by 0x40BDA96: ompi_request_default_wait_all (req_wait.c:262)
==1220==by 0x5142C78: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==1220==by 0x514F07A: ompi_coll_tuned_barrier_intra_recursivedoubling
(coll_tuned_barrier.c:174)
==1220==by 0x51432A3: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:208)
==1220==by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==1220==by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==1220==by 0x805E2B2: kdgraphMapRbPartFold2 (kdgraph_map_rb_part.c:199)
==1220==by 0x805EA43: kdgraphMapRbPart2 (kdgraph_map_rb_part.c:331)
==1220==   Old state: shared-readonly by threads #1, #7
==1220==   New state: shared-modified by threads #1, #7
==1220==   Reason:this thread, #1, holds no consistent locks
==1220==   Location 0x88CEF88 has never been protected by any lock

==1219== Possible data race during write of size 4 at 0x891BC8C
==1219==at 0x508CD99: sm_fifo_read (btl_sm.h:273)
==1219==by 0x508C864: mca_btl_sm_component_progress (btl_sm_component.c:391)
==1219==by 0x41F72DF: opal_progress (opal_progress.c:207)
==1219==by 0x40BD67D: opal_condition_wait (condition.h:85)
==1219==by 0x40BDA96: ompi_request_default_wait_all (req_wait.c:262)
==1219==by 0x5142C78: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==1219==by 0x514F07A: ompi_coll_tuned_barrier_intra_recursivedoubling
(coll_tuned_barrier.c:174)
==1219==by 0x51432A3: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:208)
==1219==by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==1219==by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==1219==by 0x805E2B2: kdgraphMapRbPartFold2 (kdgraph_map_rb_part.c:199)
==1219==by 0x805EA43: kdgraphMapRbPart2 (kdgraph_map_rb_part.c:331)
==1219==   Old state: shared-readonly by threads #1, #7
==1219==   New state: shared-modified by threads #1, #7
==1219==   Reason:this thread, #1, holds no consistent locks
==1219==   Location 0x891BC8C has never been protected by any lock

==1220== Possible data race during write of size 4 at 0x4243A68
==1220==at 0x41F72A7: opal_progress (opal_progress.c:186)
==1220==by 0x40BD67D: opal_condition_wait (condition.h:85)
==1220==by 0x40BDA96: ompi_request_default_wait_all (req_wait.c:262)
==1220==by 0x5142C78: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==1220==by 0x514F07A: ompi_coll_tuned_barrier_intra_recursivedoubling
(coll_tuned_barrier.c:174)
==1220==by 0x51432A3: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:208)
==1220==by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==1

Re: [OMPI users] ****---How to configure NIS and MPI on spread NICs?----****

2009-05-13 Thread Jeff Squyres

On May 12, 2009, at 11:44 PM, shan axida wrote:


I want to configure NIS and MPI with different networks.
For example, NIS uses eth0 and MPI uses eth1, something like that.



I don't have any information about NIS, but you can see these 2 FAQ  
items that discuss how to set which IP networks OMPI will use:


http://www.open-mpi.org/faq/?category=tcp#tcp-selection
http://www.open-mpi.org/faq/?category=tcp#tcp-routability-1.3
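
For example, a minimal sketch (eth1 comes from the question above; the
process count and ./my_mpi_app are placeholders):

   mpirun --mca btl_tcp_if_include eth1 -np 4 ./my_mpi_app

This restricts the TCP BTL (MPI traffic over TCP) to eth1 and leaves eth0
free for NIS; the FAQ items above describe these parameters in full.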

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] [Fwd: mpi alltoall memory requirement]

2009-05-13 Thread Ashley Pittman
On Thu, 2009-04-23 at 07:12 +, viral@gmail.com wrote:
> Hi 
> Thanks for your response. 
> However, I am running 
> mpiexec  -ppn 24 -n 192 /opt/IMB-MPI1 alltaoll -msglen /root/temp 
> 
> And the file /root/temp contains entries up to size 65535 only. That means
> the alltoall test will run up to a 65K message size only.
> 
> So, in that case I should need much less memory, but the test is still
> running out of memory. Could someone please help me understand the
> scenario? Or do I need to switch to some other algorithm, or set some
> other environment variables, or anything like that?

I'm not sure, but I seem to remember that IMB uses two application
buffers and alternates which one it uses; this by itself will double the
memory requirement.  You should be able to plot performance against max
message size and see where the drop-off occurs.
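
A rough back-of-the-envelope sketch, assuming IMB sizes each application
buffer as ranks times the max message length (an assumption on my part,
not stated above):

   192 ranks x 65536 bytes x 2 buffers  =  roughly 24 MB per rank

so at the 65K limit the application buffers alone come to only a few tens
of MB per rank.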

I've always used the compile options to specify the max message size and
rep count; the -msglen option is not one I've seen before.

Ashley Pittman.



[OMPI users] How to override MPI functions such as MPI_Init, MPI_Recv...

2009-05-13 Thread Le Duy Khanh
Dear,

 I intend to override some MPI functions such as MPI_Init, MPI_Recv... but I
don't want to dig into the OpenMPI source code. Therefore, I am thinking of
creating a lib called "mympi.h" in which I will #include "mpi.h" to override
those functions. I will create a new interface with exactly the same signatures
as MPI_Init (because users are familiar with those functions). However, the
problem is that I don't know how to override those functions because, as far
as I know, C/C++ doesn't allow us to override functions (only overload them).

 Could you please show me how to override OMPI functions but still keep the 
same function names and signatures?

 Thank you so much for your time and consideration

Le , Duy Khanh
Cellphone: (+84)958521704
Faculty of Computer Science and Engineering
Ho Chi Minh city University of Technology , Viet Nam



  

Re: [OMPI users] How to override MPI functions such as MPI_Init, MPI_Recv...

2009-05-13 Thread Durga Choudhury
You could use a separate namespace (if you are using C++) and define
your functions there...

Durga

On Wed, May 13, 2009 at 1:20 PM, Le Duy Khanh  wrote:
> Dear,
>
>  I intend to override some MPI functions such as MPI_Init, MPI_Recv... but I
> don't want to dig into OpenMPI source code.Therefore, I am thinking of a way
> to create a lib called "mympi.h" in which I will #include "mpi.h" to
> override those functions. I will create a new interface with exactly the
> same signatures like MPI_Init (because users are familiar with those
> functions). However, the problem is that I don't know how to override those
> functions because as I know, C/C++ doesn't allow us to override functions
> (only overload them).
>
>  Could you please show me how to override OMPI functions but still keep the
> same function names and signatures?
>
>  Thank you so much for your time and consideration
>
> Le , Duy Khanh
> Cellphone: (+84)958521704
> Faculty of Computer Science and Engineering
> Ho Chi Minh city University of Technology , Viet Nam
>
>



Re: [OMPI users] How to override MPI functions such as MPI_Init, MPI_Recv...

2009-05-13 Thread Jeff Squyres
You could just define your own library with the same signatures as  
official MPI functions, and link that into MPI applications.  Under  
the covers, you invoke the PMPI_* equivalents of each function.  Lots  
of profiling and analysis tools work this way.  For example:


#include <mpi.h>

int MPI_Init(int *argc, char ***argv)
{
    int ret;
    /* do whatever you want to do here (before the real init) */
    ret = PMPI_Init(argc, argv);
    /* do whatever you want to do here (after the real init) */
    return ret;
}

compile/link that into libextra_mpi_stuff.a.  Then compile your app  
with:


   mpicc my_mpi_app.c -lextra_mpi_stuff

and then when my_mpi_app.c calls MPI_Init(), it'll call *your*
MPI_Init.  Your MPI_Init will do whatever it wants to, invoke
PMPI_Init() (i.e., the "real" init function), and return back to the
user.


This is the profiling interface of MPI.
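
The same pattern works for any of the other calls mentioned in the
original question; as an additional sketch along the same lines (the
comments are just placeholders), an MPI_Recv wrapper might look like:

#include <mpi.h>

int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source,
             int tag, MPI_Comm comm, MPI_Status *status)
{
    /* do whatever you want here (e.g., record the source and tag) */
    int ret = PMPI_Recv(buf, count, datatype, source, tag, comm, status);
    /* do whatever you want here (e.g., inspect *status) */
    return ret;
}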


On May 13, 2009, at 1:20 PM, Le Duy Khanh wrote:


Dear,

 I intend to override some MPI functions such as MPI_Init,  
MPI_Recv... but I don't want to dig into OpenMPI source  
code.Therefore, I am thinking of a way to create a lib called  
"mympi.h" in which I will #include "mpi.h" to override those  
functions. I will create a new interface with exactly the same  
signatures like MPI_Init (because users are familiar with those  
functions). However, the problem is that I don't know how to  
override those functions because as I know, C/C++ doesn't allow us  
to override functions (only overload them).


 Could you please show me how to override OMPI functions but still  
keep the same function names and signatures?


 Thank you so much for your time and consideration

Le , Duy Khanh
Cellphone: (+84)958521704
Faculty of Computer Science and Engineering
Ho Chi Minh city University of Technology , Viet Nam




--
Jeff Squyres
Cisco Systems



Re: [OMPI users] How to override MPI functions such as MPI_Init, MPI_Recv...

2009-05-13 Thread Le Duy Khanh
Wow, that's great.

 You mean that PMPI_* is totally/functionally similar to MPI_*, right ?

 Thank you so much for your instructions.

Le , Duy Khanh
Cellphone: (+84)958521704
Faculty of Computer Science and Engineering
Ho Chi Minh city University of Technology , Viet Nam



--- On Wed, 5/13/09, Jeff Squyres  wrote:

From: Jeff Squyres 
Subject: Re: [OMPI users] How to override MPI functions such as MPI_Init, 
MPI_Recv...
To: "Open MPI Users" 
List-Post: users@lists.open-mpi.org
Date: Wednesday, May 13, 2009, 11:43 AM

You could just define your own library with the same signatures as official MPI 
functions, and link that into MPI applications.  Under the covers, you invoke 
the PMPI_* equivalents of each function.  Lots of profiling and analysis tools 
work this way.  For example:

#include <mpi.h>

int MPI_Init(int *argc, char ***argv)
{
    int ret;
    /* do whatever you want to do here (before the real init) */
    ret = PMPI_Init(argc, argv);
    /* do whatever you want to do here (after the real init) */
    return ret;
}

compile/link that into libextra_mpi_stuff.a.  Then compile your app with:

   mpicc my_mpi_app.c -lextra_mpi_stuff

and then when my_mpi_app.c calls MPI_Init(), it'll call *your* MPI_Init.  Your 
MPI_Init will do whatever it wants to, invoke PMPI_Init() (i.e., the "real" 
init function), and return back to the user.

This is the profiling interface of MPI.


On May 13, 2009, at 1:20 PM, Le Duy Khanh wrote:

> Dear,
> 
>  I intend to override some MPI functions such as MPI_Init, MPI_Recv... but I 
>don't want to dig into OpenMPI source code.Therefore, I am thinking of a way 
>to create a lib called "mympi.h" in which I will #include "mpi.h" to override 
>those functions. I will create a new interface with exactly the same 
>signatures like MPI_Init (because users are familiar with those functions). 
>However, the problem is that I don't know how to override those functions 
>because as I know, C/C++ doesn't allow us to override functions (only overload 
>them).
> 
>  Could you please show me how to override OMPI functions but still keep the 
>same function names and signatures?
> 
>  Thank you so much for your time and consideration
> 
> Le , Duy Khanh
> Cellphone: (+84)958521704
> Faculty of Computer Science and Engineering
> Ho Chi Minh city University of Technology , Viet Nam
> 


--
Jeff Squyres
Cisco Systems




  

Re: [OMPI users] How to override MPI functions such as MPI_Init, MPI_Recv...

2009-05-13 Thread Jeff Squyres

On May 13, 2009, at 2:33 PM, Le Duy Khanh wrote:


Wow, that's great.

 You mean that PMPI_* is totally/functionally similar to MPI_*,  
right ?


They are actually aliases of each other in Open MPI.  See the  
profiling chapter in the MPI spec; it's intended that you can  
intercept the MPI_* calls and then call the "real" functions by  
invoking their PMPI_* counterparts to effect the real functionality.
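
A quick way to check this on a typical Linux build (a sketch; the library
path is a guess, and this only applies when Open MPI was built with weak
symbols, which is the default where the platform supports them):

   nm -D /usr/lib/libmpi.so | grep -w -e MPI_Init -e PMPI_Init

Both symbols should show up at the same address, with MPI_Init listed as
a weak ("W") symbol.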


--
Jeff Squyres
Cisco Systems



[OMPI users] Problems with "error polling LP CQ with status RNR"

2009-05-13 Thread Åke Sandgren
Hi!

I'm having a problem with getting "error polling LP CQ with status
RNR..." errors on an otherwise completely empty system.
There are no errors visible in the error counters in any of the HCAs or
switches or anywhere else.

I'm running OMPI 1.3.2 built with pathscale 3.2

If I add -mca btl 'ofud,self,sm', the same code works OK.

It usually only shows up on runs with nodes=16:ppn=8 or higher; 8x8
works OK.

This might very well be a pathscale problem, since the problem goes
away when running with the debug version of OMPI 1.3.2.

Complete error is:
error polling LP CQ with status RECEIVER NOT READY RETRY EXCEEDED ERROR
status number 13 for wr_id 465284992 opcode -1  vendor error 135 qp_idx
0

Any ideas as to where in the OMPI code I should start reducing
optimization levels to pinpoint this?

I'll try some more tests tomorrow with a hopefully fresh mind...

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se