Re: [OMPI users] Running on crashing nodes

2010-09-27 Thread Randolph Pullen
I have successfully used a Perl program to start mpirun and record its PID. The monitor can then watch the output from MPI and terminate the mpirun command with a series of kills or something if it is having trouble. One method of doing this is to prefix all legal output from your MPI program
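
The monitoring idea in this message can be sketched roughly as follows. This is a minimal illustration in Python rather than the poster's Perl program, and the message is cut off before it says how the prefix is used, so the rule applied here (any output line that lacks the prefix counts as trouble) is an assumption. The `OK:` marker, the function name `run_monitored`, and the grace period are all hypothetical.

```python
import subprocess
import sys

# Hypothetical marker: the original message does not say which prefix it uses.
LEGAL_PREFIX = "OK:"


def run_monitored(cmd, prefix=LEGAL_PREFIX, grace_seconds=5.0):
    """Start cmd, record its PID, and stop it at the first unprefixed output line.

    Returns (pid, return_code, offending_line). offending_line is None when
    every output line started with the prefix.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # watch error output on the same stream
        text=True,
    )
    pid = proc.pid  # recorded so other tools could signal the job as well
    print(f"started PID {pid}", file=sys.stderr)

    offending = None
    for line in proc.stdout:
        # Assumption: anything without the prefix (including a blank line)
        # means the job is in trouble.
        if not line.startswith(prefix):
            offending = line.rstrip("\n")
            break

    if offending is not None:
        # "A series of kills": ask politely first, then force.
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()

    proc.stdout.close()
    return pid, proc.wait(), offending
```

A call such as `run_monitored(["mpirun", "-np", "4", "./my_app"])` would wrap a job this way, where `./my_app` stands in for a program that prints `OK:` at the start of every expected line. Terminating `mpirun` itself, rather than the individual ranks, leaves the cleanup of remote processes to the launcher.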

Re: [OMPI users] Memory affinity

2010-09-27 Thread Tim Prince
On 9/27/2010 2:50 PM, David Singleton wrote: On 09/28/2010 06:52 AM, Tim Prince wrote: On 9/27/2010 12:21 PM, Gabriele Fatigati wrote: Hi Tim, I have read that link, but I haven't understood whether enabling processor affinity also enables memory affinity, because it is written that: "Note that m

[OMPI users] mpi_in_place not working in mpi_allreduce

2010-09-27 Thread David Zhang
Dear all: I ran this simple Fortran code and got an unexpected result: ! program reduce implicit none include 'mpif.h' integer :: ierr, rank real*8 :: send(5) call mpi_init(ierr) call mpi_comm_rank(mpi_comm_world,rank,ierr) send = real(rank) print *, rank,':',sen

Re: [OMPI users] Memory affinity

2010-09-27 Thread David Singleton
On 09/28/2010 06:52 AM, Tim Prince wrote: On 9/27/2010 12:21 PM, Gabriele Fatigati wrote: Hi Tim, I have read that link, but I haven't understood whether enabling processor affinity also enables memory affinity, because it is written that: "Note that memory affinity support is enabled only when pro

Re: [OMPI users] Memory affinity

2010-09-27 Thread Tim Prince
On 9/27/2010 12:21 PM, Gabriele Fatigati wrote: Hi Tim, I have read that link, but I haven't understood whether enabling processor affinity also enables memory affinity, because it is written that: "Note that memory affinity support is enabled only when processor affinity is enabled" Can I set pro

Re: [OMPI users] Memory affinity

2010-09-27 Thread Gabriele Fatigati
Hi Tim, I have read that link, but I haven't understood whether enabling processor affinity also enables memory affinity, because it is written that: "Note that memory affinity support is enabled only when processor affinity is enabled" Can I set processor affinity without memory affinity? This is m

Re: [OMPI users] Memory affinity

2010-09-27 Thread Tim Prince
On 9/27/2010 9:01 AM, Gabriele Fatigati wrote: If OpenMPI is NUMA-compiled, is memory affinity enabled by default? I couldn't find a standalone memory affinity (or similar) parameter to set to 1. The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity has a useful introduction

[OMPI users] error on mpiexec

2010-09-27 Thread Kraus Philipp
Hi, I have compiled open-mpi 1.4.2 and use it with boost-mpi. I can compile and run my first example. If I run it without mpiexec, everything works fine. If I run it with mpiexec -np 1 or 2, I get messages like: [node:05126] [[582,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module

Re: [OMPI users] Memory affinity

2010-09-27 Thread Gabriele Fatigati
Sorry, I meant: is memory affinity enabled by default when setting mprocessor_affinity=1 in OpenMPI-numa? 2010/9/27 Gabriele Fatigati > Dear OpenMPI users, > > If OpenMPI is NUMA-compiled, is memory affinity enabled by default? > I couldn't find a standalone memory affinity (or similar) parameter to set to 1.

[OMPI users] Memory affinity

2010-09-27 Thread Gabriele Fatigati
Dear OpenMPI users, If OpenMPI is NUMA-compiled, is memory affinity enabled by default? I couldn't find a standalone memory affinity (or similar) parameter to set to 1. Thanks a lot. -- Ing. Gabriele Fatigati Parallel programmer CINECA Systems & Tecnologies Department Supercomputing Group Vi

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
OK, there were no zero-valued tags in your files. Are you running this with no eager RDMA? If not, can you set the following options: "-mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0 -mca btl_openib_flags 1". Thanks, --td Eloi Gaudry wrote: Terry, Please find enclosed the requ

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Eloi Gaudry
Terry, Please find enclosed the requested check outputs (using -output-filename stdout.tag.null option). I'm displaying frag->hdr->tag here. Eloi On Monday 27 September 2010 16:29:12 Terry Dontje wrote: > Eloi, sorry can you print out frag->hdr->tag? > > Unfortunately from your last email I th

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
Eloi, sorry can you print out frag->hdr->tag? Unfortunately from your last email I think it will still all have non-zero values. If that ends up being the case then there must be something odd with the descriptor pointer to the fragment. --td Eloi Gaudry wrote: Terry, Please find enclosed

Re: [OMPI users] Porting Open MPI to ARM: How essential is the opal_sys_timer_get_cycles() function?

2010-09-27 Thread Jeff Squyres
On Sep 23, 2010, at 1:24 PM, Ken Mighell wrote: > Would a hack written in C suffice? Assembly is always better, but C should be fine. If you really want to, you could write it in C and have the compiler generate optimized assembly for you. -- Jeff Squyres jsquy...@cisco.com For corporate lega

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Eloi Gaudry
Terry, Please find enclosed the requested check outputs (using -output-filename stdout.tag.null option). For information, Nysal, in his first message, referred to ompi/mca/pml/ob1/pml_ob1_hdr.h and said that the hdr->tag value was wrong on the receiving side. #define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
I am thinking of checking the value of *frag->hdr right before the return in the post_send function in ompi/mca/btl/openib/btl_openib_endpoint.h. It is line 548 in the trunk https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/btl/openib/btl_openib_endpoint.h#548 --td Eloi Gaudry wrote: Hi

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Eloi Gaudry
Hi Terry, Do you have any patch that I could apply to be able to do so? I'm working remotely on a cluster (with a terminal) and I cannot use any parallel debugger or sequential debugger (with a call to xterm...). I can track frag->hdr->tag value in ompi/mca/btl/openib/btl_openib_component.c::

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
So it sounds like coalescing is not your issue and that the problem has something to do with the queue sizes. It would be helpful if we could detect the hdr->tag == 0 issue on the sending side and get at least a stack trace. There is something really odd going on here. --td Eloi Gaudry wrot