Re: [OMPI users] deadlock in openmpi 1.5rc5

2010-08-09 Thread John Hsu
I've replied in the ticket. https://svn.open-mpi.org/trac/ompi/ticket/2530#comment:2 thanks! John On Mon, Aug 9, 2010 at 2:42 PM, Jeff Squyres wrote: > I've opened a ticket about this -- if it's an actual problem, it's a 1.5 > blocker: > > https://svn.open-mpi.org/trac/ompi/ticket/2530 > > Wh

Re: [OMPI users] deadlock in openmpi 1.5rc5

2010-08-09 Thread Jeff Squyres
I've opened a ticket about this -- if it's an actual problem, it's a 1.5 blocker: https://svn.open-mpi.org/trac/ompi/ticket/2530 What version of knem and Linux are you using? On Aug 9, 2010, at 4:50 PM, John Hsu wrote: > problem "fixed" by adding the --mca btl_sm_use_knem 0 option (with

Re: [OMPI users] deadlock in openmpi 1.5rc5

2010-08-09 Thread John Hsu
problem "fixed" by adding the --mca btl_sm_use_knem 0 option (with -npernode 11), so I proceeded to bump up -npernode to 12: $ ../openmpi_devel/bin/mpirun -hostfile hostfiles/hostfile.wgsgX -npernode 12 --mca btl_sm_use_knem 0 ./bin/mpi_test and the same error occurs, (gdb) bt #0 0x7fcca6a

Re: [OMPI users] deadlock in openmpi 1.5rc5

2010-08-09 Thread Jeff Squyres
In your first mail, you mentioned that you are testing the new knem support. Can you try disabling knem and see if that fixes the problem? (i.e., run with "--mca btl_sm_use_knem 0") If it fixes the issue, that might mean we have a knem-based bug. On Aug 6, 2010, at 1:42 PM, John Hsu wrote:

Re: [OMPI users] deadlock in openmpi 1.5rc5

2010-08-06 Thread John Hsu
Hi, sorry for the confusion, that was indeed the trunk version of things I was running. Here's the same problem using http://www.open-mpi.org/software/ompi/v1.5/downloads/openmpi-1.5rc5.tar.bz2 command-line: ../openmpi_devel/bin/mpirun -hostfile hostfiles/hostfile.wgsgX -npernode 11 ./bin/mpi_

Re: [OMPI users] deadlock in openmpi 1.5rc5

2010-08-06 Thread Ralph Castain
You clearly have an issue with version confusion. The file cited in your warning: > [wgsg0:29074] Warning -- mutex was double locked from errmgr_hnp.c:772 does not exist in 1.5rc5. It only exists in the developer's trunk at this time. Check to ensure you have the right paths set, blow away the

[OMPI users] deadlock in openmpi 1.5rc5

2010-08-05 Thread John Hsu
Hi All, I am new to openmpi and have encountered an issue using pre-release 1.5rc5 with a simple MPI code (see attached). In this test, nodes 1 to n each send a random number to node 0, and node 0 sums all the numbers received. This code works fine on 1 machine with any number of nodes, and on 3 machines