[OMPI users] OpenIB Error in ibv_create_srq

2010-07-30 Thread Allen Barnett
Hi: A customer is attempting to run our OpenMPI 1.4.2-based application
on a cluster of machines running RHEL4 with the standard OFED stack. The
HCAs are identified as:

03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)

ibv_devinfo says that one port on the HCAs is active but the other is
down:

hca_id: mthca0
fw_ver: 3.0.2
node_guid:  0006:6a00:9800:4c78
sys_image_guid: 0006:6a00:9800:4c78
vendor_id:  0x066a
vendor_part_id: 23108
hw_ver: 0xA1
phys_port_cnt:  2
port:   1
state:  active (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   26
port_lmc:   0x00

port:   2
state:  down (1)
max_mtu:2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid:   0
port_lmc:   0x00


 When the OMPI application is run, it prints the error message:


The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue.  This typically indicates a failed
OpenFabrics installation, faulty hardware, or that Open MPI is
attempting to use a feature that is not supported on your hardware
(i.e., is a shared receive queue specified in the
btl_openib_receive_queues MCA parameter with a device that does not
support it?).  The failure occured here:

  Local host:  machine001.lan
  OMPI source: /software/openmpi-1.4.2/ompi/mca/btl/openib/btl_openib.c:250
  Function:    ibv_create_srq()
  Error:       Invalid argument (errno=22)
  Device:      mthca0

You may need to consult with your system administrator to get this
problem fixed.


The full log of a run with "btl_openib_verbose 1" is attached. My
application appears to run to completion, but I can't tell if it's just
running on TCP and not using the IB hardware.
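
(For reference, the verbose output was requested in the usual MCA way, i.e.
something along the lines of

  mpirun -np <N> -mca btl_openib_verbose 1 ./application

with the process count and application name elided here.)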

I would appreciate any suggestions on how to proceed to fix this error.

Thanks,
Allen

-- 
Allen Barnett
Transpire, Inc
E-Mail: al...@transpireinc.com


openib.listing.gz
Description: GNU Zip compressed data


Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-02 Thread Allen Barnett
Hi Terry:
It is indeed the case that the openib BTL has not been initialized. I
ran with your tcp-disabled MCA option and it aborted in MPI_Init.

The OFED stack is what's included in RHEL4. It appears to be made up of
the RPMs:
openib-1.4-1.el4
opensm-3.2.5-1.el4
libibverbs-1.1.2-1.el4

How can I determine if srq is supported? Is there an MCA option to
defeat it? (Our in-house cluster has more recent Mellanox IB hardware
and is running this same IB stack and ompi 1.4.2 works OK, so I suspect
srq is supported by the OpenFabrics stack. Perhaps.)
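
(One check I plan to try in the meantime, assuming the verbose device query
reports the SRQ limits on this stack, is:

  ibv_devinfo -v | grep -i srq

and see whether max_srq / max_srq_wr show up as non-zero.)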

Thanks,
Allen

On Mon, 2010-08-02 at 06:47 -0400, Terry Dontje wrote:
> My guess is from the message below saying "(openib) BTL failed to
> initialize"  that the code is probably running over tcp.  To
> absolutely prove this you can specify to only use the openib, sm and
> self btls to eliminate the tcp btl.  To do that you add the following
> to the mpirun line "-mca btl openib,sm,self".  I believe with that
> specification the code will abort and not run to completion.  
> 
> What version of the OFED stack are you using?  I wonder if srq is
> supported on your system or not?
> 
> --td
> 
> Allen Barnett wrote: 
> > Hi: A customer is attempting to run our OpenMPI 1.4.2-based application
> > on a cluster of machines running RHEL4 with the standard OFED stack. The
> > HCAs are identified as:
> > 
> > 03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
> > 04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
> > 
> > ibv_devinfo says that one port on the HCAs is active but the other is
> > down:
> > 
> > hca_id: mthca0
> > fw_ver: 3.0.2
> > node_guid:  0006:6a00:9800:4c78
> > sys_image_guid: 0006:6a00:9800:4c78
> > vendor_id:  0x066a
> > vendor_part_id: 23108
> > hw_ver: 0xA1
> > phys_port_cnt:  2
> > port:   1
> > state:  active (4)
> > max_mtu:2048 (4)
> > active_mtu: 2048 (4)
> > sm_lid: 1
> > port_lid:   26
> > port_lmc:   0x00
> > 
> > port:   2
> > state:  down (1)
> > max_mtu:2048 (4)
> > active_mtu: 512 (2)
> > sm_lid: 0
> > port_lid:   0
> > port_lmc:   0x00
> > 
> > 
> >  When the OMPI application is run, it prints the error message:
> > 
> > 
> > The OpenFabrics (openib) BTL failed to initialize while trying to
> > create an internal queue.  This typically indicates a failed
> > OpenFabrics installation, faulty hardware, or that Open MPI is
> > attempting to use a feature that is not supported on your hardware
> > (i.e., is a shared receive queue specified in the
> > btl_openib_receive_queues MCA parameter with a device that does not
> > support it?).  The failure occured here:
> > 
> >   Local host:  machine001.lan
> >   OMPI
> > source: /software/openmpi-1.4.2/ompi/mca/btl/openib/btl_openib.c:250
> >   Function:ibv_create_srq()
> >   Error:   Invalid argument (errno=22)
> >   Device:  mthca0
> > 
> > You may need to consult with your system administrator to get this
> > problem fixed.
> > 
> > 
> > The full log of a run with "btl_openib_verbose 1" is attached. My
> > application appears to run to completion, but I can't tell if it's just
> > running on TCP and not using the IB hardware.
> > 
> > I would appreciate any suggestions on how to proceed to fix this error.
> > 
> > Thanks,
> > Allen
> 

-- 
Allen Barnett
Transpire, Inc
E-Mail: al...@transpireinc.com



Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-03 Thread Allen Barnett
Hi: In response to my own question, by studying the file
mca-btl-openib-device-params.ini, I discovered that this option in
OMPI-1.4.2:

-mca btl_openib_receive_queues P,65536,256,192,128

was sufficient to prevent OMPI from trying to create shared receive
queues and allowed my application to run to completion using the IB
hardware.
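
(For the archives: as I understand the MCA parameter mechanism, the same value
can also be made the default instead of being passed on every mpirun line,
either through the environment:

  export OMPI_MCA_btl_openib_receive_queues=P,65536,256,192,128

or through a line in $HOME/.openmpi/mca-params.conf (or in
etc/openmpi-mca-params.conf under the Open MPI install prefix):

  btl_openib_receive_queues = P,65536,256,192,128

The numbers are simply the ones that happened to work here, not tuned
recommendations.)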

I guess my question now is: What do these numbers mean? Presumably the
size (or counts?) of buffers to allocate? Are there limits or a way to
tune these values?

Thanks,
Allen

On Mon, 2010-08-02 at 12:49 -0400, Allen Barnett wrote:
> Hi Terry:
> It is indeed the case that the openib BTL has not been initialized. I
> ran with your tcp-disabled MCA option and it aborted in MPI_Init.
> 
> The OFED stack is what's included in RHEL4. It appears to be made up of
> the RPMs:
> openib-1.4-1.el4
> opensm-3.2.5-1.el4
> libibverbs-1.1.2-1.el4
> 
> How can I determine if srq is supported? Is there an MCA option to
> defeat it? (Our in-house cluster has more recent Mellanox IB hardware
> and is running this same IB stack and ompi 1.4.2 works OK, so I suspect
> srq is supported by the OpenFabrics stack. Perhaps.)
> 
> Thanks,
> Allen
> 
> On Mon, 2010-08-02 at 06:47 -0400, Terry Dontje wrote:
> > My guess is from the message below saying "(openib) BTL failed to
> > initialize"  that the code is probably running over tcp.  To
> > absolutely prove this you can specify to only use the openib, sm and
> > self btls to eliminate the tcp btl.  To do that you add the following
> > to the mpirun line "-mca btl openib,sm,self".  I believe with that
> > specification the code will abort and not run to completion.  
> > 
> > What version of the OFED stack are you using?  I wonder if srq is
> > supported on your system or not?
> > 
> > --td
> > 
> > Allen Barnett wrote: 
> > > Hi: A customer is attempting to run our OpenMPI 1.4.2-based application
> > > on a cluster of machines running RHEL4 with the standard OFED stack. The
> > > HCAs are identified as:
> > > 
> > > 03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
> > > 04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
> > > 
> > > ibv_devinfo says that one port on the HCAs is active but the other is
> > > down:
> > > 
> > > hca_id:   mthca0
> > >   fw_ver: 3.0.2
> > >   node_guid:  0006:6a00:9800:4c78
> > >   sys_image_guid: 0006:6a00:9800:4c78
> > >   vendor_id:  0x066a
> > >   vendor_part_id: 23108
> > >   hw_ver: 0xA1
> > >   phys_port_cnt:  2
> > >   port:   1
> > >   state:  active (4)
> > >   max_mtu:2048 (4)
> > >   active_mtu: 2048 (4)
> > >   sm_lid: 1
> > >   port_lid:   26
> > >   port_lmc:   0x00
> > > 
> > >   port:   2
> > >   state:  down (1)
> > >   max_mtu:2048 (4)
> > >   active_mtu: 512 (2)
> > >   sm_lid: 0
> > >   port_lid:   0
> > >   port_lmc:   0x00
> > > 
> > > 
> > >  When the OMPI application is run, it prints the error message:
> > > 
> > > 
> > > The OpenFabrics (openib) BTL failed to initialize while trying to
> > > create an internal queue.  This typically indicates a failed
> > > OpenFabrics installation, faulty hardware, or that Open MPI is
> > > attempting to use a feature that is not supported on your hardware
> > > (i.e., is a shared receive queue specified in the
> > > btl_openib_receive_queues MCA parameter with a device that does not
> > > support it?).  The failure occured here:
> > > 
> > >   Local host:  machine001.lan
> > >   OMPI
> > > source: /software/openmpi-1.4.2/ompi/mca/btl/openib/btl_openib.c:250
> > >   Function:ibv_create_srq()
> > >   Error:   Invalid argument (errno=22)
> > >   Device:  mthca0
> > > 
> > > You may need to consult with your system administrator to get this
> > > problem fixed.
> > > 
> > > 
> > > The full log of a run with "btl_openib_verbose 1" is attached. My
> > > application appears to run to completion, but I can't tell if it's just
> > > running on TCP and not using the IB hardware.
> > > 
> > > I would appreciate any suggestions on how to proceed to fix this error.
> > > 
> > > Thanks,
> > > Allen
> > 
> 

-- 
Allen Barnett
Transpire, Inc
E-Mail: al...@transpireinc.com
Skype:  allenbarnett
Ph: 518-887-2930



Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-04 Thread Allen Barnett
Thanks for the pointer!

Do you know if these sizes are dependent on the hardware?
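
(Applying the breakdown you quote below to the values I used, and assuming I
am reading the defaults rule correctly, P,65536,256,192,128 would mean:
 - 65536-byte buffers
 - 256 buffers for incoming MPI messages
 - re-post when the number of available buffers drops to 192
 - send an explicit credit message when the available credits reach 128
 - ((256 * 2) - 1) / 128 = 3 buffers reserved for explicit credit messages)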

Thanks,
Allen

On Tue, 2010-08-03 at 10:29 -0400, Terry Dontje wrote:
> Sorry, I didn't see your prior question; glad you found the
> btl_openib_receive_queues parameter.  There is not a faq entry for
> this but I found the following in the openib btl help file that spells
> out the parameters when using Per-peer receive queue (ie receive queue
> setting with "P" as the first argument).
> 
> Per-peer receive queues require between 2 and 5 parameters:
> 
>  1. Buffer size in bytes (mandatory)
>  2. Number of buffers (mandatory)
>  3. Low buffer count watermark (optional; defaults to (num_buffers /
> 2))
>  4. Credit window size (optional; defaults to (low_watermark / 2))
>  5. Number of buffers reserved for credit messages (optional;
>  defaults to (num_buffers*2-1)/credit_window)
> 
>  Example: P,128,256,128,16
>   - 128 byte buffers
>   - 256 buffers to receive incoming MPI messages
>   - When the number of available buffers reaches 128, re-post 128 more
> buffers to reach a total of 256
>   - If the number of available credits reaches 16, send an explicit
> credit message to the sender
>   - Defaulting to ((256 * 2) - 1) / 16 = 31; this many buffers are
> reserved for explicit credit messages
> 
> --td
> Allen Barnett wrote: 
> > Hi: In response to my own question, by studying the file
> > mca-btl-openib-device-params.ini, I discovered that this option in
> > OMPI-1.4.2:
> > 
> > -mca btl_openib_receive_queues P,65536,256,192,128
> > 
> > was sufficient to prevent OMPI from trying to create shared receive
> > queues and allowed my application to run to completion using the IB
> > hardware.
> > 
> > I guess my question now is: What do these numbers mean? Presumably the
> > size (or counts?) of buffers to allocate? Are there limits or a way to
> > tune these values?
> > 
> > Thanks,
> > Allen
> > 
> > On Mon, 2010-08-02 at 12:49 -0400, Allen Barnett wrote:
> >   
> > > Hi Terry:
> > > It is indeed the case that the openib BTL has not been initialized. I
> > > ran with your tcp-disabled MCA option and it aborted in MPI_Init.
> > > 
> > > The OFED stack is what's included in RHEL4. It appears to be made up of
> > > the RPMs:
> > > openib-1.4-1.el4
> > > opensm-3.2.5-1.el4
> > > libibverbs-1.1.2-1.el4
> > > 
> > > How can I determine if srq is supported? Is there an MCA option to
> > > defeat it? (Our in-house cluster has more recent Mellanox IB hardware
> > > and is running this same IB stack and ompi 1.4.2 works OK, so I suspect
> > > srq is supported by the OpenFabrics stack. Perhaps.)
> > > 
> > > Thanks,
> > > Allen
> > > 
> > > On Mon, 2010-08-02 at 06:47 -0400, Terry Dontje wrote:
> > > 
> > > > My guess is from the message below saying "(openib) BTL failed to
> > > > initialize"  that the code is probably running over tcp.  To
> > > > absolutely prove this you can specify to only use the openib, sm and
> > > > self btls to eliminate the tcp btl.  To do that you add the following
> > > > to the mpirun line "-mca btl openib,sm,self".  I believe with that
> > > > specification the code will abort and not run to completion.  
> > > > 
> > > > What version of the OFED stack are you using?  I wonder if srq is
> > > > supported on your system or not?
> > > > 
> > > > --td
> > > > 
> > > > Allen Barnett wrote: 
> > > >   
> > > > > Hi: A customer is attempting to run our OpenMPI 1.4.2-based 
> > > > > application
> > > > > on a cluster of machines running RHEL4 with the standard OFED stack. 
> > > > > The
> > > > > HCAs are identified as:
> > > > > 
> > > > > 03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
> > > > > 04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
> > > > > 
> > > > > ibv_devinfo says that one port on the HCAs is active but the other is
> > > > > down:
> > > > > 
> > > > > hca_id:   mthca0
> > > > >   fw_ver: 3.0.2
> > > > >   node_guid:  0006:6a00:9800:4c78
> > > > >   sys_image_guid: 0006:6a00:9800:4c78
> > > > >   v

[OMPI users] Thanks

2007-06-14 Thread Allen Barnett
I just wanted to say "Thank You!" to the OpenMPI developers for the
OPAL_PREFIX option :-) This has proved very helpful in getting my
customers up and running with the least amount of effort on their part.
I really appreciate it.
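
For anyone finding this in the archives, the way we use it is roughly the
following (the install location and program name are only examples):

  export OPAL_PREFIX=/opt/ourapp/openmpi
  export PATH=$OPAL_PREFIX/bin:$PATH
  export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH
  mpirun -np 4 ./our_solver

That is, the customer can unpack the bundled Open MPI anywhere, and
OPAL_PREFIX tells it where it actually lives.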
Thanks,
Allen
-- 
Allen Barnett
Transpire, Inc.
e-mail: al...@transpireinc.com
Ph: 518-887-2930




[OMPI users] Possible Memcpy bug in MPI_Comm_split

2007-08-16 Thread Allen Barnett
Hi:
I was running my OpenMPI 1.2.3 application under Valgrind and I observed
this error message:

==14322== Source and destination overlap in memcpy(0x41F5BD0, 0x41F5BD8, 16)
==14322==    at 0x49070AD: memcpy (mc_replace_strmem.c:116)
==14322==    by 0x4A45CF4: ompi_ddt_copy_content_same_ddt (in /home/scratch/DMP/RHEL4-GCC4/lib/libmpi.so.0.0.0)
==14322==    by 0x7A6C386: ompi_coll_tuned_allgather_intra_bruck (in /home/scratch/DMP/RHEL4-GCC4/lib/openmpi/mca_coll_tuned.so)
==14322==    by 0x4A29FFE: ompi_comm_split (in /home/scratch/DMP/RHEL4-GCC4/lib/libmpi.so.0.0.0)
==14322==    by 0x4A4E322: MPI_Comm_split (in /home/scratch/DMP/RHEL4-GCC4/lib/libmpi.so.0.0.0)
==14322==    by 0x400A26: main (in /home/scratch/DMP/severian_tests/ompi/a.out)

Attached is a reduced code example. I run it like:

mpirun -np 3 valgrind ./a.out

I only see this error if there are an odd number of processes! I don't
know if this is really a problem or not, though. My OMPI application
seems to work OK. However, the Linux man page for memcpy says that
copying between overlapping ranges is undefined.

Other details: x86_64 (one box, two dual-core opterons), RHEL 4.5,
OpenMPI-1.2.3 compiled with the RHEL-supplied GCC 4 (gcc4 (GCC) 4.1.1
20070105 (Red Hat 4.1.1-53)), valgrind 3.2.3. 

Thanks,
Allen


-- 
Allen Barnett
Transpire, Inc.
e-mail: al...@transpireinc.com
Ph: 518-887-2930

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main ( int argc, char* argv[] )
{
  int rank, size, c;
  MPI_Comm* comms;

  MPI_Init( &argc, &argv );
  MPI_Comm_rank( MPI_COMM_WORLD, &rank );
  MPI_Comm_size( MPI_COMM_WORLD, &size );

  comms = malloc( size * sizeof(MPI_Comm) );

  for ( c = 0; c < size; c++ ) {
int color = MPI_UNDEFINED;
if ( c == rank )
  color = 0;
MPI_Comm_split( MPI_COMM_WORLD, color, 0, &comms[c] );
  }

  MPI_Finalize();

  free( comms );

  return 0;
}


info.bz2
Description: application/bzip


Re: [OMPI users] Problem with X forwarding

2008-06-04 Thread Allen Barnett
[14] ./DistributedData [0x819dc21]
> > [vrc1:27394] *** End of error message ***
> > mpirun noticed that job rank 0 with PID 27394 on node  exited on  
> > signal 11 (Segmentation fault).
> >
> >
> > Maybe I am not doing the X forwarding properly, but has anyone ever  
> > encountered the same problem? It works fine on one PC, and I read  
> > the mailing list but I just don't know if my problem is similar to  
> > theirs. I even tried changing the DISPLAY env
> >
> >
> > This is what I want to do
> >
> > my mpirun should run on 2 machines ( A and B ) and I should be able  
> > to view the output ( on my PC ),
> > are there any specfic commands to use.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
-- 
Allen Barnett
Transpire, Inc.
e-mail: al...@transpireinc.com
Ph: 518-887-2930




[OMPI users] Memchecker and Wait

2009-08-11 Thread Allen Barnett
Hi:
I'm trying to use the memchecker/valgrind capability of OpenMPI 1.3.3 to
help debug my MPI application. I noticed a rather odd thing: After
Waiting on a Recv Request, valgrind declares my receive buffer as
invalid memory. Is this just a fluke of valgrind, or is OMPI doing
something internally?

This is on a 64-bit RHEL 5 system using GCC 4.3.2 and Valgrind 3.4.1.

Here is an example:
--
#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
  int rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if ( size !=  2 ) {
if ( rank == 0 )
  printf("Please run with 2 processes.\n");
MPI_Finalize();
return 1;
  }

  if (rank == 0) {
char buffer_in[100];
MPI_Request req_in;
MPI_Status status;
memset( buffer_in, 1, sizeof(buffer_in) );
MPI_Recv_init( buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD,
&req_in );
MPI_Start( &req_in );
printf( "Before wait: %p: %d\n", buffer_in, buffer_in[3] );
printf( "Before wait: %p: %d\n", buffer_in, buffer_in[4] );
MPI_Wait( &req_in, &status );
printf( "After wait: %p: %d\n", buffer_in, buffer_in[3] );
printf( "After wait: %p: %d\n", buffer_in, buffer_in[4] );
MPI_Request_free( &req_in );
  }
  else {
char buffer_out[100];
memset( buffer_out, 2, sizeof(buffer_out) );
MPI_Send( buffer_out, 100, MPI_CHAR, 0, 123, MPI_COMM_WORLD );
  }

  MPI_Finalize();
  return 0;
} 
--

Doing "mpirun -np 2 -mca btl ^sm valgrind ./a.out" yields:

Before wait: 0x7ff0003b0: 1
Before wait: 0x7ff0003b0: 1
==15487== 
==15487== Invalid read of size 1
==15487==    at 0x400C6B: main (waittest.c:30)
==15487==  Address 0x7ff0003b3 is on thread 1's stack
After wait: 0x7ff0003b0: 2
==15487== 
==15487== Invalid read of size 1
==15487==    at 0x400C8B: main (waittest.c:31)
==15487==  Address 0x7ff0003b4 is on thread 1's stack
After wait: 0x7ff0003b0: 2

Also, if I run this program with the shared memory BTL active, valgrind
reports several "conditional jump or move depends on uninitialized
value" warnings in the SM BTL, and about 24 KB of lost memory at the end
(mostly from allocations in MPI_Init).
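
(Side note for the archives: I believe Open MPI 1.3.x also installs a Valgrind
suppression file, share/openmpi/openmpi-valgrind.supp under the install
prefix, intended to hide the known-benign reports; something along the lines
of

  mpirun -np 2 -mca btl ^sm valgrind \
    --suppressions=$OMPI_PREFIX/share/openmpi/openmpi-valgrind.supp ./a.out

where $OMPI_PREFIX stands for wherever Open MPI is installed.)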

Thanks,
Allen

-- 
Allen Barnett
Transpire, Inc
E-Mail: al...@transpireinc.com
Skype:  allenbarnett



Re: [OMPI users] Memchecker and Wait

2009-08-12 Thread Allen Barnett
Hi Shiqing:
That is very clever to invalidate the buffer memory until the comm
completes! However, I guess I'm still confused by my results. Lines 30
and 31 identified by valgrind are the lines after the Wait, and, if I
comment out the prints before the Wait, I still get the valgrind errors
on the "After wait" prints.

If I add prints after the Request_free calls, then I no longer receive
the valgrind errors when accessing "buffer_in" from that point on. So,
it appears that the buffer is marked invalid until the request is freed.

Perhaps I don't understand the sequence of events in MPI. I thought the
buffer was ok to use after the Wait, and requests could be safely
recycled.

Or maybe valgrind is pointing to the wrong lines, however the addresses
which it reports as invalid are exactly those in the buffer which are
being accessed in the post-Wait prints. Here is snippet of a more
instrumented example program with line numbers.
--
25 MPI_Recv_init( buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD,
&req_in );
26 printf( "Before start: %p: %d\n", &buffer_in[0], buffer_in[0] );
27 printf( "Before start: %p: %d\n", &buffer_in[1], buffer_in[1] );
28 MPI_Start( &req_in );
29 printf( "Before wait: %p: %d\n", &buffer_in[2], buffer_in[2] );
30 printf( "Before wait: %p: %d\n", &buffer_in[3], buffer_in[3] );
31 MPI_Wait( &req_in, &status );
32 printf( "After wait: %p: %d\n", &buffer_in[4], buffer_in[4] );
33 printf( "After wait: %p: %d\n", &buffer_in[5], buffer_in[5] );
34 MPI_Request_free( &req_in );
35 printf( "After free: %p: %d\n", &buffer_in[6], buffer_in[6] );
36 printf( "After free: %p: %d\n", &buffer_in[7], buffer_in[7] );
--
And the valgrind output

Before start: 0x7ff0003c0: 1
Before start: 0x7ff0003c1: 1
Before wait: 0x7ff0003c2: 1
Before wait: 0x7ff0003c3: 1
==17395== 
==17395== Invalid read of size 1
==17395==    at 0x400CB7: main (waittest.c:32)
==17395==  Address 0x7ff0003c4 is on thread 1's stack
After wait: 0x7ff0003c4: 2
==17395== 
==17395== Invalid read of size 1
==17395==    at 0x400CDB: main (waittest.c:33)
==17395==  Address 0x7ff0003c5 is on thread 1's stack
After wait: 0x7ff0003c5: 2
After free: 0x7ff0003c6: 2
After free: 0x7ff0003c7: 2

Here valgrind is complaining about the prints on line 32 and 33 and the
memory addresses are consistent with buffer_in[4] and buffer_in[5]. So,
I'm still puzzled.

Thanks,
Allen

On Wed, 2009-08-12 at 10:31 +0200, Shiqing Fan wrote:
> Hi Allen,
> 
> The invalid reads come from line 30 and 31 of your code, and I guess 
> they are the two 'printf's before MPI_Wait.
> 
> In Open MPI, when memchecker is enabled, OMPI marks the receive buffer 
> as invalid internally, immediately after receive starts for MPI semantic 
> checks, in this case, it just warns the users that they are accessing 
> the receive buffer before the receive has finished, which is not allowed 
> according to the MPI standard.
> 
> For a non-blocking receive, the communication only completes after 
> MPI_Wait is called. After that point, the user buffers are declared 
> valid again, that's why the 'printf's after MPI_Wait don't cause any 
> warnings from Valgrind. Hope this helps. :-)
> 
> 
> Regards,
> Shiqing
> 
> 
> Allen Barnett wrote:
> > Hi:
> > I'm trying to use the memchecker/valgrind capability of OpenMPI 1.3.3 to
> > help debug my MPI application. I noticed a rather odd thing: After
> > Waiting on a Recv Request, valgrind declares my receive buffer as
> > invalid memory. Is this just a fluke of valgrind, or is OMPI doing
> > something internally?
> >
> > This is on a 64-bit RHEL 5 system using GCC 4.3.2 and Valgrind 3.4.1.
> >
> > Here is an example:
> > --
> > #include <stdio.h>
> > #include <string.h>
> > #include "mpi.h"
> >
> > int main(int argc, char *argv[])
> > {
> >   int rank, size;
> >
> >   MPI_Init(&argc, &argv);
> >   MPI_Comm_size(MPI_COMM_WORLD, &size);
> >   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> >   if ( size !=  2 ) {
> > if ( rank == 0 )
> >   printf("Please run with 2 processes.\n");
> > MPI_Finalize();
> > return 1;
> >   }
> >
> >   if (rank == 0) {
> > char buffer_in[100];
> > MPI_Request req_in;
> > MPI_Status status;
> > memset( buffer_in, 1, sizeof(buffer_in) );
> > MPI_Recv_init( buffer_in, 100, MPI_CHAR, 1, 123, MPI_COMM_WORLD,
> > &req_in );
> &

[OMPI users] OpenMPI 1.3 Infiniband Hang

2009-08-12 Thread Allen Barnett
Hi:
I recently tried to build my MPI application against OpenMPI 1.3.3. It
worked fine with OMPI 1.2.9, but with OMPI 1.3.3, it hangs part way
through. It does a fair amount of comm, but eventually it stops in a
Send/Recv point-to-point exchange. If I turn off the openib btl, it runs
to completion. Also, I built 1.3.3 with memchecker (which is very nice;
thanks to everyone who worked on that!) and it runs to completion, even
with openib active.

Our cluster consists of dual dual-core opteron boxes with Mellanox
MT25204 (InfiniHost III Lx) HCAs and a Mellanox MT47396 Infiniscale-III
switch. We're running RHEL 4.8 which appears to include OFED 1.4. I've
built everything using GCC 4.3.2. Here is the output from ibv_devinfo.
"ompi_info --all" is attached.
$ ibv_devinfo
hca_id: mthca0
fw_ver: 1.1.0
node_guid:  0002:c902:0024:3284
sys_image_guid: 0002:c902:0024:3287
vendor_id:  0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id:   MT_03B0140002
phys_port_cnt:  1
port:   1
state:  active (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   1
port_lmc:   0x00

I'd appreciate any tips for debugging this.
Thanks,
Allen

-- 
Allen Barnett
Transpire, Inc
E-Mail: al...@transpireinc.com
Skype:  allenbarnett
Ph: 518-887-2930


ompinfo.gz
Description: GNU Zip compressed data


Re: [OMPI users] OpenMPI 1.3 Infiniband Hang

2009-08-19 Thread Allen Barnett
Hi: Setting mpi_leave_pinned to 0 allows my application to run to
completion when running with openib active. I realize that it's probably
not going to help my application's performance, but since "ON" is the
default, I'd like to understand what's happening. There's definitely a
dependence on problem size: smaller problems run to completion while
larger problems hang at different points in the code. Are there buffer
sizes (or other BTL settings) I can adjust to understand my problem
better?
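
(For reference, the invocation that now completes is essentially

  mpirun -np <N> -mca mpi_leave_pinned 0 ./application

with openib left enabled, the process count and application name elided, and
everything else at its defaults.)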

Thanks,
Allen

On Thu, 2009-08-13 at 10:11 +0300, Lenny Verkhovsky wrote:
> Hi, 
> 1.
> The Mellanox has a newer fw for those
> HCAshttp://www.mellanox.com/content/pages.php?pg=firmware_table_IH3Lx
> 
> I am not sure if it will help, but newer fw usually have some bug
> fixes.
> 
> 2.
> try to disable leave_pinned during the run. It's on by default in
> 1.3.3
> 
> Lenny.
> 
> On Thu, Aug 13, 2009 at 5:12 AM, Allen Barnett
>  wrote:
> Hi:
> I recently tried to build my MPI application against OpenMPI
> 1.3.3. It
> worked fine with OMPI 1.2.9, but with OMPI 1.3.3, it hangs
> part way
> through. It does a fair amount of comm, but eventually it
> stops in a
> Send/Recv point-to-point exchange. If I turn off the openib
> btl, it runs
> to completion. Also, I built 1.3.3 with memchecker (which is
> very nice;
> thanks to everyone who worked on that!) and it runs to
> completion, even
> with openib active.
> 
> Our cluster consists of dual dual-core opteron boxes with
> Mellanox
> MT25204 (InfiniHost III Lx) HCAs and a Mellanox MT47396
> Infiniscale-III
> switch. We're running RHEL 4.8 which appears to include OFED
> 1.4. I've
> built everything using GCC 4.3.2. Here is the output from
> ibv_devinfo.
> "ompi_info --all" is attached.
> $ ibv_devinfo
> hca_id: mthca0
>fw_ver: 1.1.0
>node_guid:  0002:c902:0024:3284
>sys_image_guid: 0002:c902:0024:3287
>vendor_id:  0x02c9
>vendor_part_id: 25204
>hw_ver: 0xA0
>board_id:   MT_03B0140002
>phys_port_cnt:  1
>port:   1
>state:  active (4)
>max_mtu:2048 (4)
>active_mtu: 2048 (4)
>sm_lid: 1
>port_lid:   1
>port_lmc:   0x00
> 
> I'd appreciate any tips for debugging this.
> Thanks,
> Allen




[OMPI users] Bus Error in ompi_free_list_grow

2008-10-17 Thread Allen Barnett
Hi: A customer is running our parallel application on an SGI Altix
machine. They compiled OMPI 1.2.8 themselves. The Altix uses IB
interfaces and they recently upgraded to OFED 1.3 (in SGI Propack 6).
They are receiving a bus error in ompi_free_list_grow:

[r1i0n0:01321] *** Process received signal ***
[r1i0n0:01321] Signal: Bus error (7)
[r1i0n0:01321] Signal code:  (2)
[r1i0n0:01321] Failing at address: 0x2b04ba07c4a0
[r1i0n0:01321] [ 0] /lib64/libpthread.so.0 [0x2b04b00cfc00]
[r1i0n0:01321] [ 1] 
/usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/libmpi.so.0(ompi_free_list_grow+0x14a)
 
[0x2b04af7dc058]
[r1i0n0:01321] [ 2] 
/usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_btl_sm.so(mca_btl_sm_alloc+0x321)
 
[0x2b04b38c8e35]
[r1i0n0:01321] [ 3] 
/usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_copy+0x26d)
 
[0x2b04b3378f91]
[r1i0n0:01321] [ 4] 
/usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x546)
 
[0x2b04b3370c7e]
[r1i0n0:01321] [ 5] 
/usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/libmpi.so.0(MPI_Send+0x28)
 
[0x2b04af814098]

Here is some more information about the machine:

SGI Altix ICE 8200 cluster; each node has two quad core Xeons with 16GB
SUSE Linux Enterprise Server 10 Service Pack 2
GNU C Library stable release version 2.4 (20080421)
gcc (GCC) 4.1.2 20070115 (SUSE Linux)
SGI Propack 6 (just upgraded from Propack 5 SP3: changed from 
OFED 1.2 to 1.3)

The output from ompi_info is attached.

I would appreciate any help debugging this.

Thanks,
Allen

-- 
Allen Barnett
E-Mail: al...@transpireinc.com
Skype:  allenbarnett
Ph: 518-887-2930



ompi_info.txt.bz2
Description: application/bzip


Re: [OMPI users] Handling output of processes

2009-01-23 Thread Allen Barnett
On Thu, 2009-01-22 at 06:33 -0700, Ralph Castain wrote:
> If you need to do this with a prior release... well, I'm afraid it  
> won't work. :-)

As a quick hack for 1.2.x, I sometimes use this script to wrap my
executable:
---
#!/bin/sh
# sompi.sh: Send each rank's output to a separate file.
# Note use of undocumented OMPI 1.2.x environment variables!
exec "$*" > "listing.$OMPI_MCA_ns_nds_num_procs.$OMP_MCA_ns_nds_vpid"
---

Then do:

$ mpirun -np 3 ~allen/bin/sompi.sh parallel_program

As the processes run, you can "tail" the individual listing files to see
what's happening. Of course, the working directory has to be writable
and you have to find the machine and directory where the output is being
redirected to, and so on...

Allen

> On Jan 22, 2009, at 1:58 AM, jody wrote:
> 
> > Hi
> > I have a small cluster consisting of 9 computers (8x2 CPUs, 1x4 CPUs).
> > I would like to be able to observe the output of the processes
> > separately during an mpirun.
> >
> > What i currently do is to apply the mpirun to a shell script which
> > opens a xterm for each process,
> > which then starts the actual application.
> >
> > This works, but is a bit complicated, e.g. finding the window you're
> > interested in among 19 others.
> >
> > So i was wondering is there a possibility to capture the processes'
> > outputs separately, so
> > i can make an application in which i can switch between the different
> > processor outputs?
> > I could imagine that could be done by wrapper applications which
> > redirect the output over a TCP
> > socket to a server application.
> >
> > But perhaps there is an easier way, or something like this already  
> > does exist?
> >
> > Thank You
> >  Jody
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
-- 
Allen Barnett
E-Mail: al...@transpireinc.com
Skype:  allenbarnett
Ph: 518-887-2930




Re: [OMPI users] Handling output of processes

2009-01-26 Thread Allen Barnett
On Sun, 2009-01-25 at 05:20 -0700, Ralph Castain wrote:

> 2. redirect output of specified processes to files using the provided  
> filename appended with ".rank". You can do this for all ranks, or a  
> specified subset of them.

A filename extension including both the comm size and the rank is
helpful if, like me, you run several jobs out of the same directory with
different numbers of processors. Say "listing.32.01" for rank 1, -np 32.
(And, as always, padding numbers with zeros makes "ls" behave more
sanely.)
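
(In terms of the wrapper-script hack I posted earlier in this thread, that
would be something like

  OUT=$(printf "listing.%03d.%03d" "$OMPI_MCA_ns_nds_num_procs" "$OMPI_MCA_ns_nds_vpid")
  exec "$@" > "$OUT"

still relying on the same undocumented 1.2.x environment variables, so treat
it as a hack.)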

Thanks,
Allen

-- 
Allen Barnett
E-Mail: al...@transpireinc.com
Skype:  allenbarnett
Ph: 518-887-2930




[OMPI users] Spawn and OpenFabrics

2009-06-02 Thread Allen Barnett
On Tue, 2009-05-19 at 08:29 -0400, Jeff Squyres wrote:
> fork() support in OpenFabrics has always been dicey -- it can lead to  
> random behavior like this.  Supposedly it works in a specific set of  
> circumstances, but I don't have a recent enough kernel on my machines  
> to test.
> 
> It's best not to use calls to system() if they can be avoided.   
> Indeed, Open MPI v1.3.x will warn you if you create a child process  
> after MPI_INIT when using OpenFabrics networks.

My C++ OMPI program uses system() to invoke an external mesh partitioner
program after MPI_INIT is called. Sometimes (with frustrating
randomness), on systems using OFED the system() call fails with EFAULT
(Bad address). The linux kernel appears to feel that the execve()
function is being passed a string which isn't in the process' address
space. The exec string is constructed immediately before calling
system() like this:

std::stringstream ss;
ss << "partitioner_program " << COMM_WORLD_SIZE;
system( ss.str().c_str() );

Could this behavior related to this admonition?

Also, would MPI_COMM_SPAWN suffer from the same difficulties?

Thanks,
Allen

-- 
Allen Barnett
E-Mail: al...@transpireinc.com
Skype:  allenbarnett
Ph: 518-887-2930




Re: [OMPI users] Spawn and OpenFabrics

2009-06-02 Thread Allen Barnett
On Tue, 2009-06-02 at 12:27 -0400, Jeff Squyres wrote:
> On Jun 2, 2009, at 11:37 AM, Allen Barnett wrote:
> 
> > std::stringstream ss;
> > ss << "partitioner_program " << COMM_WORLD_SIZE;
> > system( ss.str().c_str() );
> >
> 
> You'd probably see the same problem even if you strdup'ed the c_str()  
> and system()'ed that.
> 
> What kernel are you using? 

I've seen it myself on my generic opteron RHEL 4 cluster with kernel
2.6.9-78.0.22; I can't really figure out which version of OFED it uses
(maybe 1.2?). A customer has reported it on an Altix system with SLES
10.2 and kernel 2.6.16.60 with a version of OFED 1.3.

>  Does OMPI say that it has IBV fork support?
>  ompi_info --param btl openib --parsable | grep have_fork_support

My RHEL4 system reports:

MCA btl: parameter "btl_openib_want_fork_support" (current value: "-1")
MCA btl: information "btl_openib_have_fork_support" (value: "1")

as does the build installed on the Altix system.

> Be sure to also see 
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork 

We're using OMPI 1.2.8.

> > Also, would MPI_COMM_SPAWN suffer from the same difficulties?
> >
> 
> 
> It shouldn't; we proxy the launch of new commands off to mpirun /  
> OMPI's run-time system.  Specifically: the new process(es) are not  
> POSIX children of the process(es) that called MPI_COMM_SPAWN.

Is a program started with MPI_COMM_SPAWN required to call MPI_INIT? I
guess what I'm asking is if I will have to make my partitioner an
OpenMPI program as well?

Thanks,
Allen

-- 
Allen Barnett
E-Mail: al...@transpireinc.com
Skype:  allenbarnett




Re: [OMPI users] Spawn and OpenFabrics

2009-06-07 Thread Allen Barnett
OK. I appreciate the suggestion and will definitely try it out. (A rough
sketch of what I understood the approach to be is below, after your
quoted note.)

Thanks,
Allen


On Fri, 2009-06-05 at 10:14 -0400, Jeff Squyres wrote:
> On Jun 2, 2009, at 3:26 PM, Allen Barnett wrote:
> > I
> > guess what I'm asking is if I will have to make my partitioner an
> > OpenMPI program as well?
> >
> 
> 
> If you use MPI_COMM_SPAWN with the 1.2 series, yes.
> 
> Another less attractive but functional solution would be to do what I  
> did for the new command notifier due in the OMPI v1.5 series  
> ("notifier" = subsystem to notify external agents when OMPI detects  
> something wrong, like write to the syslog, send an email, write to a  
> sysadmin mysql db, etc., "command" = plugin that simply forks and runs  
> whatever command you want).  During MPI_INIT, the fork notifier pre- 
> forks a dummy process.  This dummy process then waits for commands via  
> a pipe.  When the parent (MPI process itself) wants to fork a child,  
> it sends the argv to exec down the pipe and has the child process  
> actually do the fork and exec.
> 
> Proxying all the fork requests through a secondary process like this  
> avoids all the problems with registered memory in the child process.   
> This is icky, but it is an unfortunately necessity for OS-bypass/ 
> registration-based networks like OpenFabrics.
> 
> In your case, you'd want to pre-fork before calling MPI_INIT.  But the  
> rest of the technique is pretty much the same.
> 
> Have a look at the code in this tree if it helps:
> 
>  
> https://svn.open-mpi.org/trac/ompi/browser/trunk/orte/mca/notifier/command
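
For the archives, here is a rough sketch of how I read the pre-fork/proxy
idea quoted above. This is my own simplification, not the notifier code: the
helper speaks a trivial newline-terminated command protocol, there is no
error handling, and the parent gets no notification when a command finishes
(a second pipe back would be needed for that).
--
/* Fork a helper *before* MPI_Init so it never holds registered memory;
 * the MPI process then asks the helper to run external commands. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "mpi.h"

static int cmd_pipe[2];     /* parent writes commands, helper reads them */
static pid_t helper_pid;

static void start_helper(void)
{
  pipe(cmd_pipe);
  helper_pid = fork();
  if (helper_pid == 0) {                    /* helper process */
    FILE *in = fdopen(cmd_pipe[0], "r");
    char line[4096];
    close(cmd_pipe[1]);
    while (fgets(line, sizeof(line), in)) { /* one command per line */
      line[strcspn(line, "\n")] = '\0';
      system(line);   /* the fork/exec happens here, not in the MPI process */
    }
    _exit(0);
  }
  close(cmd_pipe[0]);                       /* parent keeps only the write end */
}

static void run_via_helper(const char *cmd)
{
  write(cmd_pipe[1], cmd, strlen(cmd));
  write(cmd_pipe[1], "\n", 1);
}

int main(int argc, char *argv[])
{
  int size;
  char cmd[256];

  start_helper();                           /* must happen before MPI_Init */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* In the real code this would probably be done on one rank only. */
  snprintf(cmd, sizeof(cmd), "partitioner_program %d", size);
  run_via_helper(cmd);

  MPI_Finalize();
  close(cmd_pipe[1]);                       /* helper sees EOF and exits */
  waitpid(helper_pid, NULL, 0);
  return 0;
}
--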





[OMPI users] Hang with Mixed Machines

2006-12-08 Thread Allen Barnett
Hi:
I have a "cluster" consisting of a dual Opteron system (called a.lan)
and a dual AthlonMP system (b.lan). Both systems are running Red Hat
Enterprise Linux 4. The opteron system runs in 64-bit mode; the AthlonMP
in 32-bit. I can't seem to make OpenMPI work between these two machines.
I've tried 1.1.2, 1.1.3b1, and 1.2b1 and they all exhibit the same
behavior, namely that Bcasts won't complete. Here's my simple.cpp test
program:

#include <iostream>
#include "mpi.h"

int main ( int argc, char* argv[] )
{
  MPI_Init( &argc, &argv );
  char hostname[256];
  int hostname_size = sizeof(hostname);
  MPI_Get_processor_name( hostname, &hostname_size );
  std::cout << "Running on " << hostname << std::endl;

  std::cout << hostname <<  " in to Bcast" << std::endl;
  double a = 3.14159;
  MPI_Bcast( &a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD );
  std::cout << hostname << " out of Bcast" << std::endl;

  MPI_Finalize();
  return 0;
}

I compile this and run it with "mpirun --host a.lan --host b.lan
simple". Generally, if I'm on a.lan, I see:

Running on a.lan
a.lan in to Bcast
Running on b.lan
a.lan out of Bcast
b.lan in to Bcast


If I launch from b.lan, then the reverse happens (i.e., it exits the
Bcast on b.lan, but never exits Bcast on a.lan and a.lan uses 100% cpu).

On the other hand, I have another 32-bit system (just a plain Athlon
running RHEL 4, called c.lan). My test program runs fine between b.lan
and c.lan.

I feel like I must be making an incredibly obvious mistake.

Thanks,
Allen

-- 
Allen Barnett
Transpire, Inc.
E-Mail: al...@transpireinc.com
Ph: 518-887-2930



[OMPI users] Relocating an Installation

2006-12-13 Thread Allen Barnett
Hi: 
There was a thread back in November started by Patrick Jessee about
relocating an installation after it was built (the subject was: removing
hard-coded paths from OpenMPI shared libraries). I guess I'm in the same
boat now. I would like to distribute our OpenMPI-based parallel solver;
but I can't really dictate where a user will install our software. Has
any one succeeded in building a version of OpenMPI which can be
relocated?

Thanks,
Allen
-- 
Allen Barnett
Transpire, Inc.
E-Mail: al...@transpireinc.com
Ph: 518-887-2930



Re: [OMPI users] Relocating an Installation

2006-12-27 Thread Allen Barnett
Upon reflecting on this more, I guess I see two issues. First, there's
the question of letting the user install our software wherever they
like. Some users may want to install it in their home directories;
others may have a sysadmin install it in a common location. This seems
like a substantial reason to allow an OMPI installation to be relocated,
so I would say it is a very important capability.

On the other hand, I don't have access to all of the third-party headers
and libraries needed to build some of the more interesting OMPI modules,
such as the InfiniBand and Myrinet drivers and many of the batch
scheduler drivers (tm? LoadLeveler? PORTALS? Xgrid? I'm not sure what
these are. And a related question: one customer uses NQS (NQE?); can
that be supported?). So I expect users may want to compile at least some
of OMPI themselves in order to activate those modules. Perhaps I should
supply a partially built installation that completes compilation as part
of the installation process? That seems impractical, though, since it
would require compilers, headers, libraries, etc., on every machine on
which our software is installed.

I also don't know what distribution restrictions are placed on all the
3rd party software OMPI can link against. This may limit what can be
redistributed with our product.

So, I guess I'm open to suggestions on how best to distribute our
software. Being able to relocate an installation and being able to build
specific modules at installation time would appear to be very helpful
capabilities.

Many thanks,
Allen

On Fri, 2006-12-15 at 19:45 -0500, Jeff Squyres wrote:
> Greetings Allen.
> 
> This problem has not yet been resolved, but I'm quite sure we have an  
> open ticket about this on our bug tracker.  I just replied to Patrick  
> on-list about a related issue (his use of --prefix); I'd have to  
> think about this a bit more, but a solution proposed by one of the  
> other OMPI developers in internal conversations may fix both issues.   
> It only hasn't been coded up because we didn't prioritize it high.
> 
> So my question to you is -- how high of a priority is this for you?   
> Part of what makes it into each OMPI release is driven by what users  
> want/need, so input like this helps us prioritize the work.
> 
> Thanks!
> 
> 
> On Dec 13, 2006, at 10:37 AM, Allen Barnett wrote:
> 
> > There was a thread back in November started by Patrick Jessee about
> > relocating an installation after it was built (the subject was:  
> > removing
> > hard-coded paths from OpenMPI shared libraries). I guess I'm in the  
> > same
> > boat now. I would like to distribute our OpenMPI-based parallel  
> > solver;
> > but I can't really dictate where a user will install our software. Has
> > any one succeeded in building a version of OpenMPI which can be
> > relocated?
> 
-- 
Allen Barnett
Transpire, Inc.
E-Mail: al...@transpireinc.com
Ph: 518-887-2930