Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje
A clarification on your previous email: you had your code working with 
OMPI 1.4.1 but an older version of OFED?  Then you upgraded to OFED 1.4 
and things stopped working?  It sounds like your current system is set up 
with OMPI 1.4.2 and OFED 1.5.  Anyway, I am a little confused as to 
when things actually broke.


My first guess would be that something is wrong with the OFED setup.  
Have you checked the status of your IB devices with ibv_devinfo?  Have you 
run any of the OFED RC tests like ibv_rc_pingpong? 
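For example (hostnames are placeholders), something along these lines, first on 
each node and then across a pair of nodes:

  ibv_devinfo
  ibv_rc_pingpong                # server side, on one node
  ibv_rc_pingpong <server-host>  # client side, on the other node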

If the above seems OK, have you tried running a simpler OMPI test like 
connectivity?  I would see whether a simple np=2 run spanning two 
nodes fails.
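A minimal sketch of such a run (hostnames are placeholders; connectivity_c is 
the connectivity example that can be built from the examples/ directory of the 
Open MPI source tree):

  mpirun -np 2 --host node1,node2 ./connectivity_c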


What OS distribution and version are you running?

--td
Brian Smith wrote:

In case my previous e-mail is too vague for anyone to address, here's a
backtrace from my application.  This version, compiled with Intel
11.1.064 (OpenMPI 1.4.2 w/ gcc 4.4.2), hangs during MPI_Alltoall
instead.  Running on 16 CPUs, Opteron 2427, Mellanox Technologies
MT25418 w/ OFED 1.5

strace on all ranks repeatedly shows:
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22,
events=POLLIN}, {fd=23, events=POLLIN}], 7, 0) = 0 (Timeout)
...

gdb --pid=
(gdb) bt
#0  sm_fifo_read () at btl_sm.h:267
#1  mca_btl_sm_component_progress () at btl_sm_component.c:391
#2  0x2b00085116ea in opal_progress () at
runtime/opal_progress.c:207
#3  0x2b0007def215 in opal_condition_wait (count=2,
requests=0x7fffd27802a0, statuses=0x7fffd2780270)
at ../opal/threads/condition.h:99
#4  ompi_request_default_wait_all (count=2, requests=0x7fffd27802a0,
statuses=0x7fffd2780270) at request/req_wait.c:262
#5  0x2b0007e805b7 in ompi_coll_tuned_sendrecv_actual
(sendbuf=0x2aaac2c4c210, scount=28000, 
sdatatype=0x2b0008198ea0, dest=6, stag=-13, recvbuf=<value optimized out>, rcount=28000, rdatatype=0x2b0008198ea0, 
source=10, rtag=-13, comm=0x16ad7420, status=0x0) at

coll_tuned_util.c:55
#6  0x2b0007e8705f in ompi_coll_tuned_sendrecv (sbuf=0x2aaac2b04010,
scount=28000, sdtype=0x2b0008198ea0, 
rbuf=0x2aaac99a2010, rcount=28000, rdtype=0x2b0008198ea0,

comm=0x16ad7420, module=0x16ad8450)
at coll_tuned_util.h:60
#7  ompi_coll_tuned_alltoall_intra_pairwise (sbuf=0x2aaac2b04010,
scount=28000, sdtype=0x2b0008198ea0, 
rbuf=0x2aaac99a2010, rcount=28000, rdtype=0x2b0008198ea0,

comm=0x16ad7420, module=0x16ad8450)
at coll_tuned_alltoall.c:70
#8  0x2b0007e0a71f in PMPI_Alltoall (sendbuf=0x2aaac2b04010,
sendcount=28000, sendtype=0x2b0008198ea0, 
recvbuf=0x2aaac99a2010, recvcount=28000, recvtype=0x2b0008198ea0,

comm=0x16ad7420) at palltoall.c:84
#9  0x2b0007b8bc86 in mpi_alltoall_f (sendbuf=0x2aaac2b04010 "",
sendcount=0x7fffd27806a0, 
sendtype=<value optimized out>, 
recvbuf=0x2aaac99a2010 "6%\177e\373\354\306>\346\226z\262\347\350

\260>\032ya(\303\003\272\276\231\343\322\363zjþ\230\247i\232\307PԾ(\304
\373\321D\261ľ\204֜Εh־H\266H\342l2\245\276\231C7]\003\250Ǿ`\277\231\272
\265E\261>j\213ѓ\370\002\263>НØx.\254>}\332-\313\371\326\320>\346\245f
\304\f\214\262\276\070\222zf#'\321>\024\066̆\026\227ɾ.T\277\266}\366
\270>h|\323L\330\fƾ^z\214!q*\277\276pQ?O\346\067\270>~\006\300",
recvcount=0x7fffd27806a4, recvtype=0xb67490, 
comm=0x12d9ba0, ierr=0x7fffd27806a8) at palltoall_f.c:76

#10 0x004634cc in m_sumf_d_ ()
#11 0x00463072 in m_sum_z_ ()
#12 0x004c8a8b in mrg_grid_rc_ ()
#13 0x004ffc5e in rhosym_ ()
#14 0x00610dc6 in us_mp_set_charge_ ()
#15 0x00771c43 in elmin_ ()
#16 0x00453853 in MAIN__ ()
#17 0x0042f15c in main ()

On other processes:

(gdb) bt
#0  0x003692a0b725 in pthread_spin_lock ()
from /lib64/libpthread.so.0
#1  0x2acdfa7b in ibv_cmd_create_qp ()
from /usr/lib64/libmlx4-rdmav2.so
#2  0x2b9dc1db3ff8 in progress_one_device ()
at /usr/include/infiniband/verbs.h:884
#3  btl_openib_component_progress () at btl_openib_component.c:3451
#4  0x2b9dc24736ea in opal_progress () at
runtime/opal_progress.c:207
#5  0x2b9dc1d51215 in opal_condition_wait (count=2,
requests=0x7fffece3cc20, statuses=0x7fffece3cbf0)
at ../opal/threads/condition.h:99
#6  ompi_request_default_wait_all (count=2, requests=0x7fffece3cc20,
statuses=0x7fffece3cbf0) at request/req_wait.c:262
#7  0x2b9dc1de25b7 in ompi_coll_tuned_sendrecv_actual
(sendbuf=0x2aaac2c4c210, scount=28000, 
sdatatype=0x2b9dc20faea0, dest=6, stag=-13, recvbuf=<value optimized out>, rcount=28000, rdatatype=0x2b9dc20faea0, 
source=10, rtag=-13, comm=0x1745b420, status=0x0) at

coll_tuned_util.c:55
#8  0x2b9dc1de905f in ompi_coll_tuned_sendrecv (sbuf=0x2aaac2b04010,
scount=28000, sdtype=0x2b9dc20faea0, 
rbuf=0x2aaac99a2010, rcount=28000, rdtype=0x2b9dc20faea0,

comm=0x1745b420, module=0x1745c450)
at coll_tuned_util.h:60
#9  ompi_coll_tuned_alltoall_intra_pairwise (sbuf=0x2aaac2b04010,
scount=28000, sdtype=0x2b9dc20faea0, 
rbuf=0x2aaac99a2010, rcount=28000, rd

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje

Can you try a simple point-to-point program?

--td

Brian Smith wrote:

After running on two processors across two nodes, the problem occurs
much earlier during execution:

(gdb) bt
#0  opal_sys_timer_get_cycles ()
at ../opal/include/opal/sys/amd64/timer.h:46
#1  opal_timer_base_get_cycles ()
at ../opal/mca/timer/linux/timer_linux.h:31
#2  opal_progress () at runtime/opal_progress.c:181
#3  0x2b4bc3c00215 in opal_condition_wait (count=2,
requests=0x7fff33372480, statuses=0x7fff33372450)
at ../opal/threads/condition.h:99
#4  ompi_request_default_wait_all (count=2, requests=0x7fff33372480,
statuses=0x7fff33372450) at request/req_wait.c:262
#5  0x2b4bc3c915b7 in ompi_coll_tuned_sendrecv_actual
(sendbuf=0x2aaad11dfaf0, scount=117692, 
sdatatype=0x2b4bc3fa9ea0, dest=1, stag=-13, recvbuf=<value optimized out>, rcount=117692, 
rdatatype=0x2b4bc3fa9ea0, source=1, rtag=-13, comm=0x12cd98c0,

status=0x0) at coll_tuned_util.c:55
#6  0x2b4bc3c982db in ompi_coll_tuned_sendrecv (sbuf=0x2aaad10f9d10,
scount=117692, sdtype=0x2b4bc3fa9ea0, 
rbuf=0x2aaae104d010, rcount=117692, rdtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0, module=0x12cda340)
at coll_tuned_util.h:60
#7  ompi_coll_tuned_alltoall_intra_two_procs (sbuf=0x2aaad10f9d10,
scount=117692, sdtype=0x2b4bc3fa9ea0, 
rbuf=0x2aaae104d010, rcount=117692, rdtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0, module=0x12cda340)
at coll_tuned_alltoall.c:432
#8  0x2b4bc3c1b71f in PMPI_Alltoall (sendbuf=0x2aaad10f9d10,
sendcount=117692, sendtype=0x2b4bc3fa9ea0, 
recvbuf=0x2aaae104d010, recvcount=117692, recvtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0) at palltoall.c:84
#9  0x2b4bc399cc86 in mpi_alltoall_f (sendbuf=0x2aaad10f9d10 "Z\n
\271\356\023\254\271?", sendcount=0x7fff33372688, 
sendtype=<value optimized out>, recvbuf=0x2aaae104d010 "",
recvcount=0x7fff3337268c, recvtype=0xb67490, 
comm=0x12d9d20, ierr=0x7fff33372690) at palltoall_f.c:76

#10 0x004613b8 in m_alltoall_z_ ()
#11 0x004ec55f in redis_pw_ ()
#12 0x005643d0 in choleski_mp_orthch_ ()
#13 0x0043fbba in MAIN__ ()
#14 0x0042f15c in main ()

On Tue, 2010-07-27 at 06:14 -0400, Terry Dontje wrote:
  

A clarification on your previous email: you had your code working
with OMPI 1.4.1 but an older version of OFED?  Then you upgraded to
OFED 1.4 and things stopped working?  It sounds like your current system
is set up with OMPI 1.4.2 and OFED 1.5.  Anyway, I am a little
confused as to when things actually broke.

My first guess would be that something is wrong with the OFED setup.
Have you checked the status of your IB devices with ibv_devinfo?  Have you
run any of the OFED RC tests like ibv_rc_pingpong?  


If the above seems OK, have you tried running a simpler OMPI test like
connectivity?  I would see whether a simple np=2 run spanning two
nodes fails.

What OS distribution and version are you running?

--td
Brian Smith wrote: 


In case my previous e-mail is too vague for anyone to address, here's a
backtrace from my application.  This version, compiled with Intel
11.1.064 (OpenMPI 1.4.2 w/ gcc 4.4.2), hangs during MPI_Alltoall
instead.  Running on 16 CPUs, Opteron 2427, Mellanox Technologies
MT25418 w/ OFED 1.5

strace on all ranks repeatedly shows:
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22,
events=POLLIN}, {fd=23, events=POLLIN}], 7, 0) = 0 (Timeout)
...

gdb --pid=
(gdb) bt
#0  sm_fifo_read () at btl_sm.h:267
#1  mca_btl_sm_component_progress () at btl_sm_component.c:391
#2  0x2b00085116ea in opal_progress () at
runtime/opal_progress.c:207
#3  0x2b0007def215 in opal_condition_wait (count=2,
requests=0x7fffd27802a0, statuses=0x7fffd2780270)
at ../opal/threads/condition.h:99
#4  ompi_request_default_wait_all (count=2, requests=0x7fffd27802a0,
statuses=0x7fffd2780270) at request/req_wait.c:262
#5  0x2b0007e805b7 in ompi_coll_tuned_sendrecv_actual
(sendbuf=0x2aaac2c4c210, scount=28000, 
sdatatype=0x2b0008198ea0, dest=6, stag=-13, recvbuf=<value optimized out>, rcount=28000, rdatatype=0x2b0008198ea0, 
source=10, rtag=-13, comm=0x16ad7420, status=0x0) at

coll_tuned_util.c:55
#6  0x2b0007e8705f in ompi_coll_tuned_sendrecv (sbuf=0x2aaac2b04010,
scount=28000, sdtype=0x2b0008198ea0, 
rbuf=0x2aaac99a2010, rcount=28000, rdtype=0x2b0008198ea0,

comm=0x16ad7420, module=0x16ad8450)
at coll_tuned_util.h:60
#7  ompi_coll_tuned_alltoall_intra_pairwise (sbuf=0x2aaac2b04010,
scount=28000, sdtype=0x2b0008198ea0, 
rbuf=0x2aaac99a2010, rcount=28000, rdtype=0x2b0008198ea0,

comm=0x16ad7420, module=0x16ad8450)
at coll_tuned_alltoall.c:70
#8  0x2b0007e0a71f in PMPI_Alltoall (sendbuf=0x2aaac2b04010,
sendcount=28000, sendtype=0x2b0008198ea0, 
recvbuf=0x2aaac99a2010, recvcount=28000, recvtype=0x2b0008198ea0,

comm=0x16ad7420) at palltoall.c:84
#9  0x2b0007b8bc86 in mpi_alltoall_f (sendbuf=0x2aaac2b04010 "",
sendcount=0x7

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje
With this earlier failure, do you know how many messages may have been 
transferred between the two processes?  Is there a way to narrow this 
down to a small piece of code?  Do you have TotalView or DDT at your 
disposal?


--td

Brian Smith wrote:

Also, the application I'm having trouble with appears to work fine with
MVAPICH2 1.4.1, if that is any help.

-Brian

On Tue, 2010-07-27 at 10:48 -0400, Terry Dontje wrote:
  

Can you try a simple point-to-point program?

--td

Brian Smith wrote: 


After running on two processors across two nodes, the problem occurs
much earlier during execution:

(gdb) bt
#0  opal_sys_timer_get_cycles ()
at ../opal/include/opal/sys/amd64/timer.h:46
#1  opal_timer_base_get_cycles ()
at ../opal/mca/timer/linux/timer_linux.h:31
#2  opal_progress () at runtime/opal_progress.c:181
#3  0x2b4bc3c00215 in opal_condition_wait (count=2,
requests=0x7fff33372480, statuses=0x7fff33372450)
at ../opal/threads/condition.h:99
#4  ompi_request_default_wait_all (count=2, requests=0x7fff33372480,
statuses=0x7fff33372450) at request/req_wait.c:262
#5  0x2b4bc3c915b7 in ompi_coll_tuned_sendrecv_actual
(sendbuf=0x2aaad11dfaf0, scount=117692, 
sdatatype=0x2b4bc3fa9ea0, dest=1, stag=-13, recvbuf=<value optimized out>, rcount=117692, 
rdatatype=0x2b4bc3fa9ea0, source=1, rtag=-13, comm=0x12cd98c0,

status=0x0) at coll_tuned_util.c:55
#6  0x2b4bc3c982db in ompi_coll_tuned_sendrecv (sbuf=0x2aaad10f9d10,
scount=117692, sdtype=0x2b4bc3fa9ea0, 
rbuf=0x2aaae104d010, rcount=117692, rdtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0, module=0x12cda340)
at coll_tuned_util.h:60
#7  ompi_coll_tuned_alltoall_intra_two_procs (sbuf=0x2aaad10f9d10,
scount=117692, sdtype=0x2b4bc3fa9ea0, 
rbuf=0x2aaae104d010, rcount=117692, rdtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0, module=0x12cda340)
at coll_tuned_alltoall.c:432
#8  0x2b4bc3c1b71f in PMPI_Alltoall (sendbuf=0x2aaad10f9d10,
sendcount=117692, sendtype=0x2b4bc3fa9ea0, 
recvbuf=0x2aaae104d010, recvcount=117692, recvtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0) at palltoall.c:84
#9  0x2b4bc399cc86 in mpi_alltoall_f (sendbuf=0x2aaad10f9d10 "Z\n
\271\356\023\254\271?", sendcount=0x7fff33372688, 
sendtype=<value optimized out>, recvbuf=0x2aaae104d010 "",
recvcount=0x7fff3337268c, recvtype=0xb67490, 
comm=0x12d9d20, ierr=0x7fff33372690) at palltoall_f.c:76

#10 0x004613b8 in m_alltoall_z_ ()
#11 0x004ec55f in redis_pw_ ()
#12 0x005643d0 in choleski_mp_orthch_ ()
#13 0x0043fbba in MAIN__ ()
#14 0x0042f15c in main ()

On Tue, 2010-07-27 at 06:14 -0400, Terry Dontje wrote:
  
  

A clarification on your previous email: you had your code working
with OMPI 1.4.1 but an older version of OFED?  Then you upgraded to
OFED 1.4 and things stopped working?  It sounds like your current system
is set up with OMPI 1.4.2 and OFED 1.5.  Anyway, I am a little
confused as to when things actually broke.

My first guess would be that something is wrong with the OFED setup.
Have you checked the status of your IB devices with ibv_devinfo?  Have you
run any of the OFED RC tests like ibv_rc_pingpong?  


If the above seems OK, have you tried running a simpler OMPI test like
connectivity?  I would see whether a simple np=2 run spanning two
nodes fails.

What OS distribution and version are you running?

--td
Brian Smith wrote: 



In case my previous e-mail is too vague for anyone to address, here's a
backtrace from my application.  This version, compiled with Intel
11.1.064 (OpenMPI 1.4.2 w/ gcc 4.4.2), hangs during MPI_Alltoall
instead.  Running on 16 CPUs, Opteron 2427, Mellanox Technologies
MT25418 w/ OFED 1.5

strace on all ranks repeatedly shows:
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22,
events=POLLIN}, {fd=23, events=POLLIN}], 7, 0) = 0 (Timeout)
...

gdb --pid=
(gdb) bt
#0  sm_fifo_read () at btl_sm.h:267
#1  mca_btl_sm_component_progress () at btl_sm_component.c:391
#2  0x2b00085116ea in opal_progress () at
runtime/opal_progress.c:207
#3  0x2b0007def215 in opal_condition_wait (count=2,
requests=0x7fffd27802a0, statuses=0x7fffd2780270)
at ../opal/threads/condition.h:99
#4  ompi_request_default_wait_all (count=2, requests=0x7fffd27802a0,
statuses=0x7fffd2780270) at request/req_wait.c:262
#5  0x2b0007e805b7 in ompi_coll_tuned_sendrecv_actual
(sendbuf=0x2aaac2c4c210, scount=28000, 
sdatatype=0x2b0008198ea0, dest=6, stag=-13, recvbuf=<value optimized out>, rcount=28000, rdatatype=0x2b0008198ea0, 
source=10, rtag=-13, comm=0x16ad7420, status=0x0) at

coll_tuned_util.c:55
#6  0x2b0007e8705f in ompi_coll_tuned_sendrecv (sbuf=0x2aaac2b04010,
scount=28000, sdtype=0x2b0008198ea0, 
rbuf=0x2aaac99a2010, rcount=28000, rdtype=0x2b0008198ea0,

comm=0x16ad7420, module=0x16ad8450)
at coll_tuned_util.h:60
#7  ompi_coll_tuned_alltoall_intra_pairwise (sbuf=0x2aaac2b04010,
scount=28000, sdtype=0x2b

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-28 Thread Terry Dontje

Here are a couple other suggestions:

1.  Have you tried running your code with the TCP BTL, just to make sure 
this isn't a general algorithm issue with the collective?
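For example (process count and binary name are placeholders), restricting the 
run to the tcp, sm and self BTLs:

  mpirun -np 16 -mca btl tcp,sm,self ./your_app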


2.  While using the openib btl you may want to try things with rdma 
turned off by using the following parameters to mpirun:
-mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0 -mca 
btl_openib_flags 1
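Put together on one line (process count and binary name are placeholders), that 
would look something like:

  mpirun -np 16 -mca btl_openib_use_eager_rdma 0 \
         -mca btl_openib_max_eager_rdma 0 -mca btl_openib_flags 1 ./your_app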


3.  While using the openib btl you may want to try bumping 
up the rendezvous limit to see if eliminating rendezvous messages helps 
things (others on the list: is there an easier way to do this?).  Set the 
following parameters, raising the 8192 and 2048 values:
-mca btl_openib_receive_queues "P,8192" -mca btl_openib_max_send_size 
8192 -mca btl_openib_eager_limit 8192 -mca btl_openib_rndv_eager_limit 2048
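For example, the same line with the limits raised (the 65536/16384 values, the 
process count and the binary name are only illustrative):

  mpirun -np 16 -mca btl_openib_receive_queues "P,65536" \
         -mca btl_openib_max_send_size 65536 \
         -mca btl_openib_eager_limit 65536 \
         -mca btl_openib_rndv_eager_limit 16384 ./your_app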


4.  You may also want to see whether the basic collective algorithms 
work instead of the tuned ones, which are the default I believe, by 
setting "-mca coll_basic_priority 100".  The idea here is to determine 
whether the tuned collective itself is tickling the issue.
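For example (process count and binary name are placeholders):

  mpirun -np 16 -mca coll_basic_priority 100 ./your_app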


--td

Terry Dontje wrote:
With this earlier failure, do you know how many messages may have been 
transferred between the two processes?  Is there a way to narrow this 
down to a small piece of code?  Do you have TotalView or DDT at your 
disposal?


--td

Brian Smith wrote:

Also, the application I'm having trouble with appears to work fine with
MVAPICH2 1.4.1, if that is any help.

-Brian

On Tue, 2010-07-27 at 10:48 -0400, Terry Dontje wrote:
  

Can you try a simple point-to-point program?

--td

Brian Smith wrote: 


After running on two processors across two nodes, the problem occurs
much earlier during execution:

(gdb) bt
#0  opal_sys_timer_get_cycles ()
at ../opal/include/opal/sys/amd64/timer.h:46
#1  opal_timer_base_get_cycles ()
at ../opal/mca/timer/linux/timer_linux.h:31
#2  opal_progress () at runtime/opal_progress.c:181
#3  0x2b4bc3c00215 in opal_condition_wait (count=2,
requests=0x7fff33372480, statuses=0x7fff33372450)
at ../opal/threads/condition.h:99
#4  ompi_request_default_wait_all (count=2, requests=0x7fff33372480,
statuses=0x7fff33372450) at request/req_wait.c:262
#5  0x2b4bc3c915b7 in ompi_coll_tuned_sendrecv_actual
(sendbuf=0x2aaad11dfaf0, scount=117692, 
sdatatype=0x2b4bc3fa9ea0, dest=1, stag=-13, recvbuf=<value optimized out>, rcount=117692, 
rdatatype=0x2b4bc3fa9ea0, source=1, rtag=-13, comm=0x12cd98c0,

status=0x0) at coll_tuned_util.c:55
#6  0x2b4bc3c982db in ompi_coll_tuned_sendrecv (sbuf=0x2aaad10f9d10,
scount=117692, sdtype=0x2b4bc3fa9ea0, 
rbuf=0x2aaae104d010, rcount=117692, rdtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0, module=0x12cda340)
at coll_tuned_util.h:60
#7  ompi_coll_tuned_alltoall_intra_two_procs (sbuf=0x2aaad10f9d10,
scount=117692, sdtype=0x2b4bc3fa9ea0, 
rbuf=0x2aaae104d010, rcount=117692, rdtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0, module=0x12cda340)
at coll_tuned_alltoall.c:432
#8  0x2b4bc3c1b71f in PMPI_Alltoall (sendbuf=0x2aaad10f9d10,
sendcount=117692, sendtype=0x2b4bc3fa9ea0, 
recvbuf=0x2aaae104d010, recvcount=117692, recvtype=0x2b4bc3fa9ea0,

comm=0x12cd98c0) at palltoall.c:84
#9  0x2b4bc399cc86 in mpi_alltoall_f (sendbuf=0x2aaad10f9d10 "Z\n
\271\356\023\254\271?", sendcount=0x7fff33372688, 
sendtype=<value optimized out>, recvbuf=0x2aaae104d010 "",
recvcount=0x7fff3337268c, recvtype=0xb67490, 
comm=0x12d9d20, ierr=0x7fff33372690) at palltoall_f.c:76

#10 0x004613b8 in m_alltoall_z_ ()
#11 0x004ec55f in redis_pw_ ()
#12 0x005643d0 in choleski_mp_orthch_ ()
#13 0x0043fbba in MAIN__ ()
#14 0x0042f15c in main ()

On Tue, 2010-07-27 at 06:14 -0400, Terry Dontje wrote:
  
  

A clarification on your previous email: you had your code working
with OMPI 1.4.1 but an older version of OFED?  Then you upgraded to
OFED 1.4 and things stopped working?  It sounds like your current system
is set up with OMPI 1.4.2 and OFED 1.5.  Anyway, I am a little
confused as to when things actually broke.

My first guess would be that something is wrong with the OFED setup.
Have you checked the status of your IB devices with ibv_devinfo?  Have you
run any of the OFED RC tests like ibv_rc_pingpong?  


If the above seems OK, have you tried running a simpler OMPI test like
connectivity?  I would see whether a simple np=2 run spanning two
nodes fails.

What OS distribution and version are you running?

--td
Brian Smith wrote: 



In case my previous e-mail is too vague for anyone to address, here's a
backtrace from my application.  This version, compiled with Intel
11.1.064 (OpenMPI 1.4.2 w/ gcc 4.4.2), hangs during MPI_Alltoall
instead.  Running on 16 CPUs, Opteron 2427, Mellanox Technologies
MT25418 w/ OFED 1.5

strace on all ranks repeatedly shows:
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}, {fd=2

Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-29 Thread Terry Dontje

Ralph Castain wrote:

How are you running it when the threads are all on one core?

If you are specifying --bind-to-core, then of course all the threads will be on 
one core since we bind the process (not the thread). If you are specifying -mca 
mpi_paffinity_alone 1, then the same behavior results.

Generally, if you want to bind threads, the only way to do it is with a rank 
file. We -might- figure out a way to provide an interface for thread-level 
binding, but I'm not sure about that right now. As things stand, OMPI has no 
visibility into the fact that your app spawned threads.


  
Huh???  That's not completely correct.  If you have a multiple-socket 
machine you could do -bind-to-socket -bysocket and spread the processes 
that way.  Also, couldn't you use -cpus-per-proc with -bind-to-core 
to get a process to bind to a non-socket number of cpus?
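For illustration only, reusing the command line from the original post but 
without the rank file:

  mpirun -host c005,c006 -np 2 -bysocket -bind-to-socket \
         -x OMP_NUM_THREADS=4 hybrid4.gcc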


This is all documented in the mpirun manpage.

That being said, I also am confused, like Ralph, as to why your code is 
binding when no options are given.  Maybe add --report-bindings to your mpirun 
line to see what OMPI thinks it is doing in this regard?
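For example, the original command line with the bindings reported:

  mpirun -host c005,c006 -np 2 -rf rank.file --report-bindings \
         -x OMP_NUM_THREADS=4 hybrid4.gcc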


--td

--td

On Jul 28, 2010, at 5:47 PM, David Akin wrote:

  

All,
I'm trying to get the OpenMP portion of the code below to run
multicore on a couple of 8 core nodes.

Good news: multiple threads are being spawned on each node in the run.
Bad news: each of the threads only runs on a single core, leaving 7
cores basically idle.
Sorta good news: if I provide a rank file I get the threads running on
different cores within each node (PITA).

Here's the first lines of output.

/usr/mpi/gcc/openmpi-1.4-qlc/bin/mpirun -host c005,c006 -np 2 -rf
rank.file -x OMP_NUM_THREADS=4 hybrid4.gcc

Hello from thread 2 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=2
Hello from thread 3 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=3
Hello from thread 1 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=1
Hello from thread 1 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=1
Hello from thread 3 out of 4 from process 0 out of 2 on c005.local
Hello from thread 2 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
Hello from thread 0 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=0
Hello from thread 0 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=0
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=0
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=0
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=1
.
.
.

Here's the simple code:
#include <stdio.h>   /* header name lost in the archive; stdio.h assumed for printf */
#include "mpi.h"
#include <omp.h>     /* header name lost in the archive; omp.h assumed for the OpenMP calls */

int main(int argc, char *argv[]) {
 int numprocs, rank, namelen;
 char processor_name[MPI_MAX_PROCESSOR_NAME];
 int iam = 0, np = 1;
 char name[MPI_MAX_PROCESSOR_NAME];   /* MPI_MAX_PROCESSOR_NAME ==
128 */
 int O_ID;/* OpenMP thread ID
*/
 int M_ID;/* MPI rank ID
*/
 int rtn_val;

 MPI_Init(&argc, &argv);
 MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 MPI_Get_processor_name(processor_name, &namelen);

 #pragma omp parallel default(shared) private(iam, np,O_ID)
 {
   np = omp_get_num_threads();
   iam = omp_get_thread_num();
   printf("Hello from thread %d out of %d from process %d out of %d on %s\n",
  iam, np, rank, numprocs, processor_name);
   int i=0;
   int j=0;
   double counter=0;
   for(i =0;i<;i++)
   {
O_ID = omp_get_thread_num();  /* get OpenMP
thread ID */
MPI_Get_processor_name(name,&namelen);
rtn_val = MPI_Comm_rank(MPI_COMM_WORLD,&M_ID);
printf("another parallel region:   name:%s
MPI_RANK_ID=%d OMP_THREAD_ID=%d\n", name,M_ID,O_ID);
for(j = 0;j<9;j++)
 {
  counter=counter+i;
 }
   }

 }

 MPI_Finalize();

}
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/us

Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-29 Thread Terry Dontje

Ralph Castain wrote:


On Jul 29, 2010, at 5:09 AM, Terry Dontje wrote:


Ralph Castain wrote:

How are you running it when the threads are all on one core?

If you are specifying --bind-to-core, then of course all the threads will be on 
one core since we bind the process (not the thread). If you are specifying -mca 
mpi_paffinity_alone 1, then the same behavior results.

Generally, if you want to bind threads, the only way to do it is with a rank 
file. We -might- figure out a way to provide an interface for thread-level 
binding, but I'm not sure about that right now. As things stand, OMPI has no 
visibility into the fact that your app spawned threads.


  
Huh???  That's not completely correct.  If you have a multiple-socket 
machine you could do -bind-to-socket -bysocket and spread the 
processes that way.  Also, couldn't you use -cpus-per-proc with 
-bind-to-core to get a process to bind to a non-socket number of cpus?


Yes, you could do bind-to-socket, though that still constrains the 
threads to only that one socket. What was asked about here was the 
ability to bind-to-core at the thread level, and that is something 
OMPI doesn't support.


Sorry, I did not get that constraint.  So to be clear, what is being asked 
is the ability to bind a process's threads to specific cores.  
If so, then to the letter of what that means, I agree you cannot do that.

However, the next best thing may be to specify binding of a 
process to a group of resources.  That's essentially what my suggestion 
above is doing.

I do agree with Ralph that once you start overloading the socket with 
more threads than it can handle, problems will ensue.


--td




This is all documented in the mpirun manpage.

That being said, I also am confused, like Ralph, as to why your code is 
binding when no options are given.  Maybe add --report-bindings to your 
mpirun line to see what OMPI thinks it is doing in this regard?


This is a good suggestion - I'm beginning to believe that the binding 
is happening in the user's app and not OMPI.





--td

--td

On Jul 28, 2010, at 5:47 PM, David Akin wrote:

  

All,
I'm trying to get the OpenMP portion of the code below to run
multicore on a couple of 8 core nodes.

Good news: multiple threads are being spawned on each node in the run.
Bad news: each of the threads only runs on a single core, leaving 7
cores basically idle.
Sorta good news: if I provide a rank file I get the threads running on
different cores within each node (PITA).

Here's the first lines of output.

/usr/mpi/gcc/openmpi-1.4-qlc/bin/mpirun -host c005,c006 -np 2 -rf
rank.file -x OMP_NUM_THREADS=4 hybrid4.gcc

Hello from thread 2 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=2
Hello from thread 3 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=3
Hello from thread 1 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=1
Hello from thread 1 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=1
Hello from thread 3 out of 4 from process 0 out of 2 on c005.local
Hello from thread 2 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
Hello from thread 0 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=0
Hello from thread 0 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=0
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=0
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=0
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=1
.
.
.

Here's the simple code:
#include <stdio.h>
#include "mpi.h"
#include <omp.h>

int main(int argc, char *argv[]) {
 int numprocs, rank, namelen;
 char processor_name[MPI_MAX_PROCESSOR_NAME];
 int iam = 0, np = 1;
 char name[MPI_MAX_PROCESSOR_NAME];   /* MPI_MAX_PROCESSOR_NAME ==
128 */
 int O_ID;/* OpenMP thread ID
*/
 int M_ID;/* MPI rank ID
*/
 int rtn_val;

 MPI_Init(&argc, &argv);
 MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
 MPI_

Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-29 Thread Terry Dontje

No problem, anyways I think you are headed in the right direction now.

--td
David Akin wrote:
Sorry for the confusion. What I need is for all OpenMP threads to 
*not* stay on one core. I *would* rather each OpenMP thread run on 
a separate core. Is it my example code? My gut reaction is no, because 
I can manipulate (somewhat) the cores the threads are assigned to by 
adding -bysocket -bind-to-socket to mpirun.


On Thu, Jul 29, 2010 at 10:08 AM, Terry Dontje 
mailto:terry.don...@oracle.com>> wrote:


Ralph Castain wrote:


On Jul 29, 2010, at 5:09 AM, Terry Dontje wrote:


Ralph Castain wrote:

How are you running it when the threads are all on one core?

If you are specifying --bind-to-core, then of course all the threads will 
be on one core since we bind the process (not the thread). If you are 
specifying -mca mpi_paffinity_alone 1, then the same behavior results.

Generally, if you want to bind threads, the only way to do it is with a 
rank file. We -might- figure out a way to provide an interface for thread-level 
binding, but I'm not sure about that right now. As things stand, OMPI has no 
visibility into the fact that your app spawned threads.


  

Huh???  That's not completely correct.  If you have a multiple-socket
machine you could do -bind-to-socket -bysocket and spread
the processes that way.  Also, couldn't you use
-cpus-per-proc with -bind-to-core to get a process to bind to a
non-socket number of cpus?


Yes, you could do bind-to-socket, though that still constrains
the threads to only that one socket. What was asked about here
was the ability to bind-to-core at the thread level, and that is
something OMPI doesn't support.


Sorry, I did not get that constraint.  So to be clear, what is being
asked is the ability to bind a process's threads to
specific cores.  If so, then to the letter of what that means, I
agree you cannot do that.


However, what may be the next best thing is to specify binding of
a process to a group of resources.  That's essentially what my
suggestion above is doing. 


I do agree with Ralph that once you start overloading the socket
with more threads than it can handle, problems will ensue.

--td




This is all documented in the mpirun manpage.

That being said, I also am confused, like Ralph, as to why your
code is binding when no options are given.  Maybe add
--report-bindings to your mpirun line to see what OMPI thinks it
is doing in this regard?


This is a good suggestion - I'm beginning to believe that the
binding is happening in the user's app and not OMPI.




--td

--td

On Jul 28, 2010, at 5:47 PM, David Akin wrote:

  

All,
I'm trying to get the OpenMP portion of the code below to run
multicore on a couple of 8 core nodes.

Good news: multiple threads are being spawned on each node in the run.
Bad news: each of the threads only runs on a single core, leaving 7
cores basically idle.
Sorta good news: if I provide a rank file I get the threads running on
different cores within each node (PITA).

Here's the first lines of output.

/usr/mpi/gcc/openmpi-1.4-qlc/bin/mpirun -host c005,c006 -np 2 -rf
rank.file -x OMP_NUM_THREADS=4 hybrid4.gcc

Hello from thread 2 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=2
Hello from thread 3 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=3
Hello from thread 1 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=1
Hello from thread 1 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=1
Hello from thread 3 out of 4 from process 0 out of 2 on c005.local
Hello from thread 2 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
Hello from thread 0 out of 4 from process 0 out of 2 on c005.local
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=0
Hello from thread 0 out of 4 from process 1 out of 2 on c006.local
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=0
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
another parallel region:   name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=0
another parallel region:   name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=3
another parallel region:   name:c005.local MPI_RA

Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-02 Thread Terry Dontje
My guess, from the message below saying "(openib) BTL failed to 
initialize", is that the code is probably running over tcp.  To absolutely 
prove this you can specify only the openib, sm and self BTLs, 
eliminating the tcp BTL.  To do that, add the following to the mpirun 
line: "-mca btl openib,sm,self".  I believe with that specification the 
code will abort rather than run to completion. 
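For example (process count and binary name are placeholders):

  mpirun -np 4 -mca btl openib,sm,self ./your_app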

What version of the OFED stack are you using?  I wonder if srq is 
supported on your system or not?


--td

Allen Barnett wrote:

Hi: A customer is attempting to run our OpenMPI 1.4.2-based application
on a cluster of machines running RHEL4 with the standard OFED stack. The
HCAs are identified as:

03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)

ibv_devinfo says that one port on the HCAs is active but the other is
down:

hca_id: mthca0
fw_ver: 3.0.2
node_guid:  0006:6a00:9800:4c78
sys_image_guid: 0006:6a00:9800:4c78
vendor_id:  0x066a
vendor_part_id: 23108
hw_ver: 0xA1
phys_port_cnt:  2
port:   1
state:  active (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   26
port_lmc:   0x00

port:   2
state:  down (1)
max_mtu:2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid:   0
port_lmc:   0x00


 When the OMPI application is run, it prints the error message:


The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue.  This typically indicates a failed
OpenFabrics installation, faulty hardware, or that Open MPI is
attempting to use a feature that is not supported on your hardware
(i.e., is a shared receive queue specified in the
btl_openib_receive_queues MCA parameter with a device that does not
support it?).  The failure occured here:

  Local host:  machine001.lan
  OMPI
source: /software/openmpi-1.4.2/ompi/mca/btl/openib/btl_openib.c:250
  Function:ibv_create_srq()
  Error:   Invalid argument (errno=22)
  Device:  mthca0

You may need to consult with your system administrator to get this
problem fixed.


The full log of a run with "btl_openib_verbose 1" is attached. My
application appears to run to completion, but I can't tell if it's just
running on TCP and not using the IB hardware.

I would appreciate any suggestions on how to proceed to fix this error.

Thanks,
Allen

  



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 



Re: [OMPI users] Accessing to the send buffer

2010-08-02 Thread Terry Dontje
I believe it is definitely a no-no to STORE (write) into a send buffer 
while a send is posted.  I know there has been debate in the forum about 
relaxing LOADS (reads) from a send buffer.  I think OMPI can handle the 
latter case (LOADS).  On the posted receive side you open yourself up 
to race conditions and overwrites if you do STORES or LOADS on a 
posted receive buffer.
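As a minimal sketch of the safe pattern (my own illustration, not Alberto's 
code; it assumes exactly two ranks):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
  int rank, peer;
  int sbuf[4] = {1, 2, 3, 4};   /* send buffer: left untouched until the wait returns */
  int rbuf[4];                  /* receive buffer: not read until the wait returns */
  MPI_Request reqs[2];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  peer = (rank == 0) ? 1 : 0;   /* assumes np=2 */

  MPI_Irecv(rbuf, 4, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
  MPI_Isend(sbuf, 4, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

  /* do unrelated work here -- no stores into sbuf, no loads or stores on rbuf */

  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

  /* after the wait both buffers are safe to use again */
  printf("rank %d received %d %d %d %d\n", rank, rbuf[0], rbuf[1], rbuf[2], rbuf[3]);
  MPI_Finalize();
  return 0;
}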


--td

Alberto Canestrelli wrote:

Hi,
I have a problem with a Fortran code that I have parallelized with 
MPI. I state in advance that I read the whole ebook "MPI - 
The Complete Reference, Volume 1" (MIT Press) and I took several MPI classes, so 
I have a fair knowledge of MPI. I was able to solve by myself all the 
errors I ran into, but now I am not able to find the bug in my code 
that produces erroneous results. Without going into the details of my 
code, I think that the cause of the problem could be related to the 
following aspect highlighted in the above ebook (in the following I copy 
and paste from the e-book):


A nonblocking post-send call indicates that the system may start 
copying data
out of the send buffer. The sender must not access any part of the 
send buffer
(neither for loads nor for STORES) after a nonblocking send operation 
is posted until

the complete send returns.
A nonblocking post-receive indicates that the system may start writing 
data into
the receive buffer. The receiver must not access any part of the 
receive buffer after
a nonblocking receive operation is posted, until the complete-receive 
returns.
Rationale. We prohibit read accesses to a send buffer while it is 
being used, even
though the send operation is not supposed to alter the content of this 
buffer. This
may seem more stringent than necessary, but the additional restriction 
causes little
loss of functionality and allows better performance on some systems- 
consider
the case where data transfer is done by a DMA engine that is not 
cache-coherent

with the main processor.End of rationale.

I use plenty of nonblocking post-sends in my code. Is it really true 
that the sender must not access any part of the send buffer, not even 
for STORES?  Or was it an MPI 1.0 issue?

Thanks.
alberto
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 



Re: [OMPI users] Accessing to the send buffer

2010-08-02 Thread Terry Dontje
In the posted irecv case, if you are reading from the posted receive 
buffer, the problem is you may get one of three values:


1.  the pre-irecv value
2.  the value received from the irecv in progress
3.  possibly garbage, if you are unlucky enough to access memory that is 
being updated at the same time. 


--td
Alberto Canestrelli wrote:

Thanks,
it was late at night yesterday and I highlighted STORES but I 
meant to highlight LOADS! I know that
stores are not allowed when you are doing non-blocking send-recv. But 
I was surprised about the LOADS case. I always do some loads of the data
between all my ISEND-IRECVs and my WAITs. Could you please confirm 
that OMPI can handle the LOAD case? And if it cannot handle it, what 
could the consequences be? What could happen in the worst case 
when there is a data race while reading data?

thanks
alberto

On 02/08/2010 9:32, Alberto Canestrelli wrote:

I believe it is definitely a no-no to STORE (write) into a send buffer
while a send is posted. I know there have been debate in the forum to
relax LOADS (reads) from a send buffer. I think OMPI can handle the
latter case (LOADS). On the posted receive side you open yourself up
for some race conditions and overwrites if you do STORES or LOADS from a
posted receive buffer.

--td

Alberto Canestrelli wrote:

 Hi,
 I have a problem with a fortran code that I have parallelized with
 MPI. I state in advance that I read the whole ebook "Mit Press - Mpi -
 The Complete Reference, Volume 1" and I took different MPI classes, so
 I have a discrete MPI knowledge. I was able to solve by myself all the
 errors I bumped into but now I am not able to find the bug of my code
 that provides erroneous results. Without entering in the details of my
 code, I think that the cause of the problem could be reletad to the
 following aspect highlighted in the above ebook (in the follow I copy
 and paste from the e-book):

 A nonblocking post-send call indicates that the system may start
 copying data
 out of the send buffer. The sender must not access any part of the
 send buffer
 (neither for loads nor for STORES) after a nonblocking send operation
 is posted until
 the complete send returns.
 A nonblocking post-receive indicates that the system may start writing
 data into
 the receive buffer. The receiver must not access any part of the
 receive buffer after
 a nonblocking receive operation is posted, until the complete-receive
 returns.
 Rationale. We prohibit read accesses to a send buffer while it is
 being used, even
 though the send operation is not supposed to alter the content of this
 buffer. This
 may seem more stringent than necessary, but the additional restriction
 causes little
 loss of functionality and allows better performance on some systems-
 consider
 the case where data transfer is done by a DMA engine that is not
 cache-coherent
 with the main processor.End of rationale.

 I use plenty of nonblocking post-send in my code. Is it really true
 that the sender must not access any part of the send buffer not even
 for STORES? Or was it a MPI 1.0 issue?
 Thanks.
 alberto
 ___
 users mailing list
 users_at_[hidden]
 http://www.open-mpi.org/mailman/listinfo.cgi/users







--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 



Re: [OMPI users] Accessing to the send buffer

2010-08-02 Thread Terry Dontje
For OMPI I believe reading the data buffer given to a posted send will 
not cause any problems.


Anyone on the list care to disagree?

--td

Alberto Canestrelli wrote:

Thanks,
OK, that is not my problem: I never read data from the posted receive 
before the corresponding WAIT. Now the last question is: what could 
happen if I am reading the data from the posted send? I do it plenty 
of times! Possible consequences? Can you guarantee me that this 
approach is safe?

thank you very much
Alberto

On 02/08/2010 11:29, Alberto Canestrelli wrote:

In the posted irecv case if you are reading from the posted receive
buffer the problem is you may get one of three values:

1. pre irecv value
2. value received from the irecv in progress
3. possibly garbage if you are unlucky enough to access memory that is
at the same time being updated.

--td
Alberto Canestrelli wrote:

 Thanks,
 it was late in the night yesterday and i highlighted STORES but I
 meanted to highlight LOADS! I know that
 stores are not allowed when you are doing non blocking send-recv. But
 I was impressed about LOADS case. I always do some loads of the data
 between all my ISEND-IRECVs and my WAITs. Could you please confirm me
 that OMPI can handle the LOAD case? And if it cannot handle it, which
 could be the consequence? What could happen in the worst of the case
 when there is a data race in reading a data?
 thanks
 alberto

 On 02/08/2010 9:32, Alberto Canestrelli wrote:
> I believe it is definitely a no-no to STORE (write) into a send 
buffer

> while a send is posted. I know there have been debate in the forum to
> relax LOADS (reads) from a send buffer. I think OMPI can handle the
> latter case (LOADS). On the posted receive side you open yourself up
> for some race conditions and overwrites if you do STORES or LOADS 
from a

> posted receive buffer.
>
> --td
>
> Alberto Canestrelli wrote:
>> Hi,
>> I have a problem with a fortran code that I have parallelized with
>> MPI. I state in advance that I read the whole ebook "Mit Press - 
Mpi -
>> The Complete Reference, Volume 1" and I took different MPI 
classes, so
>> I have a discrete MPI knowledge. I was able to solve by myself 
all the
>> errors I bumped into but now I am not able to find the bug of my 
code
>> that provides erroneous results. Without entering in the details 
of my

>> code, I think that the cause of the problem could be reletad to the
>> following aspect highlighted in the above ebook (in the follow I 
copy

>> and paste from the e-book):
>>
>> A nonblocking post-send call indicates that the system may start
>> copying data
>> out of the send buffer. The sender must not access any part of the
>> send buffer
>> (neither for loads nor for STORES) after a nonblocking send 
operation

>> is posted until
>> the complete send returns.
>> A nonblocking post-receive indicates that the system may start 
writing

>> data into
>> the receive buffer. The receiver must not access any part of the
>> receive buffer after
>> a nonblocking receive operation is posted, until the 
complete-receive

>> returns.
>> Rationale. We prohibit read accesses to a send buffer while it is
>> being used, even
>> though the send operation is not supposed to alter the content of 
this

>> buffer. This
>> may seem more stringent than necessary, but the additional 
restriction

>> causes little
>> loss of functionality and allows better performance on some systems-
>> consider
>> the case where data transfer is done by a DMA engine that is not
>> cache-coherent
>> with the main processor.End of rationale.
>>
>> I use plenty of nonblocking post-send in my code. Is it really true
>> that the sender must not access any part of the send buffer not even
>> for STORES? Or was it a MPI 1.0 issue?
>> Thanks.
>> alberto







--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 



Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-03 Thread Terry Dontje
Sorry, I didn't see your prior question; glad you found the 
btl_openib_receive_queues parameter.  There is no FAQ entry for this, 
but I found the following in the openib btl help file, which spells out 
the parameters when using per-peer receive queues (i.e., a receive queue 
setting with "P" as the first argument).


Per-peer receive queues require between 2 and 5 parameters:

1. Buffer size in bytes (mandatory)
2. Number of buffers (mandatory)
3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
4. Credit window size (optional; defaults to (low_watermark / 2))
5. Number of buffers reserved for credit messages (optional;
defaults to (num_buffers*2-1)/credit_window)

Example: P,128,256,128,16
 - 128 byte buffers
 - 256 buffers to receive incoming MPI messages
 - When the number of available buffers reaches 128, re-post 128 more
   buffers to reach a total of 256
 - If the number of available credits reaches 16, send an explicit
   credit message to the sender
 - Defaulting to ((256 * 2) - 1) / 16 = 31; this many buffers are
   reserved for explicit credit messages
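A hypothetical invocation using that example setting (process count and binary 
name are placeholders):

  mpirun -np 16 -mca btl_openib_receive_queues P,128,256,128,16 ./your_app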

--td
Allen Barnett wrote:

Hi: In response to my own question, by studying the file
mca-btl-openib-device-params.ini, I discovered that this option in
OMPI-1.4.2:

-mca btl_openib_receive_queues P,65536,256,192,128

was sufficient to prevent OMPI from trying to create shared receive
queues and allowed my application to run to completion using the IB
hardware.

I guess my question now is: What do these numbers mean? Presumably the
size (or counts?) of buffers to allocate? Are there limits or a way to
tune these values?

Thanks,
Allen

On Mon, 2010-08-02 at 12:49 -0400, Allen Barnett wrote:
  

Hi Terry:
It is indeed the case that the openib BTL has not been initialized. I
ran with your tcp-disabled MCA option and it aborted in MPI_Init.

The OFED stack is what's included in RHEL4. It appears to be made up of
the RPMs:
openib-1.4-1.el4
opensm-3.2.5-1.el4
libibverbs-1.1.2-1.el4

How can I determine if srq is supported? Is there an MCA option to
defeat it? (Our in-house cluster has more recent Mellanox IB hardware
and is running this same IB stack and ompi 1.4.2 works OK, so I suspect
srq is supported by the OpenFabrics stack. Perhaps.)

Thanks,
Allen

On Mon, 2010-08-02 at 06:47 -0400, Terry Dontje wrote:


My guess is from the message below saying "(openib) BTL failed to
initialize"  that the code is probably running over tcp.  To
absolutely prove this you can specify to only use the openib, sm and
self btls to eliminate the tcp btl.  To do that you add the following
to the mpirun line "-mca btl openib,sm,self".  I believe with that
specification the code will abort and not run to completion.  


What version of the OFED stack are you using?  I wonder if srq is
supported on your system or not?

--td

Allen Barnett wrote: 
  

Hi: A customer is attempting to run our OpenMPI 1.4.2-based application
on a cluster of machines running RHEL4 with the standard OFED stack. The
HCAs are identified as:

03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)

ibv_devinfo says that one port on the HCAs is active but the other is
down:

hca_id: mthca0
fw_ver: 3.0.2
node_guid:  0006:6a00:9800:4c78
sys_image_guid: 0006:6a00:9800:4c78
vendor_id:  0x066a
vendor_part_id: 23108
hw_ver: 0xA1
phys_port_cnt:  2
port:   1
state:  active (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   26
port_lmc:   0x00

port:   2
state:  down (1)
max_mtu:2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid:   0
port_lmc:   0x00


 When the OMPI application is run, it prints the error message:


The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue.  This typically indicates a failed
OpenFabrics installation, faulty hardware, or that Open MPI is
attempting to use a feature that is not supported on your hardware
(i.e., is a shared receive queue specified in the
btl_openib_receive_queues MCA parameter with a device that does not
support it?).  The failure occured here:

  Local host:  machine001.lan
  OMPI
source: /software/openmpi-1.4.2/ompi/mc

Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-04 Thread Terry Dontje

Allen Barnett wrote:

Thanks for the pointer!

Do you know if these sizes are dependent on the hardware?
  

They can be; the following file sets up the defaults for some known cards:

ompi/mca/btl/openib/mca-btl-openib-device-params.ini

--td

Thanks,
Allen

On Tue, 2010-08-03 at 10:29 -0400, Terry Dontje wrote:
  

Sorry, I didn't see your prior question glad you found the
btl_openib_receive_queues parameter.  There is not a faq entry for
this but I found the following in the openib btl help file that spells
out the parameters when using Per-peer receive queue (ie receive queue
setting with "P" as the first argument).

Per-peer receive queues require between 2 and 5 parameters:

 1. Buffer size in bytes (mandatory)
 2. Number of buffers (mandatory)
 3. Low buffer count watermark (optional; defaults to (num_buffers /
2))
 4. Credit window size (optional; defaults to (low_watermark / 2))
 5. Number of buffers reserved for credit messages (optional;
 defaults to (num_buffers*2-1)/credit_window)

 Example: P,128,256,128,16
  - 128 byte buffers
  - 256 buffers to receive incoming MPI messages
  - When the number of available buffers reaches 128, re-post 128 more
buffers to reach a total of 256
  - If the number of available credits reaches 16, send an explicit
credit message to the sender
  - Defaulting to ((256 * 2) - 1) / 16 = 31; this many buffers are
reserved for explicit credit messages

--td
Allen Barnett wrote: 


Hi: In response to my own question, by studying the file
mca-btl-openib-device-params.ini, I discovered that this option in
OMPI-1.4.2:

-mca btl_openib_receive_queues P,65536,256,192,128

was sufficient to prevent OMPI from trying to create shared receive
queues and allowed my application to run to completion using the IB
hardware.

I guess my question now is: What do these numbers mean? Presumably the
size (or counts?) of buffers to allocate? Are there limits or a way to
tune these values?

Thanks,
Allen

On Mon, 2010-08-02 at 12:49 -0400, Allen Barnett wrote:
  
  

Hi Terry:
It is indeed the case that the openib BTL has not been initialized. I
ran with your tcp-disabled MCA option and it aborted in MPI_Init.

The OFED stack is what's included in RHEL4. It appears to be made up of
the RPMs:
openib-1.4-1.el4
opensm-3.2.5-1.el4
libibverbs-1.1.2-1.el4

How can I determine if srq is supported? Is there an MCA option to
defeat it? (Our in-house cluster has more recent Mellanox IB hardware
and is running this same IB stack and ompi 1.4.2 works OK, so I suspect
srq is supported by the OpenFabrics stack. Perhaps.)

Thanks,
Allen

On Mon, 2010-08-02 at 06:47 -0400, Terry Dontje wrote:



My guess is from the message below saying "(openib) BTL failed to
initialize"  that the code is probably running over tcp.  To
absolutely prove this you can specify to only use the openib, sm and
self btls to eliminate the tcp btl.  To do that you add the following
to the mpirun line "-mca btl openib,sm,self".  I believe with that
specification the code will abort and not run to completion.  


What version of the OFED stack are you using?  I wonder if srq is
supported on your system or not?

--td

Allen Barnett wrote: 
  
  

Hi: A customer is attempting to run our OpenMPI 1.4.2-based application
on a cluster of machines running RHEL4 with the standard OFED stack. The
HCAs are identified as:

03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)

ibv_devinfo says that one port on the HCAs is active but the other is
down:

hca_id: mthca0
fw_ver: 3.0.2
node_guid:  0006:6a00:9800:4c78
sys_image_guid: 0006:6a00:9800:4c78
vendor_id:  0x066a
vendor_part_id: 23108
hw_ver: 0xA1
phys_port_cnt:  2
port:   1
state:  active (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   26
port_lmc:   0x00

port:   2
state:  down (1)
max_mtu:2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid:   0
port_lmc:   0x00


 When the OMPI application is run, it prints the error message:


The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue.  This typically indicates a failed
OpenFabrics ins

Re: [OMPI users] problem with .bashrc stetting of openmpi

2010-08-13 Thread Terry Dontje

sun...@chem.iitb.ac.in wrote:

Dear Open-mpi users,

I installed openmpi-1.4.1 in my user area and then set the path for
openmpi in the .bashrc file as follows. However, I am still getting the following
error message whenever I start the parallel molecular dynamics
simulation using GROMACS. So every time I start the MD job, I need to
source the .bashrc file again.

Earlier, on some other machine, I did the same thing and did not have any
problem.

Could you guys suggest what the problem might be?

  

Have you set OPAL_PREFIX to /home/sunitap/soft/openmpi?

If you do an ldd on mdrun_mpi, does libmpi.so.0 come up as not found?
If so, and there truly is a libmpi.so.0 in /home/sunitap/soft/openmpi/lib,
you may want to make sure the bitness of libmpi.so.0 and mdrun_mpi is the
same by doing a file command on both.
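For example, using the paths from your .bashrc:

  ldd `which mdrun_mpi` | grep libmpi
  file /home/sunitap/soft/openmpi/lib/libmpi.so.0
  file `which mdrun_mpi`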

--td

.bashrc
#path for openmpi
export PATH=$PATH:/home/sunitap/soft/openmpi/bin
export CFLAGS="-I/home/sunitap/soft/openmpi/include"
export LDFLAGS="-L/home/sunitap/soft/openmpi/lib"
export LD_LIBRARY_PATH=/home/sunitap/soft/openmpi/lib:$LD_LIBRARY_PATH

== error message ==
mdrun_mpi: error while loading shared libraries: libmpi.so.0: cannot open
shared object file: No such file or directory



Thanks for any help.
Best regards,
Sunita

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
  



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 



Re: [OMPI users] is there a way to bring to light _all_ configure options in a ready installation?

2010-08-24 Thread Terry Dontje

Jeff Squyres wrote:

You should be able to run "./configure --help" and see a lengthy help message 
that includes all the command line options to configure.

Is that what you're looking for?

  

No, he wants to know what configure options were used with some binaries.

--td

On Aug 24, 2010, at 7:40 AM, Paul Kapinos wrote:

  

Hello OpenMPI developers,

I am searching for a way to discover _all_ configure options of an OpenMPI 
installation.

Background: in an existing installation, the ompi_info program helps to find out a lot of 
information about the installation. So, "ompi_info -c" shows *some* 
configuration options like CFLAGS, FFLAGS et cetera. Compilation directories often do 
not survive for a long time (or are not shipped at all, e.g. with SunMPI).

But what about --enable-mpi-threads or --enable-contrib-no-build=vt, for example (and all 
other possible configure flags): how can I see whether these flags were set or 
not?

In other words: is it possible to get _all_ configure flags from a "ready" 
installation without having the compilation dirs (with the configure logs) any more?

Many thanks

Paul


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




  



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 



Re: [OMPI users] [openib] segfault when using openib btl

2010-09-23 Thread Terry Dontje
Eloi, I am curious about your problem.  Can you tell me what size of job 
it is?  Does it always fail on the same bcast,  or same process?

Eloi Gaudry wrote:

Hi Nysal,

Thanks for your suggestions.

I'm now able to get the checksum computed and redirected to stdout, thanks (I forgot the 
"-mca pml_base_verbose 5" option, you were right).
I haven't been able to observe the segmentation fault (with hdr->tag=0) so far 
(when using pml csum), but I'll let you know when I do.

I've got two other questions, which may be related to the error observed:

1/ does the maximum number of MPI_Comm that can be handled by OpenMPI somehow depend on the btl being used (i.e. if I'm using 
openib, may I use the same number of MPI_Comm objects as with tcp)? Is there something like MPI_COMM_MAX in OpenMPI?


2/ the segfault only appears during an MPI collective call, with very small 
messages (one int being broadcast, for instance); I followed the guidelines given at 
http://icl.cs.utk.edu/open-mpi/faq/?category=openfabrics#ib-small-message-rdma but the debug build of OpenMPI asserts if I use a min-size other than 255. Anyway, if I deactivate eager_rdma, the segfault remains. 
Does the openib btl handle very small messages differently (even with eager_rdma deactivated) than tcp ?
Others on the list: does coalescing happen without eager_rdma?  If so 
then that would possibly be one difference between the openib btl and 
tcp, aside from the actual protocol used.

 is there a way to make sure that large messages and small messages are handled 
the same way ?
  
Do you mean so they all look like eager messages?  How large are the messages 
we are talking about here: 1K, 1M or 10M?


--td

Regards,
Eloi


On Friday 17 September 2010 17:57:17 Nysal Jan wrote:
  

Hi Eloi,
Create a debug build of OpenMPI (--enable-debug) and while running with the
csum PML add "-mca pml_base_verbose 5" to the command line. This will print
the checksum details for each fragment sent over the wire. I'm guessing it
didn't catch anything because the BTL failed. The checksum verification is
done in the PML, which the BTL calls via a callback function. In your case
the PML callback is never called because the hdr->tag is invalid. So
enabling checksum tracing also might not be of much use. Is it the first
Bcast that fails or the nth Bcast and what is the message size? I'm not
sure what could be the problem at this moment. I'm afraid you will have to
debug the BTL to find out more.

--Nysal

On Fri, Sep 17, 2010 at 4:39 PM, Eloi Gaudry  wrote:


Hi Nysal,

thanks for your response.

I've been unable so far to write a test case that could illustrate the
hdr->tag=0 error.
Actually, I'm only observing this issue when running an internode
computation involving infiniband hardware from Mellanox (MT25418,
ConnectX IB DDR, PCIe 2.0
2.5GT/s, rev a0) with our time-domain software.

I checked, double-checked, and rechecked again every MPI use performed
during a parallel computation and I couldn't find any error so far. The
fact that the very
same parallel computation runs flawlessly when using tcp (and disabling
openib support) might seem to indicate that the issue is somewhere
located inside the
openib btl or at the hardware/driver level.

I've just used the "-mca pml csum" option and I haven't seen any related
messages (when hdr->tag=0 and the segfaults occurs).
Any suggestion ?

Regards,
Eloi

On Friday 17 September 2010 16:03:34 Nysal Jan wrote:
  

Hi Eloi,
Sorry for the delay in response. I haven't read the entire email
thread, but do you have a test case which can reproduce this error?
Without that it will be difficult to nail down the cause. Just to
clarify, I do not work for an iwarp vendor. I can certainly try to
reproduce it on an IB system. There is also a PML called csum, you can
use it via "-mca pml csum", which will checksum the MPI messages and
verify it at the receiver side for any data corruption. You can try
using it to see if it is able


to

  

catch anything.

Regards
--Nysal

On Thu, Sep 16, 2010 at 3:48 PM, Eloi Gaudry  wrote:


Hi Nysal,

I'm sorry to intrrupt, but I was wondering if you had a chance to
look
  

at

  

this error.

Regards,
Eloi



--


Eloi Gaudry

Free Field Technologies
Company Website: http://www.fft.be
Company Phone:   +32 10 487 959


-- Forwarded message --
From: Eloi Gaudry 
To: Open MPI Users 
Date: Wed, 15 Sep 2010 16:27:43 +0200
Subject: Re: [OMPI users] [openib] segfault when using openib btl
Hi,

I was wondering if anybody got a chance to have a look at this issue.

Regards,
Eloi

On Wednesday 18 August 2010 09:16:26 Eloi Gaudry wrote:
  

Hi Jeff,

Please find enclosed the output (valgrind.out.gz) from
/opt/openmpi-debug-1.4.2/bin/orterun -np 2 --host pbn11,pbn10 --mca


btl

  

openib,self --display-map --verbose --mca mpi_warn_on_fork 0 --mca
btl_openib_want_fork_support 0 -tag-output
/opt/valgrind-3.5.0/bin/valgrind --tool=memc

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-24 Thread Terry Dontje
That is interesting.  So does the number of processes affect your runs 
at all?  The times I've seen hdr->tag be 0, it has usually been due to protocol 
issues.  The tag should never be 0.  Have you tried receive_queue 
settings other than the default and the one you mention?


I wonder whether a combination of the two receive queues causes a 
failure or not.  Something like


P,128,256,192,128:P,65536,256,192,128

I am wondering if it is the first queuing definition causing the issue or 
possibly the SRQ defined in the default.
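
On the mpirun line, that combination (values exactly as above; process count and binary are placeholders) would be passed as:

  mpirun -np <nprocs> --mca btl_openib_receive_queues \
      P,128,256,192,128:P,65536,256,192,128 ./your_app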

--td

Eloi Gaudry wrote:

Hi Terry,

The messages being sent/received can be of any size, but the error seems to 
happen more often with small messages (such as an int being broadcast or 
allreduced).
The failing communication differs from one run to another, but some spots are more likely to fail than others. And as far as I know, they are always located next to a small-message (an int 
being broadcast, for instance) communication. Other typical message sizes are >10k but can be very much larger.


I've been checking the hca being used; it's from Mellanox (with 
vendor_part_id=26428). There is no receive_queues parameter associated with it,
as $ cat share/openmpi/mca-btl-openib-device-params.ini shows:
[...]
  # A.k.a. ConnectX
  [Mellanox Hermon]
  vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
  vendor_part_id = 
25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
  use_eager_rdma = 1
  mtu = 2048
  max_inline_data = 128
[..]

$ ompi_info --param btl openib --parsable | grep receive_queues
 
mca:btl:openib:param:btl_openib_receive_queues:value:P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32
 mca:btl:openib:param:btl_openib_receive_queues:data_source:default value
 mca:btl:openib:param:btl_openib_receive_queues:status:writable
 mca:btl:openib:param:btl_openib_receive_queues:help:Colon-delimited, comma 
delimited list of receive queues: P,4096,8,6,4:P,32768,8,6,4
 mca:btl:openib:param:btl_openib_receive_queues:deprecated:no

I was wondering if these parameters (automatically computed at openib btl init, from what I understood) were not incorrect in some way, so I plugged in some other values: "P,65536,256,192,128" (someone on 
the list used those values when encountering a different issue). Since then, I haven't been able to observe the segfault (occurring as hdr->tag = 0 in btl_openib_component.c:2881) yet.


Eloi


/home/pp_fr/st03230/EG/Softs/openmpi-custom-1.4.2/bin/

On Thursday 23 September 2010 23:33:48 Terry Dontje wrote:
  

Eloi, I am curious about your problem.  Can you tell me what size of job
it is?  Does it always fail on the same bcast,  or same process?

Eloi Gaudry wrote:


Hi Nysal,

Thanks for your suggestions.

I'm now able to get the checksum computed and redirected to stdout,
thanks (I forgot the  "-mca pml_base_verbose 5" option, you were right).
I haven't been able to observe the segmentation fault (with hdr->tag=0)
so far (when using pml csum) but I 'll let you know when I am.

I've got two others question, which may be related to the error observed:

1/ does the maximum number of MPI_Comm that can be handled by OpenMPI
somehow depends on the btl being used (i.e. if I'm using openib, may I
use the same number of MPI_Comm object as with tcp) ? Is there something
as MPI_COMM_MAX in OpenMPI ?

2/ the segfaults only appears during a mpi collective call, with very
small message (one int is being broadcast, for instance) ; i followed
the guidelines given at http://icl.cs.utk.edu/open-
mpi/faq/?category=openfabrics#ib-small-message-rdma but the debug-build
of OpenMPI asserts if I use a different min-size that 255. Anyway, if I
deactivate eager_rdma, the segfaults remains. Does the openib btl handle
very small message differently (even with eager_rdma deactivated) than
tcp ?
  

Others on the list does coalescing happen with non-eager_rdma?  If so
then that would possibly be one difference between the openib btl and
tcp aside from the actual protocol used.



 is there a way to make sure that large messages and small messages are
 handled the same way ?
  

Do you mean so they all look like eager messages?  How large of messages
are we talking about here 1K, 1M or 10M?

--td



Regards,
Eloi

On Friday 17 September 2010 17:57:17 Nysal Jan wrote:
  

Hi Eloi,
Create a debug build of OpenMPI (--enable-debug) and while running with
the csum PML add "-mca pml_base_verbose 5" to the command line. This
will print the checksum details for each fragment sent over the wire.
I'm guessing it didnt catch anything because the BTL failed. The
checksum verification is done in the PML, which the BTL calls via a
callback function. In your case the PML callback is never called
because the hdr->tag is invalid. So enabling checksum tracing also
might not be of much use. Is it the first Bcast that fails or th

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-24 Thread Terry Dontje

Eloi Gaudry wrote:

Terry,

You were right, the error indeed seems to come from the message coalescing 
feature.
If I turn it off using the "--mca btl_openib_use_message_coalescing 0", I'm not able to 
observe the "hdr->tag=0" error.

There are some trac requests associated with very similar errors (https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they are all closed (except https://svn.open-mpi.org/trac/ompi/ticket/2352 
that might be related), aren't they? What would you suggest, Terry?


  
Interesting, though it looks to me like the segv in ticket 2352 would 
have happened on the send side instead of the receive side like you 
have.  As to what to do next, it would be really nice to have some sort 
of reproducer that we can try and debug what is really going on.  The 
only other thing to do without a reproducer is to inspect the code on 
the send side to figure out what might make it generate a 0 hdr->tag, 
or maybe instrument the send side to stop when it is about ready to send 
a 0 hdr->tag and see if we can see how the code got there.


I might have some cycles to look at this Monday.

--td

Eloi


On Friday 24 September 2010 16:00:26 Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

No, I haven't tried any other values than P,65536,256,192,128 yet.

The reason why is quite simple. I've been reading and reading again this
thread to understand the btl_openib_receive_queues meaning and I can't
figure out why the default values seem to induce the hdr->tag=0 issue
(http://www.open-mpi.org/community/lists/users/2009/01/7808.php).


Yeah, the size of the fragments and number of them really should not
cause this issue.  So I too am a little perplexed about it.



Do you think that the default shared received queue parameters are
erroneous for this specific Mellanox card ? Any help on finding the
proper parameters would actually be much appreciated.
  

I don't necessarily think it is the queue size for a specific card but
more so the handling of the queues by the BTL when using certain sizes.
At least that is one gut feel I have.

In my mind the tag being 0 means either something below OMPI is polluting
the data fragment or OMPI's internal protocol is somehow getting messed
up.  I can imagine (no empirical data here) the queue sizes could change
how the OMPI protocol sets things up.  Another thing may be the
coalescing feature in the openib BTL which tries to gang multiple
messages into one packet when resources are running low.   I can see
where changing the queue sizes might affect the coalescing.  So, it
might be interesting to turn off the coalescing.  You can do that by
setting "--mca btl_openib_use_message_coalescing 0" in your mpirun line.

If that doesn't solve the issue then obviously there must be something
else going on :-).

Note, the reason I am interested in this is that I am seeing a similar error
condition (hdr->tag == 0) on a development system.  Though my failing
case fails with np=8 using the connectivity test program, which is mainly
point to point, and there is not a significant amount of data transfer
going on either.

--td



Eloi

On Friday 24 September 2010 14:27:07 you wrote:
  

That is interesting.  So does the number of processes affect your runs
any.  The times I've seen hdr->tag be 0 usually has been due to protocol
issues.  The tag should never be 0.  Have you tried to do other
receive_queue settings other than the default and the one you mention.

I wonder if you did a combination of the two receive queues causes a
failure or not.  Something like

P,128,256,192,128:P,65536,256,192,128

I am wondering if it is the first queuing definition causing the issue
or possibly the SRQ defined in the default.

--td

Eloi Gaudry wrote:


Hi Terry,

The messages being send/received can be of any size, but the error
seems to happen more often with small messages (as an int being
broadcasted or allreduced). The failing communication differs from one
run to another, but some spots are more likely to be failing than
another. And as far as I know, there are always located next to a
small message (an int being broadcasted for instance) communication.
Other typical messages size are

  

10k but can be very much larger.


I've been checking the hca being used, its' from mellanox (with
vendor_part_id=26428). There is no receive_queues parameters associated
to it.

 $ cat share/openmpi/mca-btl-openib-device-params.ini as well:
[...]

  # A.k.a. ConnectX
  [Mellanox Hermon]
  vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
  vendor_part_id =
  25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
  use_eager_rdma = 1
  mtu = 2048
  max_inline_data = 128

[..]

$ ompi_info --param btl openib --parsable | grep receive_queues

 mca:btl:openib:param:btl_open

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
So it sounds like coalescing is not your issue and that the problem has 
something to do with the queue sizes.  It would be helpful if we could 
detect the hdr->tag == 0 issue on the sending side and get at least a 
stack trace.  There is something really odd going on here.


--td

Eloi Gaudry wrote:

Hi Terry,

I'm sorry to say that I might have missed a point here.

I've lately been relaunching all previously failing computations with 
the message coalescing feature being switched off, and I saw the same 
hdr->tag=0 error several times, always during a collective call 
(MPI_Comm_create, MPI_Allreduce and MPI_Broadcast, so far). And as 
soon as I switched to the peer queue option I was previously using 
(--mca btl_openib_receive_queues P,65536,256,192,128 instead of using 
--mca btl_openib_use_message_coalescing 0), all computations ran 
flawlessly.


As for the reproducer, I've already tried to write something but I 
haven't succeeded so far at reproducing the hdr->tag=0 issue with it.


Eloi

On 24/09/2010 18:37, Terry Dontje wrote:

Eloi Gaudry wrote:

Terry,

You were right, the error indeed seems to come from the message coalescing 
feature.
If I turn it off using the "--mca btl_openib_use_message_coalescing 0", I'm not able to 
observe the "hdr->tag=0" error.

There are some trac requests associated to very similar error (https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they are all closed (except https://svn.open-mpi.org/trac/ompi/ticket/2352 
that might be related), aren't they ? What would you suggest Terry ?


  
Interesting, though it looks to me like the segv in ticket 2352 would 
have happened on the send side instead of the receive side like you 
have.  As to what to do next it would be really nice to have some 
sort of reproducer that we can try and debug what is really going 
on.  The only other thing to do without a reproducer is to inspect 
the code on the send side to figure out what might make it generate 
at 0 hdr->tag.  Or maybe instrument the send side to stop when it is 
about ready to send a 0 hdr->tag and see if we can see how the code 
got there.


I might have some cycles to look at this Monday.

--td

Eloi


On Friday 24 September 2010 16:00:26 Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

No, I haven't tried any other values than P,65536,256,192,128 yet.

The reason why is quite simple. I've been reading and reading again this
thread to understand the btl_openib_receive_queues meaning and I can't
figure out why the default values seem to induce the hdr-

  

tag=0 issue
(http://www.open-mpi.org/community/lists/users/2009/01/7808.php).


Yeah, the size of the fragments and number of them really should not
cause this issue.  So I too am a little perplexed about it.



Do you think that the default shared received queue parameters are
erroneous for this specific Mellanox card ? Any help on finding the
proper parameters would actually be much appreciated.
  

I don't necessarily think it is the queue size for a specific card but
more so the handling of the queues by the BTL when using certain sizes.
At least that is one gut feel I have.

In my mind the tag being 0 is either something below OMPI is polluting
the data fragment or OMPI's internal protocol is some how getting messed
up.  I can imagine (no empirical data here) the queue sizes could change
how the OMPI protocol sets things up.  Another thing may be the
coalescing feature in the openib BTL which tries to gang multiple
messages into one packet when resources are running low.   I can see
where changing the queue sizes might affect the coalescing.  So, it
might be interesting to turn off the coalescing.  You can do that by
setting "--mca btl_openib_use_message_coalescing 0" in your mpirun line.

If that doesn't solve the issue then obviously there must be something
else going on :-).

Note, the reason I am interested in this is I am seeing a similar error
condition (hdr->tag == 0) on a development system.  Though my failing
case fails with np=8 using the connectivity test program which is mainly
point to point and there are not a significant amount of data transfers
going on either.

--td



Eloi

On Friday 24 September 2010 14:27:07 you wrote:
  

That is interesting.  So does the number of processes affect your runs
any.  The times I've seen hdr->tag be 0 usually has been due to protocol
issues.  The tag should never be 0.  Have you tried to do other
receive_queue settings other than the default and the one you mention.

I wonder if you did a combination of the two receive queues causes a
failure or not.  Something like

P,128,256,192,128:P,65536,256,192,128

I am wondering if it is the first queuing definition causing the issue
or possibly the SRQ defined in the default.

--td

Eloi Gaudry wrote:


Hi Terry,

The messages being send/re

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
I am thinking of checking the value of *frag->hdr right before the return 
in the post_send function in ompi/mca/btl/openib/btl_openib_endpoint.h.  
It is line 548 in the trunk:

https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/btl/openib/btl_openib_endpoint.h#548

--td
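
A minimal sketch of the kind of send-side check being discussed. Nothing below comes from the thread itself: the CHECK_SEND_TAG macro is hypothetical, and the idea is simply to call it on frag->hdr->tag right before post_send() returns, so the sender dumps core (and hence a stack trace) the moment a zero tag is about to go out. Exact placement and types in btl_openib_endpoint.h may differ.

  /* Hypothetical guard; compile-tested on its own, not a patch from the thread. */
  #include <stdio.h>
  #include <stdlib.h>

  #define CHECK_SEND_TAG(tag)                                                   \
      do {                                                                      \
          if (0 == (tag)) {                                                     \
              fprintf(stderr, "openib send side: hdr->tag == 0, aborting\n");   \
              abort();  /* core dump gives the send-side stack trace */         \
          }                                                                     \
      } while (0)

  /* usage inside post_send(), just before the return: CHECK_SEND_TAG(frag->hdr->tag); */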

Eloi Gaudry wrote:

Hi Terry,

Do you have any patch that I could apply to be able to do so ? I'm remotely working on a cluster (with a terminal) and I cannot use any parallel debugger or sequential debugger (with a call to 
xterm...). I can track frag->hdr->tag value in ompi/mca/btl/openib/btl_openib_component.c::handle_wc in the SEND/RDMA_WRITE case, but this is all I can think of alone.


You'll find a stacktrace (receive side) in this thread (10th or 11th message) 
but it might be pointless.

Regards,
Eloi


On Monday 27 September 2010 11:43:55 Terry Dontje wrote:
  

So it sounds like coalescing is not your issue and that the problem has
something to do with the queue sizes.  It would be helpful if we could
detect the hdr->tag == 0 issue on the sending side and get at least a
stack trace.  There is something really odd going on here.

--td

Eloi Gaudry wrote:


Hi Terry,

I'm sorry to say that I might have missed a point here.

I've lately been relaunching all previously failing computations with
the message coalescing feature being switched off, and I saw the same
hdr->tag=0 error several times, always during a collective call
(MPI_Comm_create, MPI_Allreduce and MPI_Broadcast, so far). And as
soon as I switched to the peer queue option I was previously using
(--mca btl_openib_receive_queues P,65536,256,192,128 instead of using
--mca btl_openib_use_message_coalescing 0), all computations ran
flawlessly.

As for the reproducer, I've already tried to write something but I
haven't succeeded so far at reproducing the hdr->tag=0 issue with it.

Eloi

On 24/09/2010 18:37, Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

You were right, the error indeed seems to come from the message
coalescing feature. If I turn it off using the "--mca
btl_openib_use_message_coalescing 0", I'm not able to observe the
"hdr->tag=0" error.

There are some trac requests associated to very similar error
(https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they are
all closed (except https://svn.open-mpi.org/trac/ompi/ticket/2352 that
might be related), aren't they ? What would you suggest Terry ?
  

Interesting, though it looks to me like the segv in ticket 2352 would
have happened on the send side instead of the receive side like you
have.  As to what to do next it would be really nice to have some
sort of reproducer that we can try and debug what is really going
on.  The only other thing to do without a reproducer is to inspect
the code on the send side to figure out what might make it generate
at 0 hdr->tag.  Or maybe instrument the send side to stop when it is
about ready to send a 0 hdr->tag and see if we can see how the code
got there.

I might have some cycles to look at this Monday.

--td



Eloi

On Friday 24 September 2010 16:00:26 Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

No, I haven't tried any other values than P,65536,256,192,128 yet.

The reason why is quite simple. I've been reading and reading again
this thread to understand the btl_openib_receive_queues meaning and
I can't figure out why the default values seem to induce the hdr-

  

tag=0 issue
(http://www.open-mpi.org/community/lists/users/2009/01/7808.php).


Yeah, the size of the fragments and number of them really should not
cause this issue.  So I too am a little perplexed about it.



Do you think that the default shared received queue parameters are
erroneous for this specific Mellanox card ? Any help on finding the
proper parameters would actually be much appreciated.
  

I don't necessarily think it is the queue size for a specific card but
more so the handling of the queues by the BTL when using certain
sizes. At least that is one gut feel I have.

In my mind the tag being 0 is either something below OMPI is polluting
the data fragment or OMPI's internal protocol is some how getting
messed up.  I can imagine (no empirical data here) the queue sizes
could change how the OMPI protocol sets things up.  Another thing may
be the coalescing feature in the openib BTL which tries to gang
multiple messages into one packet when resources are running low.   I
can see where changing the queue sizes might affect the coalescing. 
So, it might be interesting to turn off the coalescing.  You can do

that by setting "--mca btl_openib_use_message_coalescing 0" in your
mpirun line.

If that doesn't solve the issue then obviously there must be something
else going on :-).

Note, the reason I am interested in this is I am seeing a similar
error

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje

Eloi, sorry, can you print out frag->hdr->tag?

Unfortunately from your last email I think it will still all have 
non-zero values.
If that ends up being the case then there must be something odd with the 
descriptor pointer to the fragment.


--td

Eloi Gaudry wrote:

Terry,

Please find enclosed the requested check outputs (using -output-filename 
stdout.tag.null option).

For information, Nysal, in his first message, referred to 
ompi/mca/pml/ob1/pml_ob1_hdr.h and said that the hdr->tag value was wrong on 
the receiving side:
#define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_BTL_TAG_PML + 1)
#define MCA_PML_OB1_HDR_TYPE_RNDV  (MCA_BTL_TAG_PML + 2)
#define MCA_PML_OB1_HDR_TYPE_RGET  (MCA_BTL_TAG_PML + 3)
#define MCA_PML_OB1_HDR_TYPE_ACK   (MCA_BTL_TAG_PML + 4)
#define MCA_PML_OB1_HDR_TYPE_NACK  (MCA_BTL_TAG_PML + 5)
#define MCA_PML_OB1_HDR_TYPE_FRAG  (MCA_BTL_TAG_PML + 6)
#define MCA_PML_OB1_HDR_TYPE_GET   (MCA_BTL_TAG_PML + 7)
#define MCA_PML_OB1_HDR_TYPE_PUT   (MCA_BTL_TAG_PML + 8)
#define MCA_PML_OB1_HDR_TYPE_FIN   (MCA_BTL_TAG_PML + 9)

and in ompi/mca/btl/btl.h:
#define MCA_BTL_TAG_PML 0x40
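
For reference, since MCA_BTL_TAG_PML is 0x40, the tags defined above work out to 0x41 through 0x49 (decimal 65 to 73); a zero tag therefore cannot come from any well-formed OB1 header, which is why it points to corruption of the fragment rather than an unhandled protocol case.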


Eloi

On Monday 27 September 2010 14:36:59 Terry Dontje wrote:
  

I am thinking checking the value of *frag->hdr right before the return
in the post_send function in ompi/mca/btl/openib/btl_openib_endpoint.h.
It is line 548 in the trunk
https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/btl/openib/btl_ope
nib_endpoint.h#548

--td

Eloi Gaudry wrote:


Hi Terry,

Do you have any patch that I could apply to be able to do so ? I'm
remotely working on a cluster (with a terminal) and I cannot use any
parallel debugger or sequential debugger (with a call to xterm...). I
can track frag->hdr->tag value in
ompi/mca/btl/openib/btl_openib_component.c::handle_wc in the
SEND/RDMA_WRITE case, but this is all I can think of alone.

You'll find a stacktrace (receive side) in this thread (10th or 11th
message) but it might be pointless.

Regards,
Eloi

On Monday 27 September 2010 11:43:55 Terry Dontje wrote:
  

So it sounds like coalescing is not your issue and that the problem has
something to do with the queue sizes.  It would be helpful if we could
detect the hdr->tag == 0 issue on the sending side and get at least a
stack trace.  There is something really odd going on here.

--td

Eloi Gaudry wrote:


Hi Terry,

I'm sorry to say that I might have missed a point here.

I've lately been relaunching all previously failing computations with
the message coalescing feature being switched off, and I saw the same
hdr->tag=0 error several times, always during a collective call
(MPI_Comm_create, MPI_Allreduce and MPI_Broadcast, so far). And as
soon as I switched to the peer queue option I was previously using
(--mca btl_openib_receive_queues P,65536,256,192,128 instead of using
--mca btl_openib_use_message_coalescing 0), all computations ran
flawlessly.

As for the reproducer, I've already tried to write something but I
haven't succeeded so far at reproducing the hdr->tag=0 issue with it.

Eloi

On 24/09/2010 18:37, Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

You were right, the error indeed seems to come from the message
coalescing feature. If I turn it off using the "--mca
btl_openib_use_message_coalescing 0", I'm not able to observe the
"hdr->tag=0" error.

There are some trac requests associated to very similar error
(https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they are
all closed (except https://svn.open-mpi.org/trac/ompi/ticket/2352
that might be related), aren't they ? What would you suggest Terry ?
  

Interesting, though it looks to me like the segv in ticket 2352 would
have happened on the send side instead of the receive side like you
have.  As to what to do next it would be really nice to have some
sort of reproducer that we can try and debug what is really going
on.  The only other thing to do without a reproducer is to inspect
the code on the send side to figure out what might make it generate
at 0 hdr->tag.  Or maybe instrument the send side to stop when it is
about ready to send a 0 hdr->tag and see if we can see how the code
got there.

I might have some cycles to look at this Monday.

--td



Eloi

On Friday 24 September 2010 16:00:26 Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

No, I haven't tried any other values than P,65536,256,192,128 yet.

The reason why is quite simple. I've been reading and reading again
this thread to understand the btl_openib_receive_queues meaning and
I can't figure out why the default values seem to induce the hdr-

  

tag=0 issue
(http://www.open-mpi.org/community/lists/users/2009/01/7808.php).


Yeah, the size of the fragments and number of them really should not
cause this issue.  

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
Ok, there were no 0-value tags in your files.  Are you running this with 
no eager RDMA?  If not, can you set the following options: "-mca 
btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0 -mca 
btl_openib_flags 1".
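
Spelled out as a full command (process count and binary are placeholders), that would be:

  mpirun -np <nprocs> --mca btl_openib_use_eager_rdma 0 \
      --mca btl_openib_max_eager_rdma 0 --mca btl_openib_flags 1 ./your_app

(a btl_openib_flags value of 1 leaves only send/receive enabled, i.e. no RDMA put/get).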


thanks,

--td

Eloi Gaudry wrote:

Terry,

Please find enclosed the requested check outputs (using -output-filename 
stdout.tag.null option).
I'm displaying frag->hdr->tag here.

Eloi

On Monday 27 September 2010 16:29:12 Terry Dontje wrote:
  

Eloi, sorry can you print out frag->hdr->tag?

Unfortunately from your last email I think it will still all have
non-zero values.
If that ends up being the case then there must be something odd with the
descriptor pointer to the fragment.

--td

Eloi Gaudry wrote:


Terry,

Please find enclosed the requested check outputs (using -output-filename
stdout.tag.null option).

For information, Nysal In his first message referred to
ompi/mca/pml/ob1/pml_ob1_hdr.h and said that hdr->tg value was wrnong on
receiving side. #define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_BTL_TAG_PML
+ 1)
#define MCA_PML_OB1_HDR_TYPE_RNDV  (MCA_BTL_TAG_PML + 2)
#define MCA_PML_OB1_HDR_TYPE_RGET  (MCA_BTL_TAG_PML + 3)

 #define MCA_PML_OB1_HDR_TYPE_ACK   (MCA_BTL_TAG_PML + 4)

#define MCA_PML_OB1_HDR_TYPE_NACK  (MCA_BTL_TAG_PML + 5)
#define MCA_PML_OB1_HDR_TYPE_FRAG  (MCA_BTL_TAG_PML + 6)
#define MCA_PML_OB1_HDR_TYPE_GET   (MCA_BTL_TAG_PML + 7)

 #define MCA_PML_OB1_HDR_TYPE_PUT   (MCA_BTL_TAG_PML + 8)

#define MCA_PML_OB1_HDR_TYPE_FIN   (MCA_BTL_TAG_PML + 9)
and in ompi/mca/btl/btl.h
#define MCA_BTL_TAG_PML 0x40

Eloi

On Monday 27 September 2010 14:36:59 Terry Dontje wrote:
  

I am thinking checking the value of *frag->hdr right before the return
in the post_send function in ompi/mca/btl/openib/btl_openib_endpoint.h.
It is line 548 in the trunk
https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/btl/openib/btl_
ope nib_endpoint.h#548

--td

Eloi Gaudry wrote:


Hi Terry,

Do you have any patch that I could apply to be able to do so ? I'm
remotely working on a cluster (with a terminal) and I cannot use any
parallel debugger or sequential debugger (with a call to xterm...). I
can track frag->hdr->tag value in
ompi/mca/btl/openib/btl_openib_component.c::handle_wc in the
SEND/RDMA_WRITE case, but this is all I can think of alone.

You'll find a stacktrace (receive side) in this thread (10th or 11th
message) but it might be pointless.

Regards,
Eloi

On Monday 27 September 2010 11:43:55 Terry Dontje wrote:
  

So it sounds like coalescing is not your issue and that the problem
has something to do with the queue sizes.  It would be helpful if we
could detect the hdr->tag == 0 issue on the sending side and get at
least a stack trace.  There is something really odd going on here.

--td

Eloi Gaudry wrote:


Hi Terry,

I'm sorry to say that I might have missed a point here.

I've lately been relaunching all previously failing computations with
the message coalescing feature being switched off, and I saw the same
hdr->tag=0 error several times, always during a collective call
(MPI_Comm_create, MPI_Allreduce and MPI_Broadcast, so far). And as
soon as I switched to the peer queue option I was previously using
(--mca btl_openib_receive_queues P,65536,256,192,128 instead of using
--mca btl_openib_use_message_coalescing 0), all computations ran
flawlessly.

As for the reproducer, I've already tried to write something but I
haven't succeeded so far at reproducing the hdr->tag=0 issue with it.

Eloi

On 24/09/2010 18:37, Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

You were right, the error indeed seems to come from the message
coalescing feature. If I turn it off using the "--mca
btl_openib_use_message_coalescing 0", I'm not able to observe the
"hdr->tag=0" error.

There are some trac requests associated to very similar error
(https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they
are all closed (except
https://svn.open-mpi.org/trac/ompi/ticket/2352 that might be
related), aren't they ? What would you suggest Terry ?
  

Interesting, though it looks to me like the segv in ticket 2352
would have happened on the send side instead of the receive side
like you have.  As to what to do next it would be really nice to
have some sort of reproducer that we can try and debug what is
really going on.  The only other thing to do without a reproducer
is to inspect the code on the send side to figure out what might
make it generate at 0 hdr->tag.  Or maybe instrument the send side
to stop when it is about ready to send a 0 hdr->tag and see if we
can see how the code got there.

I might have some cycles to look at this Monday.

--td



Eloi

On Friday 24 September 2010 16:00:26 Terry Dontje wrote:
  

Eloi Gaudr

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-29 Thread Terry Dontje
Pasha, do you by any chance know who at Mellanox might be responsible 
for OMPI working?


--td

Eloi Gaudry wrote:

 Hi Nysal, Terry,
Thanks for your input on this issue.
I'll follow your advice. Do you know any Mellanox developer I may 
discuss with, preferably someone who has spent some time inside the 
openib btl ?


Regards,
Eloi

On 29/09/2010 06:01, Nysal Jan wrote:

Hi Eloi,
We discussed this issue during the weekly developer meeting & there 
were no further suggestions, apart from checking the driver and 
firmware levels. The consensus was that it would be better if you 
could take this up directly with your IB vendor.


Regards
--Nysal




--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 



Re: [OMPI users] [openib] segfault when using openib btl

2010-09-29 Thread Terry Dontje
In some of the testing Eloi did earlier, he disabled eager rdma and 
still saw the issue.


--td

Shamis, Pavel wrote:

Terry,
Ishai Rabinovitz is HPC team manager (I added him to CC)

Eloi,

Back to the issue. I have seen a very similar issue a long time ago on some hardware 
platforms that support relaxed-ordering memory operations. If I remember 
correctly it was some IBM platform.
Do you know if relaxed memory ordering is enabled on your platform? If it is 
enabled, you have to disable eager rdma.

Regards,
Pasha

On Sep 29, 2010, at 1:04 PM, Terry Dontje wrote:

Pasha, do you by any chance know who at Mellanox might be responsible for OMPI 
working?

--td

Eloi Gaudry wrote:
 Hi Nysal, Terry,
Thanks for your input on this issue.
I'll follow your advice. Do you know any Mellanox developer I may discuss with, 
preferably someone who has spent some time inside the openib btl ?

Regards,
Eloi

On 29/09/2010 06:01, Nysal Jan wrote:
Hi Eloi,
We discussed this issue during the weekly developer meeting & there were no 
further suggestions, apart from checking the driver and firmware levels. The 
consensus was that it would be better if you could take this up directly with your 
IB vendor.

Regards
--Nysal


--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com




  



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com



Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-05 Thread Terry Dontje

 On 10/05/2010 10:23 AM, Storm Zhang wrote:
Sorry, I should say one more thing about the 500-proc test. I tried 
to run two 500-proc jobs at the same time using SGE, and each runs fast and 
finishes at the same time as the single run. So I think OpenMPI can 
handle them separately very well.

For the bind-to-core, I tried running mpirun --help but did not find any 
bind-to-core info. I only see the bynode or byslot options. Are they the same as 
bind-to-core? My mpirun shows version 1.3.3 but ompi_info shows 1.4.2.


No, -bynode/-byslot is for mapping, not binding.  I cannot explain the 
different release versions of ompi_info and mpirun.  Have you done a 
which on each to see where they are located?  Anyway, 1.3.3 does not 
have any of the -bind-to-* options.
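
A quick way to check for that kind of mismatch (assuming both tools are in the PATH):

  which mpirun; which ompi_info    # do they live under the same installation prefix?
  mpirun --version                 # should report the same version as ompi_info

If they point at different prefixes, putting the intended <prefix>/bin first in PATH (and its lib directory first in LD_LIBRARY_PATH) usually resolves it.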


--td

Thanks a lot.

Linbao


On Mon, Oct 4, 2010 at 9:18 PM, Eugene Loh > wrote:


Storm Zhang wrote:


Here is what I meant: the results for 500 procs in fact show
that with 272-304 (<500) real cores, the program's running time
is good, which is almost five times the 100-proc time. So it can
be handled very well. Therefore I guess OpenMPI or the Rocks OS
does make use of hyperthreading to do the job. But with 600
procs, the running time is more than double that of 500
procs. I don't know why. This is my problem.
BTW, how do I use -bind-to-core? I added it to mpirun's options.
It always gives me the error "the executable 'bind-to-core' can't
be found". Isn't it like:
mpirun --mca btl_tcp_if_include eth0 -np 600  -bind-to-core
scatttest


Thanks for sending the mpirun run and error message.  That helps.

It's not recognizing the --bind-to-core option.  (Single hyphen,
as you had, should also be okay.)  Skimming through the e-mail, it
looks like you are using OMPI 1.3.2 and 1.4.2.  Did you try
--bind-to-core with both?  If I remember my version numbers,
--bind-to-core will not be recognized with 1.3.2, but should be
with 1.4.2.  Could it be that you only tried 1.3.2?

Another option is to try "mpirun --help".  Make sure that it
reports --bind-to-core.







--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-20 Thread Terry Dontje
 Can you remove the -with-threads and -enable-mpi-threads options from 
the configure line and see if that helps your 32 bit problem any?


--td
On 10/20/2010 09:38 AM, Siegmar Gross wrote:

Hi,

I have built Open MPI 1.5 on Linux x86_64 with the Oracle/Sun Studio C
compiler. Unfortunately "mpiexec" breaks when I run a small propgram.

linpc4 small_prog 106 cc -V
cc: Sun C 5.10 Linux_i386 2009/06/03
usage: cc [ options] files.  Use 'cc -flags' for details

linpc4 small_prog 107 uname -a
Linux linpc4 2.6.27.45-0.1-default #1 SMP 2010-02-22 16:49:47 +0100 x86_64
x86_64 x86_64 GNU/Linux

linpc4 small_prog 108 mpicc -show
cc -I/usr/local/openmpi-1.5_32_cc/include -mt
   -L/usr/local/openmpi-1.5_32_cc/lib -lmpi -ldl -Wl,--export-dynamic -lnsl
   -lutil -lm -ldl

linpc4 small_prog 109 mpicc -m32 rank_size.c
linpc4 small_prog 110 mpiexec -np 2 a.out
I'm process 0 of 2 available processes running on linpc4.
MPI standard 2.1 is supported.
I'm process 1 of 2 available processes running on linpc4.
MPI standard 2.1 is supported.
[linpc4:11564] *** Process received signal ***
[linpc4:11564] Signal: Segmentation fault (11)
[linpc4:11564] Signal code:  (128)
[linpc4:11564] Failing at address: (nil)
[linpc4:11565] *** Process received signal ***
[linpc4:11565] Signal: Segmentation fault (11)
[linpc4:11565] Signal code:  (128)
[linpc4:11565] Failing at address: (nil)
[linpc4:11564] [ 0] [0xe410]
[linpc4:11564] [ 1] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_base_components_close+0x8c) [0xf774ccd0]
[linpc4:11564] [ 2] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_btl_base_close+0xc5) [0xf76bd255]
[linpc4:11564] [ 3] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_bml_base_close+0x32) [0xf76bd112]
[linpc4:11564] [ 4] /usr/local/openmpi-1.5_32_cc/lib/openmpi/
   mca_pml_ob1.so [0xf73d971f]
[linpc4:11564] [ 5] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_base_components_close+0x8c) [0xf774ccd0]
[linpc4:11564] [ 6] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_pml_base_close+0xc1) [0xf76e4385]
[linpc4:11564] [ 7] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   [0xf76889e6]
[linpc4:11564] [ 8] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (PMPI_Finalize+0x3c) [0xf769dd4c]
[linpc4:11564] [ 9] a.out(main+0x98) [0x8048a18]
[linpc4:11564] [10] /lib/libc.so.6(__libc_start_main+0xe5) [0xf749c705]
[linpc4:11564] [11] a.out(_start+0x41) [0x8048861]
[linpc4:11564] *** End of error message ***
[linpc4:11565] [ 0] [0xe410]
[linpc4:11565] [ 1] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_base_components_close+0x8c) [0xf76bccd0]
[linpc4:11565] [ 2] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_btl_base_close+0xc5) [0xf762d255]
[linpc4:11565] [ 3] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_bml_base_close+0x32) [0xf762d112]
[linpc4:11565] [ 4] /usr/local/openmpi-1.5_32_cc/lib/openmpi/
   mca_pml_ob1.so [0xf734971f]
[linpc4:11565] [ 5] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_base_components_close+0x8c) [0xf76bccd0]
[linpc4:11565] [ 6] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_pml_base_close+0xc1) [0xf7654385]
[linpc4:11565] [ 7] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   [0xf75f89e6]
[linpc4:11565] [ 8] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (PMPI_Finalize+0x3c) [0xf760dd4c]
[linpc4:11565] [ 9] a.out(main+0x98) [0x8048a18]
[linpc4:11565] [10] /lib/libc.so.6(__libc_start_main+0xe5) [0xf740c705]
[linpc4:11565] [11] a.out(_start+0x41) [0x8048861]
[linpc4:11565] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 11564 on node linpc4 exited
   on signal 11 (Segmentation fault).
--
2 total processes killed (some possibly by mpiexec during cleanup)
linpc4 small_prog 111


"make check" shows that one test failed.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 114 grep FAIL
   log.make-check.Linux.x86_64.32_cc
FAIL: opal_path_nfs
linpc4 openmpi-1.5-Linux.x86_64.32_cc 115 grep PASS
   log.make-check.Linux.x86_64.32_cc
PASS: predefined_gap_test
PASS: dlopen_test
PASS: atomic_barrier
PASS: atomic_barrier_noinline
PASS: atomic_spinlock
PASS: atomic_spinlock_noinline
PASS: atomic_math
PASS: atomic_math_noinline
PASS: atomic_cmpset
PASS: atomic_cmpset_noinline
decode [PASSED]
PASS: opal_datatype_test
PASS: checksum
PASS: position
decode [PASSED]
PASS: ddt_test
decode [PASSED]
PASS: ddt_raw
linpc4 openmpi-1.5-Linux.x86_64.32_cc 116

I used the following command to build the package.

../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc \
   CFLAGS="-m32" CXXFLAGS="-m32" FFLAGS="-m32" FCFLAGS="-m32" \
   CXXLDFLAGS="-m32" CPPFLAGS="" \
   LDFLAGS="-m32" \
   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
   OBJC_INCLUDE_PATH="" MPICHHOME="" \
   CC="cc" CXX="CC" F77="f95" FC="f95" \
   --without-udapl --with-threads=posix --enable-mpi-threads \
   --enable-shared --enable-he

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
 I wonder if the error below could be due to crap being left over in the 
source tree.  Can you do a "make clean"?  Note, on a new checkout from 
the v1.5 svn branch I was able to build 64-bit with the following 
configure line:


../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib 
--without-udapl -enable-heterogeneous --enable-cxx-exceptions 
--enable-shared --enable-orterun-prefix-by-default --with-sge 
--disable-mpi-threads --enable-mpi-f90 --with-mpi-f90-size=small 
--disable-progress-threads --prefix=/workspace/tdd/ctnext/v15 
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64


--td
On 10/21/2010 05:38 AM, Siegmar Gross wrote:

Hi,

thank you very much for your reply.


   Can you remove the -with-threads and -enable-mpi-threads options from
the configure line and see if that helps your 32 bit problem any?

I cannot build the package when I remove these options.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 189 head -8 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --enable-shared --enable-heterogeneous
   --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 190 head -8 ../*.old/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --with-threads=posix --enable-mpi-threads
   --enable-shared --enable-heterogeneous --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 194 dir log.* ../*.old/log.*
... 132406 Oct 19 13:01
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
... 195587 Oct 19 16:09
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-check.Linux.x86_64.32_cc
... 356672 Oct 19 16:07
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-install.Linux.x86_64.32_cc
... 280596 Oct 19 13:42
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
... 132265 Oct 21 10:51 log.configure.Linux.x86_64.32_cc
...  10890 Oct 21 10:51 log.make.Linux.x86_64.32_cc


linpc4 openmpi-1.5-Linux.x86_64.32_cc 195 grep -i warning:
   log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4 openmpi-1.5-Linux.x86_64.32_cc 196 grep -i warning:
   ../*.old/log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4 openmpi-1.5-Linux.x86_64.32_cc 197 grep -i error:
   log.configure.Linux.x86_64.32_cc
configure: error: no libz found; check path for ZLIB package first...
configure: error: no vtf3.h found; check path for VTF3 package first...
configure: error: no BPatch.h found; check path for Dyninst package first...
configure: error: no f2c.h found; check path for CLAPACK package first...
configure: error: MPI Correctness Checking support cannot be built inside Open
MPI
configure: error: no papi.h found; check path for PAPI package first...
configure: error: no libcpc.h found; check path for CPC package first...
configure: error: no ctool/ctool.h found; check path for CTool package first...

linpc4 openmpi-1.5-Linux.x86_64.32_cc 198 grep -i error:
   ../*.old/log.configure.Linux.x86_64.32_cc
configure: error: no libz found; check path for ZLIB package first...
configure: error: no vtf3.h found; check path for VTF3 package firs

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje

 On 10/21/2010 06:43 AM, Jeff Squyres (jsquyres) wrote:
Also, I'm not entirely sure what all the commands are that you are 
showing. Some of those warnings (e.g. in config.log) are normal.


The 32 bit test failure is not, though. Terry - any idea there?
The test program is failing in MPI_Finalize which seems odd and the code 
itself looks pretty dead simple.  I am rebuilding a v1.5 workspace 
without the different thread options.  Once that is done I'll try the 
test program.


BTW, when I tried to build with the original options Siegmar used, the 
compiles looked like they hung, doh.


--td



Sent from my PDA. No type good.

On Oct 21, 2010, at 6:25 AM, "Terry Dontje" <terry.don...@oracle.com> wrote:


I wonder if the error below be due to crap being left over in the 
source tree.  Can you do a "make clean".  Note on a new checkout from 
the v1.5 svn branch I was able to build 64 bit with the following 
configure line:


../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib 
--without-udapl -enable-heterogeneous --enable-cxx-exceptions 
--enable-shared --enable-orterun-prefix-by-default --with-sge 
--disable-mpi-threads --enable-mpi-f90 --with-mpi-f90-size=small 
--disable-progress-threads --prefix=/workspace/tdd/ctnext/v15 
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64


--td
On 10/21/2010 05:38 AM, Siegmar Gross wrote:

Hi,

thank you very much for your reply.


   Can you remove the -with-threads and -enable-mpi-threads options from
the configure line and see if that helps your 32 bit problem any?

I cannot build the package when I remove these options.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 189 head -8 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --enable-shared --enable-heterogeneous
   --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 190 head -8 ../*.old/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --with-threads=posix --enable-mpi-threads
   --enable-shared --enable-heterogeneous --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 194 dir log.* ../*.old/log.*
... 132406 Oct 19 13:01
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
... 195587 Oct 19 16:09
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-check.Linux.x86_64.32_cc
... 356672 Oct 19 16:07
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-install.Linux.x86_64.32_cc
... 280596 Oct 19 13:42
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
... 132265 Oct 21 10:51 log.configure.Linux.x86_64.32_cc
...  10890 Oct 21 10:51 log.make.Linux.x86_64.32_cc


linpc4 openmpi-1.5-Linux.x86_64.32_cc 195 grep -i warning:
   log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4 openmpi-1.5-Linux.x86_64.32_cc 196 grep -i warning:
   ../*.old/log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4 openmpi-1.5-Linux.x86_64.32_cc 197 grep -i error:
   log.configure.Linux.x86_64.32_cc
configure: error: no libz found; check path for ZLIB package first...
configure:

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
I've reproduced Siegmar's issue when I have the threads options on but 
it does not show up when they are off.  It is actually segv'ing in 
mca_btl_sm_component_close on an access at address 0 (obviously not a 
good thing).  I am going to compile things with debug on and see if I can 
track this further, but I think I am smelling the smoke of a bug...


Siegmar, I was able to get stuff working with 32 bits when I removed 
-with-threads=posix and replaced "-enable-mpi-threads" with 
--disable-mpi-threads in your configure line.  I think your previous 
issue with things not building must be due to left-over cruft.
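
For reference, here is a sketch of what that 32-bit configure line would look
like with the two thread options dropped and --disable-mpi-threads added (the
remaining flags are taken from the invocation quoted above; the empty
environment-variable assignments are omitted here):

   ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc \
     CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32 \
     LDFLAGS=-m32 CC=cc CXX=CC F77=f95 FC=f95 \
     --without-udapl --disable-mpi-threads \
     --enable-shared --enable-heterogeneous --enable-cxx-exceptions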


Note, my compiler hang disappeared on me.  So maybe there was an 
environmental issue on my side.


--td


On 10/21/2010 06:47 AM, Terry Dontje wrote:

On 10/21/2010 06:43 AM, Jeff Squyres (jsquyres) wrote:
Also, i'm not entirely sure what all the commands are that you are 
showing. Some of those warnings (eg in config.log) are normal.


The 32 bit test failure is not, though. Terry - any idea there?
The test program is failing in MPI_Finalize which seems odd and the 
code itself looks pretty dead simple.  I am rebuilding a v1.5 
workspace without the different thread options.  Once that is done 
I'll try the test program.


BTW, when I tried to build with the original options Siegmar used the 
compiles looked like they hung, doh.


--td



Sent from my PDA. No type good.

On Oct 21, 2010, at 6:25 AM, "Terry Dontje" <terry.don...@oracle.com> wrote:


I wonder if the error below could be due to crap being left over in the 
source tree.  Can you do a "make clean"?  Note on a new checkout 
from the v1.5 svn branch I was able to build 64 bit with the 
following configure line:


../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib 
--without-udapl -enable-heterogeneous --enable-cxx-exceptions 
--enable-shared --enable-orterun-prefix-by-default --with-sge 
--disable-mpi-threads --enable-mpi-f90 --with-mpi-f90-size=small 
--disable-progress-threads --prefix=/workspace/tdd/ctnext/v15 
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64


--td
On 10/21/2010 05:38 AM, Siegmar Gross wrote:

Hi,

thank you very much for your reply.


   Can you remove the -with-threads and -enable-mpi-threads options from
the configure line and see if that helps your 32 bit problem any?

I cannot build the package when I remove these options.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 189 head -8 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --enable-shared --enable-heterogeneous
   --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 190 head -8 ../*.old/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --with-threads=posix --enable-mpi-threads
   --enable-shared --enable-heterogeneous --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 194 dir log.* ../*.old/log.*
... 132406 Oct 19 13:01
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
... 195587 Oct 19 16:09
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-check.Linux.x86_64.32_cc
... 356672 Oct 19 16:07
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-install.Linux.x86_64.32_cc
... 280596 Oct 19 13:42
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
... 132265 Oct 21 10:51 log.configure.Linux.x86_64.32_cc
...  10890 Oct 21 10:51 log.make.Linux.x86_64.32_cc


linpc4 openmpi-1.5-Linux.x86_64.32_cc 195 grep -i warning:
   log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje

 On 10/21/2010 10:18 AM, Jeff Squyres wrote:

Terry --

Can you file relevant ticket(s) for v1.5 on Trac?

Once I have more information and have proven it isn't due to us using 
old compilers or a compiler error itself.


--td

On Oct 21, 2010, at 10:10 AM, Terry Dontje wrote:


I've reproduced Siegmar's issue when I have the threads options on but it does 
not show up when they are off.  It is actually segv'ing in 
mca_btl_sm_component_close on an access at address 0 (obviously not a good 
thing).  I am going compile things with debug on and see if I can track this 
further but I think I am smelling the smoke of a bug...

Siegmar, I was able to get stuff working with 32 bits when I removed -with-threads=posix 
and replaced "-enable-mpi-threads" with --disable-mpi-threads in your configure 
line.  I think your previous issue with things not building must be left over cruft.

Note, my compiler hang disappeared on me.  So maybe there was an environmental 
issue on my side.

--td


On 10/21/2010 06:47 AM, Terry Dontje wrote:

On 10/21/2010 06:43 AM, Jeff Squyres (jsquyres) wrote:

Also, i'm not entirely sure what all the commands are that you are showing. 
Some of those warnings (eg in config.log) are normal.

The 32 bit test failure is not, though. Terry - any idea there?

The test program is failing in MPI_Finalize which seems odd and the code itself 
looks pretty dead simple.  I am rebuilding a v1.5 workspace without the 
different thread options.  Once that is done I'll try the test program.

BTW, when I tried to build with the original options Siegmar used the compiles 
looked like they hung, doh.

--td


Sent from my PDA. No type good.

On Oct 21, 2010, at 6:25 AM, "Terry Dontje"  wrote:


I wonder if the error below be due to crap being left over in the source tree.  Can you 
do a "make clean".  Note on a new checkout from the v1.5 svn branch I was able 
to build 64 bit with the following configure line:

../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib --without-udapl 
-enable-heterogeneous --enable-cxx-exceptions --enable-shared 
--enable-orterun-prefix-by-default --with-sge --disable-mpi-threads 
--enable-mpi-f90 --with-mpi-f90-size=small --disable-progress-threads 
--prefix=/workspace/tdd/ctnext/v15 CFLAGS=-m64 CXXFLAGS=-m64 
FFLAGS=-m64 FCFLAGS=-m64

--td
On 10/21/2010 05:38 AM, Siegmar Gross wrote:

Hi,

thank you very much for your reply.



   Can you remove the -with-threads and -enable-mpi-threads options from
the configure line and see if that helps your 32 bit problem any?


I cannot build the package when I remove these options.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 189 head -8 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --enable-shared --enable-heterogeneous
   --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 190 head -8 ../*.old/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --with-threads=posix --enable-mpi-threads
   --enable-shared --enable-heterogeneous --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 194 dir log.* ../*.old/log.*
... 132406 Oct 19 13:01
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
... 195587 Oct 19 16:09
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-check.Linux.x86_64.32_cc
... 356672 Oct 19 16:07
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-install.Linux.x86_64.32_cc
... 280596 Oct 19 13:42
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
... 132265 Oct 21 10:51 log.configure.Linux.x86_64.32_cc
...  10890 Oct 21 10:51 log.make.Linux.x86_64.32_cc


linpc4 openmpi-1.5-Linux.x86_64.32_cc 195 grep -i warning:
   log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
When you do a make, can you add a V=1 to have the actual compile lines 
printed out?  That will probably show you the line with 
-fno-directives-only in it, which is odd because I think that option is 
a gcc'ism and I don't know why it would show up in a studio build (note my 
build doesn't show it).
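
For example, reusing the make-and-log invocation that appears later in this
thread (V=1 simply disables the automake silent rules so the full compiler
command lines end up in the log):

   make V=1 |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.32_cc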


Maybe a copy of the config.log and config.status might be helpful.  Have 
you tried to start from square one?  It really seems like the configure 
or libtool might be setting things up for gcc which is odd with the 
configure line you show.


--td

On 10/21/2010 09:41 AM, Siegmar Gross wrote:

   I wonder if the error below be due to crap being left over in the
source tree.  Can you do a "make clean".  Note on a new checkout from
the v1.5 svn branch I was able to build 64 bit with the following
configure line:

linpc4 openmpi-1.5-Linux.x86_64.32_cc 123 make clean
Making clean in test
make[1]: Entering directory
...

../openmpi-1.5/configure \
   FC=f95 F77=f77 CC=cc CXX=CC --without-openib --without-udapl \
   --enable-heterogeneous --enable-cxx-exceptions --enable-shared \
   --enable-orterun-prefix-by-default --with-sge --disable-mpi-threads \
   --enable-mpi-f90 --with-mpi-f90-size=small --disable-progress-threads \
   --prefix=/usr/local/openmpi-1.5_32_cc CFLAGS=-m64 CXXFLAGS=-m64 \
   FFLAGS=-m64 FCFLAGS=-m64

make |&  tee log.make.$SYSTEM_ENV.$MACHINE_ENV.32_cc


...
make[3]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc/opal/libltdl'
make[2]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc/opal/libltdl'
Making all in asm
make[2]: Entering directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc/opal/asm'
   CC asm.lo
rm -f atomic-asm.S
ln -s ".../opal/asm/generated/atomic-ia32-linux-nongas.s" atomic-asm.S
   CPPAS  atomic-asm.lo
cc1: error: unrecognized command line option "-fno-directives-only"
cc: cpp failed for atomic-asm.S
make[2]: *** [atomic-asm.lo] Error 1
make[2]: Leaving directory `.../opal/asm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/.../opal'
make: *** [all-recursive] Error 1


Do you know where I can find "-fno-directives-only"? "grep" didn't
show any results. I tried to rebuild the package with my original
settings and didn't succeed (same error as above) so that something
must have changed in the last two days on "linpc4". The operator told
me that he hasn't changed anything, so I have no idea why I cannot
build the package today. The log-files from "configure" are identical,
but the log-files from "make" differ (I changed the language with
"setenv LC_ALL C" because I have some errors on other machines as well
and wanted English messages so that you can read them).


tyr openmpi-1.5 198 diff
   openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
   openmpi-1.5-Linux.x86_64.32_cc/log.configure.Linux.x86_64.32_cc |more

tyr openmpi-1.5 199 diff
   openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
   openmpi-1.5-Linux.x86_64.32_cc/log.make.Linux.x86_64.32_cc | more
3c3
<  make[1]: Für das Ziel »all« ist nichts zu tun.
---
>  make[1]: Nothing to be done for `all'.
7c7
<  make[1]: Für das Ziel »all« ist nichts zu tun.
---
>  make[1]: Nothing to be done for `all'.
74,76c74,76
<  :19:0: Warnung: »__FLT_EVAL_METHOD__« redefiniert
<  :93:0: Anmerkung: dies ist die Stelle der vorherigen Definition


Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Terry Dontje
So what you are saying is *all* the ranks have entered MPI_Finalize 
and only a subset has exited, per the prints you placed before and after 
MPI_Finalize.  Good.  So my guess is that the processes stuck in 
MPI_Finalize have a prior MPI request outstanding that for whatever 
reason is unable to complete.  So I would first look at all the MPI 
requests and make sure they have completed.
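
Here is a minimal sketch of that kind of check (this is not Jack's actual
code; the request bookkeeping and the nonblocking calls are placeholders):

   #include <mpi.h>
   #include <iostream>
   #include <vector>

   int main(int argc, char **argv)
   {
       MPI_Init(&argc, &argv);
       int rank;
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       std::vector<MPI_Request> reqs;  // collect every MPI_Isend/MPI_Irecv request here
       // ... post nonblocking sends/receives, pushing each request onto reqs ...

       if (!reqs.empty())              // nothing may be left pending before finalizing
           MPI_Waitall((int)reqs.size(), &reqs[0], MPI_STATUSES_IGNORE);

       std::cout << "rank " << rank << " before MPI_Finalize()" << std::endl;
       MPI_Finalize();                 // single exit path for every rank
       std::cout << "rank " << rank << " after MPI_Finalize()" << std::endl;
       return 0;
   }

If a rank still hangs even though all of its own requests have completed, the
outstanding request is usually on a peer (e.g. a send that was never matched
by a receive).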


--td

On 10/25/2010 02:38 AM, Jack Bryan wrote:

thanks
I found a problem:

I used:

 cout << " I am rank " << rank << " I am before 
MPI_Finalize()" << endl;

 MPI_Finalize();
cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
 return 0;

I can get the output " I am rank 0 (1, 2, ) I am before 
MPI_Finalize() ".


and
   " I am rank 0 I am after MPI_Finalize() "
But the other processes did not print out "I am rank ... I am after 
MPI_Finalize()".


It is weird. The processes have reached the point just before 
MPI_Finalize(), so why are they hanging there?


Are there other, better ways to check this?

Any help is appreciated.

thanks

Jack

Oct. 25 2010


From: solarbik...@gmail.com
Date: Sun, 24 Oct 2010 19:47:54 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

How do you know all processes call MPI_Finalize?  Did you have all of 
them print out something before they call MPI_Finalize? I think what 
Gustavo is getting at is that maybe you had some MPI calls within your 
snippets that hang your program, so some of your processes never 
called MPI_Finalize.


On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan wrote:


Thanks,

But, my code is too long to be posted.

What are the common reasons of this kind of problems ?

Any help is appreciated.

Jack

Oct. 24 2010

> From: g...@ldeo.columbia.edu 
> Date: Sun, 24 Oct 2010 18:09:52 -0400

> To: us...@open-mpi.org 
> Subject: Re: [OMPI users] Open MPI program cannot complete
>
> Hi Jack
>
> Your code snippet is too terse, doesn't show the MPI calls.
> It is hard to guess what is the problem this way.
>
> Gus Correa
> On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
>
> > Thanks for the reply.
> > But, I use mpi_waitall() to make sure that all MPI
communications have been done before a process call MPI_Finalize()
and returns.
> >
> > Any help is appreciated.
> >
> > thanks
> >
> > Jack
> >
> > Oct. 24 2010
> >
> > > From: g...@ldeo.columbia.edu 
> > > Date: Sun, 24 Oct 2010 17:31:11 -0400
> > > To: us...@open-mpi.org 
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > >
> > > Hi Jack
> > >
> > > It may depend on "do some things".
> > > Does it involve MPI communication?
> > >
> > > Also, why not put MPI_Finalize();return 0 outside the ifs?
> > >
> > > Gus Correa
> > >
> > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
> > >
> > > > Hi
> > > >
> > > > I got a problem of open MPI.
> > > >
> > > > My program has 5 processes.
> > > >
> > > > All of them can run MPI_Finalize() and return 0.
> > > >
> > > > But, the whole program cannot be completed.
> > > >
> > > > In the MPI cluster job queue, it is still in running status.
> > > >
> > > > If I use 1 process to run it, no problem.
> > > >
> > > > Why ?
> > > >
> > > > My program:
> > > >
> > > > int main (int argc, char **argv)
> > > > {
> > > >
> > > > MPI_Init(&argc, &argv);
> > > > MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
> > > > MPI_Comm_size(MPI_COMM_WORLD, &mySize);
> > > > MPI_Comm world;
> > > > world = MPI_COMM_WORLD;
> > > >
> > > > if (myRank == 0)
> > > > {
> > > > do some things.
> > > > }
> > > >
> > > > if (myRank != 0)
> > > > {
> > > > do some things.
> > > > MPI_Finalize();
> > > > return 0 ;
> > > > }
> > > > if (myRank == 0)
> > > > {
> > > > MPI_Finalize();
> > > > return 0;
> > > > }
> > > >
> > > > }
> > > >
> > > > And, some output files get wrong codes, which are not
readable.
> > > > In the 1-process case, the program can print correct results
to these output files.
> > > >
> > > > Any help is appreciated.
> > > >
> > > > thanks
> > > >
> > > > Jack
> > > >
> > > > Oct. 24 2010
> > > >

Re: [OMPI users] cannot install Open MPI 1.5 on Solaris x86_64 with Oracle/Sun C 5.11

2010-10-29 Thread Terry Dontje
Sorry, but can you give us the config line, the config.log and the 
full output of make, preferably with make V=1?


--td
On 10/29/2010 04:30 AM, Siegmar Gross wrote:

Hi,

I tried to build Open MPI 1.5 on Solaris X86 and x86_64 with Oracle
Studio 12.2. I can compile Open MPI with thread support, but I can
only partly install it because "libtool" will not find "f95" although
it is available. "make check" shows no failures.

tyr openmpi-1.5-SunOS.x86_64.32_cc 188 ssh sunpc4 cc -V
cc: Sun C 5.11 SunOS_i386 145355-01 2010/10/11
usage: cc [ options ] files.  Use 'cc -flags' for details

No suspicious warnings or errors in log.configure.SunOS.x86_64.32_cc.

tyr openmpi-1.5-SunOS.x86_64.32_cc 182 grep -i warning:
   log.make.SunOS.x86_64.32_cc | more

".../opal/mca/crs/none/crs_none_module.c", line 136:
   warning: statement not reached

".../orte/mca/errmgr/errmgr.h", line 135: warning: attribute
   "noreturn" may not be applied to variable, ignored
(a lot of these warnings)

".../orte/mca/rmcast/tcp/rmcast_tcp.c", line 982: warning:
   assignment type mismatch:
".../orte/mca/rmcast/tcp/rmcast_tcp.c", line 1023: warning:
   assignment type mismatch:
".../orte/mca/rmcast/udp/rmcast_udp.c", line 877: warning:
   assignment type mismatch:
".../orte/mca/rmcast/udp/rmcast_udp.c", line 918: warning:
   assignment type mismatch:

".../orte/tools/orte-ps/orte-ps.c", line 288: warning:
   initializer does not fit or is out of range: 0xfffe
".../orte/tools/orte-ps/orte-ps.c", line 289: warning:
   initializer does not fit or is out of range: 0xfffe

grep -i error: log.make.SunOS.x86_64.32_cc | more

tyr openmpi-1.5-SunOS.x86_64.32_cc 185 grep -i FAIL
   log.make-check.SunOS.x86_64.32_cc
tyr openmpi-1.5-SunOS.x86_64.32_cc 186 grep -i SKIP
   log.make-check.SunOS.x86_64.32_cc
tyr openmpi-1.5-SunOS.x86_64.32_cc 187 grep -i PASS
   log.make-check.SunOS.x86_64.32_cc
PASS: predefined_gap_test
File opened with dladvise_local, all passed
PASS: dlopen_test
All 2 tests passed
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_barrier
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_barrier_noinline
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_spinlock
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_spinlock_noinline
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_math
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_math_noinline
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_cmpset
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_cmpset_noinline
All 8 tests passed
All 0 tests passed
All 0 tests passed
decode [PASSED]
PASS: opal_datatype_test
PASS: checksum
PASS: position
decode [PASSED]
PASS: ddt_test
decode [PASSED]
PASS: ddt_raw
All 5 tests passed
SUPPORT: OMPI Test Passed: opal_path_nfs(): (0 tests)
PASS: opal_path_nfs
1 test passed


tyr openmpi-1.5-SunOS.x86_64.32_cc 190 grep -i warning:
   log.make-install.SunOS.x86_64.32_cc | more
libtool: install: warning: relinking `libmpi_cxx.la'
libtool: install: warning: relinking `libmpi_f77.la'
libtool: install: warning: relinking `libmpi_f90.la'

tyr openmpi-1.5-SunOS.x86_64.32_cc 191 grep -i error:
   log.make-install.SunOS.x86_64.32_cc | more
libtool: install: error: relink `libmpi_f90.la' with the above
   command before installing it

tyr openmpi-1.5-SunOS.x86_64.32_cc 194 tail -20
   log.make-install.SunOS.x86_64.32_cc
make[4]: Leaving directory `.../ompi/mpi/f90/scripts'
make[4]: Entering directory `.../ompi/mpi/f90'
make[5]: Entering directory `.../ompi/mpi/f90'
test -z "/usr/local/openmpi-1.5_32_cc/lib" ||
   /usr/local/bin/mkdir -p "/usr/local/openmpi-1.5_32_cc/lib"
  /bin/bash ../../../libtool   --mode=install /usr/local/bin/install -c
libmpi_f90.la '/usr/local/openmpi-1.5_32_cc/lib'
libtool: install: warning: relinking `libmpi_f90.la'
libtool: install: (cd
/export2/src/openmpi-1.5/openmpi-1.5-SunOS.x86_64.32_cc/ompi/mpi/f90; /bin/bash
/export2/src/openmpi-1.5/openmpi-1.5-SunOS.x86_64.32_cc/libtool  --silent --tag 
FC
--mode=relink f95 -I../../../ompi/include 
-I../../../../openmpi-1.5/ompi/include -I.
-I../../../../openmpi-1.5/ompi/mpi/f90 -I../../../ompi/mpi/f90 -m32 
-version-info 1:0:0
-export-dynamic -m32 -o libmpi_f90.la -rpath /usr/local/openmpi-1.5_32_cc/lib 
mpi.lo
mpi_sizeof.lo mpi_comm_spawn_multiple_f90.lo mpi_testall_f90.lo 
mpi_testsome_f90.

Re: [OMPI users] cannot install Open MPI 1.5 on Solaris x86_64 with Oracle/Sun C 5.11

2010-11-01 Thread Terry Dontje
 I am able to build on Linux systems with Sun C 5.11 using gcc-4.1.2.  
Still trying to get a version of gcc 4.3.4 compiled on our systems so I 
can use it with Sun C 5.11 to build OMPI.


--td

On 11/01/2010 05:58 AM, Siegmar Gross wrote:

Hi,


   Sorry, but can you give us the config line, the config.log and the
full output of make preferrably with make V=1?

--td
On 10/29/2010 04:30 AM, Siegmar Gross wrote:

Hi,

I tried to build Open MPI 1.5 on Solaris X86 and x86_64 with Oracle
Studio 12.2. I can compile Open MPI with thread support, but I can
only partly install it because "libtool" will not find "f95" although
it is available. "make check" shows no failures.

I made a mistake the first time. I'm sorry for that. This weekend I
rebuilt everything and now the following installations work. "ok"
means I could install the package and successfully run two small
programs (one is a simple matrix multiplication with MPI and OpenMP,
2 processes and 8 threads on a dual processor eight core SPARC64 VII
system). I used gcc-4.2.0 and Oracle/Sun C 5.11.

SunOS sparc,  32-bit, cc: ok
SunOS sparc,  64-bit, cc: ok
SunOS x86,    32-bit, cc: ok
SunOS x86_64, 32-bit, cc: ok
SunOS x86_64, 64-bit, cc: ok
Linux x86,    32-bit, cc: "make" still breaks
Linux x86_64, 32-bit, cc: "make" still breaks
Linux x86_64, 64-bit, cc: "make" still breaks

SunOS sparc,  32-bit, gcc: ok
SunOS sparc,  64-bit, gcc: ok
SunOS x86,    32-bit, gcc: ok
SunOS x86_64, 32-bit, gcc: ok
SunOS x86_64, 64-bit, gcc: ok
Linux x86,    32-bit, gcc: ok
Linux x86_64, 32-bit, gcc: ok
Linux x86_64, 64-bit, gcc: ok

The problems on Solaris x86 and Solaris x86_64 could be solved by using
Sun C 5.11 instead of Sun C 5.9. Unfortunately I still have the same
problem on Linux x86 and Linux x86_64 with Sun C 5.11.

tyr openmpi-1.5-Linux.x86_64.32_cc 417 tail -15
   log.make.Linux.x86_64.32_cc
make[3]: Leaving directory `.../opal/libltdl'
make[2]: Leaving directory `.../opal/libltdl'
Making all in asm
make[2]: Entering directory `.../opal/asm'
   CC asm.lo
rm -f atomic-asm.S
ln -s "../../../openmpi-1.5/opal/asm/generated/atomic-ia32-linux-nongas.s"
   atomic-asm.S
   CPPAS  atomic-asm.lo
cc1: error: unrecognized command line option "-fno-directives-only"
cc: cpp failed for atomic-asm.S
make[2]: *** [atomic-asm.lo] Error 1
make[2]: Leaving directory `.../opal/asm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `.../opal'
make: *** [all-recursive] Error 1
tyr openmpi-1.5-Linux.x86_64.32_cc 418

I can switch back to Sun C 5.9 on Solaris x86(_64) systems if somebody
is interested in solving the problem for the older compiler. I used the
following options:

../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_64_gcc \
   --libdir=/usr/local/openmpi-1.5_64_gcc/lib64 \
   LDFLAGS="-m64 -L/usr/local/gcc-4.2.0/lib/sparcv9" \
   CC="gcc" CPP="cpp" CXX="g++" CXXCPP="cpp" F77="gfortran" \
   CFLAGS="-m64" CXXFLAGS="-m64" FFLAGS="-m64" FCFLAGS="-m64" \
   CXXLDFLAGS="-m64" CPPFLAGS="" \
   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
   OBJC_INCLUDE_PATH="" MPIHOME="" \
   --without-udapl --without-openib \
   --enable-mpi-f90 --with-mpi-f90-size=small \
   --enable-heterogeneous --enable-cxx-exceptions \
   --enable-shared --enable-orterun-prefix-by-default \
   --with-threads=posix --enable-mpi-threads --disable-progress-threads \
   |&  tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc

For x86_64 I changed one line:

   LDFLAGS="-m64 -L/usr/local/gcc-4.2.0/lib/amd64" \


../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_64_cc \
   --libdir=/usr/local/openmpi-1.5_64_cc/lib64 \
   LDFLAGS="-m64" \
   CC="cc" CXX="CC" F77="f77" FC="f95" \
   CFLAGS="-m64" CXXFLAGS="-m64" FFLAGS="-m64" FCFLAGS="-m64" \
   CXXLDFLAGS="-m64" CPPFLAGS="" \
   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
   OBJC_INCLUDE_PATH="" MPICHHOME="" \
   --without-udapl --without-openib \
   --enable-mpi-f90 --with-mpi-f90-size=small \
   --enable-heterogeneous --enable-cxx-exceptions \
   --enable-shared --enable-orterun-prefix-by-default \
   --with-threads=posix --enable-mpi-threads --disable-progress-threads \
   |&  tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc


For 32-bit systems I changed "-m64" to "-m32", didn't specify "-L..."
in LDFLAGS, and didn't use "--libdir=...".


Kind regards

Siegmar









Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje
Sorry, I am still trying to grok all your email and what problem you 
are trying to solve.  So is the issue trying to have two jobs with 
processes on the same node be able to bind their processes to different 
resources?  Like core 1 for the first job and cores 2 and 3 for the 2nd job?


--td

On 11/15/2010 09:29 AM, Chris Jewell wrote:

Hi,


If, indeed, it is not possible currently to implement this type of core-binding 
in tightly integrated OpenMPI/GE, then a solution might lie in a custom script 
run in the parallel environment's 'start proc args'. This script would have to 
find out which slots are allocated where on the cluster, and write an OpenMPI 
rankfile.

Exactly this should work.

If you use "binding_instance" "pe" and reformat the information in the $PE_HOSTFILE to a 
"rankfile", it should work to get the desired allocation. Maybe you can share the script with this 
list once you got it working.


As far as I can see, that's not going to work.  This is because, exactly like 
"binding_instance" "set", for -binding pe linear:n you get n cores bound per 
node.  This is easily verifiable by using a long job and examining the pe_hostfile.  For example, I 
submit a job with:

$ qsub -pe mpi 8 -binding pe linear:1 myScript.com

and my pe_hostfile looks like:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1

Notice that, because I have specified the -binding pe linear:1, each execution 
node binds processes for the job_id to one core.  If I have -binding pe 
linear:2, I get:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2

So the pe_hostfile still doesn't give an accurate representation of the binding 
allocation for use by OpenMPI.  Question: is there a system file or command that I could 
use to check which processors are "occupied"?
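
For reference, the rankfile such a start_proc_args script would have to
produce maps each MPI rank to an explicit host and core.  Built from the
first pe_hostfile above it might look like the following, with the slot
numbers purely illustrative (which is exactly the piece of information the
pe_hostfile does not pin down):

   rank 0=exec6.cluster.stats.local slot=0
   rank 1=exec6.cluster.stats.local slot=1
   rank 2=exec1.cluster.stats.local slot=0
   rank 3=exec7.cluster.stats.local slot=0
   rank 4=exec5.cluster.stats.local slot=0
   rank 5=exec4.cluster.stats.local slot=0
   rank 6=exec3.cluster.stats.local slot=0
   rank 7=exec2.cluster.stats.local slot=0

Such a file would then be passed to mpirun via its rankfile option (-rf).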

Chris

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778














Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje

On 11/15/2010 11:08 AM, Chris Jewell wrote:

Sorry, I am still trying to grok all your email as what the problem you
are trying to solve. So is the issue is trying to have two jobs having
processes on the same node be able to bind there processes on different
resources. Like core 1 for the first job and core 2 and 3 for the 2nd job?

--td

That's exactly it.  Each MPI process needs to be bound to 1 processor in a way 
that reflects GE's slot allocation scheme.


I actually don't think that I got it.  So you give two cases:

Case 1:

$ qsub -pe mpi 8 -binding pe linear:1 myScript.com

and my pe_hostfile looks like:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1


Case 2:

Notice that, because I have specified the -binding pe linear:1, each execution 
node binds processes for the job_id to one core.  If I have -binding pe 
linear:2, I get:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2

Is your complaint really the fact that exec6 has been allocated two 
slots but there seems to be only one slot's worth of resources allocated 
to it (i.e. in case 1 exec6 only has 1 core and in case 2 it has two, where 
maybe you'd expect 2 and 4 cores allocated, respectively)?







Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje

On 11/15/2010 02:11 PM, Reuti wrote:

Just to give my understanding of the problem:

Am 15.11.2010 um 19:57 schrieb Terry Dontje:


On 11/15/2010 11:08 AM, Chris Jewell wrote:

Sorry, I am still trying to grok all your email as what the problem you
are trying to solve. So is the issue is trying to have two jobs having
processes on the same node be able to bind there processes on different
resources. Like core 1 for the first job and core 2 and 3 for the 2nd job?

--td


That's exactly it.  Each MPI process needs to be bound to 1 processor in a way 
that reflects GE's slot allocation scheme.



I actually don't think that I got it.  So you give two cases:

Case 1:
$ qsub -pe mpi 8 -binding pe linear:1 myScript.com

and my pe_hostfile looks like:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1

Shouldn't two cores be reserved here for exec6, as it got two slots?



That's what I was wondering.

exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1


Case 2:
Notice that, because I have specified the -binding pe linear:1, each execution 
node binds processes for the job_id to one core.  If I have -binding pe 
linear:2, I get:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2

Is your complaint really the fact that exec6 has been allocated two slots but 
there seems to only be one slot worth of resources allocated

All are wrong except exec6. They should only get one core assigned.


Huh?  I would have thought exec6 would get 4 cores and the rest are correct.

--td


-- Reuti



to it (ie in case one exec6 only has 1 core and case 2 it has two where maybe 
you'd expect 2 and 4 cores allocated respectively)?






Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 04:26 AM, Chris Jewell wrote:

Hi all,


On 11/15/2010 02:11 PM, Reuti wrote:

Just to give my understanding of the problem:

Sorry, I am still trying to grok all your email as what the problem you
are trying to solve. So is the issue is trying to have two jobs having
processes on the same node be able to bind there processes on different
resources. Like core 1 for the first job and core 2 and 3 for the 2nd job?

--td

You can't get 2 slots on a machine, as it's limited by the core count to one 
here, so such a slot allocation shouldn't occur at all.

So to clarify, the current -binding:  
allocates binding_amount cores to each sge_shepherd process associated with a job_id.  
There appears to be only one sge_shepherd process per job_id per execution node, so all 
child processes run on these allocated cores.  This is irrespective of the number of slots 
allocated to the node.

I believe the above is correct.

I agree with Reuti that the binding_amount parameter should be a maximum number 
of bound cores per node, with the actual number determined by the number of 
slots allocated per node.  FWIW, an alternative approach might be to have 
another binding_type ('slot', say) that automatically allocated one core per 
slot.

That might be correct, I've put in a question to someone who should know.

Of course, a complex situation might arise if a user submits a combined 
MPI/multithreaded job, but then I guess we're into the realm of setting 
allocation_rule.

Yes, that would get ugly.

Is it going to be worth looking at creating a patch for this?  I don't know 
much of the internals of SGE -- would it be hard work to do?  I've not that 
much time to dedicate towards it, but I could put some effort in if necessary...


Is the patch you're wanting for a "slot" binding_type?






Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 09:08 AM, Reuti wrote:

Hi,

Am 16.11.2010 um 14:07 schrieb Ralph Castain:


Perhaps I'm missing it, but it seems to me that the real problem lies in the interaction 
between SGE and OMPI during OMPI's two-phase launch. The verbose output shows that SGE 
dutifully allocated the requested number of cores on each node. However, OMPI launches 
only one process on each node (the ORTE daemon), which SGE "binds" to a single 
core since that is what it was told to do.

Since SGE never sees the local MPI procs spawned by ORTE, it can't assign 
bindings to them. The ORTE daemon senses its local binding (i.e., to a single 
core in the allocation), and subsequently binds all its local procs to that 
core.

I believe all you need to do is tell SGE to:

1. allocate a specified number of cores on each node to your job

this is currently the bug in the "slot<=>  core" relation in SGE, which has to 
be removed, updated or clarified. For now slot and core count are out of sync AFAICS.


Technically this isn't a bug but a gap in the allocation rule.  I think 
the solution is a new allocation rule.

2. have SGE bind procs it launches to -all- of those cores. I believe SGE does 
this automatically to constrain the procs to running on only those cores.

This is another "bug/feature" in SGE: it's a matter of discussion, whether the shepherd 
should get exactly one core (in case you use more than one `qrsh`per node) for each call, or *all* 
cores assigned (which we need right now, as the processes in Open MPI will be forks of orte 
daemon). About such a situtation I filled an issue a long time ago and 
"limit_to_one_qrsh_per_host yes/no" in the PE definition would do (this setting should 
then also change the core allocation of the master process):

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254
Isn't it almost required to have the shepherd bind to all the cores so 
that the orted inherits that binding?



3. tell OMPI to --bind-to-core.

In other words, tell SGE to allocate a certain number of cores on each node, 
but to bind each proc to all of them (i.e., don't bind a proc to a specific 
core). I'm pretty sure that is a standard SGE option today (at least, I know it 
used to be). I don't believe any patch or devel work is required (to either SGE 
or OMPI).

When you use a fixed allocation_rule and a matching -binding request it will 
work today. But any other case won't be distributed in the correct way.

Ok, so what is the "correct" way, and are we sure it isn't distributed correctly?

In the original case of 7 nodes and 8 processes, if we do -binding pe 
linear:2 and add -bind-to-core to mpirun, I'd actually expect the 
processes on 6 of the nodes to bind to one core each, and the 7th node 
with 2 processes to have each of those processes bound to a different 
core on the same machine.


Can we get a full output of such a run with -report-bindings turned on?  
I think we should find out that things actually are happening correctly 
except for the fact that 6 of the nodes have 2 cores allocated but 
only one is being bound to by a process.


--td


-- Reuti




On Tue, Nov 16, 2010 at 4:07 AM, Reuti  wrote:
Am 16.11.2010 um 10:26 schrieb Chris Jewell:


Hi all,


On 11/15/2010 02:11 PM, Reuti wrote:

Just to give my understanding of the problem:

Sorry, I am still trying to grok all your email as what the problem you
are trying to solve. So is the issue is trying to have two jobs having
processes on the same node be able to bind there processes on different
resources. Like core 1 for the first job and core 2 and 3 for the 2nd job?

--td

You can't get 2 slots on a machine, as it's limited by the core count to one 
here, so such a slot allocation shouldn't occur at all.

So to clarify, the current -binding:  
allocates binding_amount cores to each sge_shepherd process associated with a job_id.  
There appears to be only one sge_shepherd process per job_id per execution node, so all 
child processes run on these allocated cores.  This is irrespective of the number of slots 
allocated to the node.

I agree with Reuti that the binding_amount parameter should be a maximum number 
of bound cores per node, with the actual number determined by the number of 
slots allocated per node.  FWIW, an alternative approach might be to have 
another binding_type ('slot', say) that automatically allocated one core per 
slot.

Of course, a complex situation might arise if a user submits a combined 
MPI/multithreaded job, but then I guess we're into the realm of setting 
allocation_rule.

IIRC there was a discussion on the [GE users] list about it, to get an uniform 
distribution on all slave nodes for such jobs, as also e.g. $OMP_NUM_THREADS will be set 
to the same value for all slave nodes for hybrid jobs. Otherwise it would be necessary to 
adjust SGE to set this value in the "-builtin-" startup method automatically on 
all nodes to the local granted slots value. For now a fixed allocation rule of 1,2,4 or 
whatever must be used 

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 10:59 AM, Reuti wrote:

Am 16.11.2010 um 15:26 schrieb Terry Dontje:



1. allocate a specified number of cores on each node to your job


this is currently the bug in the "slot<=>  core" relation in SGE, which has to 
be removed, updated or clarified. For now slot and core count are out of sync AFAICS.


Technically this isn't a bug but a gap in the allocation rule.  I think the 
solution is a new allocation rule.

Yes, you can phrase it this way. But what do you mean by "new allocation rule"?
The proposal is to have a slot allocation rule that forces the number of 
cores allocated on each node to equal the number of slots.

The slot allocation should follow the specified cores?

The other way around I think.



2. have SGE bind procs it launches to -all- of those cores. I believe SGE does 
this automatically to constrain the procs to running on only those cores.


This is another "bug/feature" in SGE: it's a matter of discussion, whether the shepherd 
should get exactly one core (in case you use more than one `qrsh`per node) for each call, or *all* 
cores assigned (which we need right now, as the processes in Open MPI will be forks of orte 
daemon). About such a situtation I filled an issue a long time ago and 
"limit_to_one_qrsh_per_host yes/no" in the PE definition would do (this setting should 
then also change the core allocation of the master process):


http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254

Isn't it almost required to have the shepherd bind to all the cores so that the 
orted inherits that binding?

Yes, for orted. But if you want to have any other (legacy) application which 
uses `qrsh` N times to an exec host when you got N slots thereon, then only one 
core should be bound to each of the started shepherds.

Blech.  Not sure of the solution for that but I see what you are saying 
now :-).

3. tell OMPI to --bind-to-core.

In other words, tell SGE to allocate a certain number of cores on each node, 
but to bind each proc to all of them (i.e., don't bind a proc to a specific 
core). I'm pretty sure that is a standard SGE option today (at least, I know it 
used to be). I don't believe any patch or devel work is required (to either SGE 
or OMPI).


When you use a fixed allocation_rule and a matching -binding request it will 
work today. But any other case won't be distributed in the correct way.


Ok, so what is the "correct" way and we sure it isn't distributed correctly?

You posted the two cases yesterday. Do we agree that both cases aren't correct, or do you think it's a 
correct allocation for both cases? Even if it could be "repaired" in Open MPI, it would be 
better to fix the generated 'pe' PE hostfile and 'set' allocation, i.e. the "slot<=>  
cores" relation.


So I am not a GE type of guy but from what I've been led to believe what 
happened is correct (in some form of correct).  That is in case one we 
asked for a core allocation of 1 core per node and a core allocation of 
2 cores in the other case.  That is what we were given.  The fact that 
we distributed the slots in a non-uniform manner I am not sure is GE's 
fault.  Note I can understand where it may seem non-intuitive and not 
nice for people wanting to do things like this.

In the original case of 7 nodes and processes if we do -binding pe linear:2, 
and add the -bind-to-core to mpirun  I'd actually expect 6 of the nodes 
processes bind to one core and the 7th node with 2 processes to have each of 
those processes bound to different cores on the same machine.

Yes, possibly it could be repaired this way (for now I have no free machines to play with). But 
then the "reserved" cores by the "-binding pe linear:2" are lost for other 
processes on these 6 nodes, and the slot count gets out of sync with slots.
Right, if you want to right-size the number of cores allocated to the slots 
allocated on each node, then we are stuck unless a new allocation rule is 
made.

Can we get a full output of such a run with -report-bindings turned on.  I 
think we should find out that things actually are happening correctly except 
for the fact that the 6 of the nodes have 2 cores allocated but only one is 
being bound to by a process.

You mean, to accept the current behavior as being the intended one, as finally 
for having only one job running on these machines we get what we asked for - 
despite the fact that cores are lost for other processes?

Yes, that is what I mean.  I first would like to prove at least to 
myself that things are working the way we think they are.  I believe the 
discussion of recovering the lost cores is the next step.  Either we 
redefine what -binding linear:X means in light of slots, make a new 
allocation rule -binding slots:X, or live with the lost cores.  Note, the 
"we" here is loosely used.  I a

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 12:13 PM, Chris Jewell wrote:

On 16 Nov 2010, at 14:26, Terry Dontje wrote:

In the original case of 7 nodes and processes if we do -binding pe linear:2, 
and add the -bind-to-core to mpirun  I'd actually expect 6 of the nodes 
processes bind to one core and the 7th node with 2 processes to have each of 
those processes bound to different cores on the same machine.

Can we get a full output of such a run with -report-bindings turned on.  I 
think we should find out that things actually are happening correctly except 
for the fact that the 6 of the nodes have 2 cores allocated but only one is 
being bound to by a process.

Sure.   Here's the stderr of a job submitted to my cluster with 'qsub -pe mpi 8 
-binding linear:2 myScript.com'  where myScript.com runs 'mpirun -mca 
ras_gridengine_verbose 100 --report-bindings ./unterm':

[exec4:17384] System has detected external process binding to cores 0022
[exec4:17384] ras:gridengine: JOB_ID: 59352
[exec4:17384] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec4/active_jobs/59352.1/pe_hostfile
[exec4:17384] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec4:17384] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1


Is that all that came out?  I would have expected some output from 
each process after the orted forked the processes but before the exec of 
unterm.


--td

Chris









Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 01:31 PM, Reuti wrote:

Hi Ralph,

Am 16.11.2010 um 15:40 schrieb Ralph Castain:


2. have SGE bind procs it launches to -all- of those cores. I believe SGE does 
this automatically to constrain the procs to running on only those cores.

This is another "bug/feature" in SGE: it's a matter of discussion, whether the shepherd 
should get exactly one core (in case you use more than one `qrsh`per node) for each call, or *all* 
cores assigned (which we need right now, as the processes in Open MPI will be forks of orte 
daemon). About such a situtation I filled an issue a long time ago and 
"limit_to_one_qrsh_per_host yes/no" in the PE definition would do (this setting should 
then also change the core allocation of the master process):

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254

I believe this is indeed the crux of the issue

fantastic to share the same view.


FWIW, I think I agree too.

3. tell OMPI to --bind-to-core.

In other words, tell SGE to allocate a certain number of cores on each node, 
but to bind each proc to all of them (i.e., don't bind a proc to a specific 
core). I'm pretty sure that is a standard SGE option today (at least, I know it 
used to be). I don't believe any patch or devel work is required (to either SGE 
or OMPI).

When you use a fixed allocation_rule and a matching -binding request it will 
work today. But any other case won't be distributed in the correct way.

Is it possible to not include the -binding request? If SGE is told to use a 
fixed allocation_rule, and to allocate (for example) 2 cores/node, then won't 
the orted see
itself bound to two specific cores on each node?

When you leave out the -binding, all jobs are allowed to run on any core.



We would then be okay as the spawned children of orted would inherit its 
binding. Just don't tell mpirun to bind the processes and the threads of those 
MPI procs will be able to operate across the provided cores.

Or does SGE only allocate 2 cores/node in that case (i.e., allocate, but no 
-binding given), but doesn't bind the orted to any two specific cores? If so, 
then that would be a problem as the orted would think itself unconstrained. If 
I understand the thread correctly, you're saying that this is what happens 
today - true?

Exactly. It won't apply any binding at all and orted would think of being 
unlimited. I.e. limited only by the number of slots it should use thereon.

So I guess this is the question I have for Ralph.  I thought, and this might be 
mixing some of the ideas Jeff and I have been talking about, that when an 
RM executes the orted with a bound set of resources (i.e. cores), the 
orted would bind the individual processes to a subset of those bound 
resources.  Is this not really the case for the 1.4.X branch?  I believe it 
is the case for the trunk, based on Jeff's refactoring.







Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/16/2010 08:24 PM, Ralph Castain wrote:



On Tue, Nov 16, 2010 at 12:23 PM, Terry Dontje <terry.don...@oracle.com> wrote:


On 11/16/2010 01:31 PM, Reuti wrote:

Hi Ralph,

Am 16.11.2010 um 15:40 schrieb Ralph Castain:


2. have SGE bind procs it launches to -all- of those cores. I believe SGE 
does this automatically to constrain the procs to running on only those cores.

This is another "bug/feature" in SGE: it's a matter of discussion, whether the 
shepherd should get exactly one core (in case you use more than one `qrsh`per node) for each call, 
or *all* cores assigned (which we need right now, as the processes in Open MPI will be forks of 
orte daemon). About such a situtation I filled an issue a long time ago and 
"limit_to_one_qrsh_per_host yes/no" in the PE definition would do (this setting should 
then also change the core allocation of the master process):

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254

I believe this is indeed the crux of the issue

Fantastic that we share the same view.


FWIW, I think I agree too.


3. tell OMPI to --bind-to-core.

In other words, tell SGE to allocate a certain number of cores on each 
node, but to bind each proc to all of them (i.e., don't bind a proc to a 
specific core). I'm pretty sure that is a standard SGE option today (at least, 
I know it used to be). I don't believe any patch or devel work is required (to 
either SGE or OMPI).

When you use a fixed allocation_rule and a matching -binding request it 
will work today. But any other case won't be distributed in the correct way.

Is it possible to not include the -binding request? If SGE is told to use a 
fixed allocation_rule, and to allocate (for example) 2 cores/node, then won't 
the orted see
itself bound to two specific cores on each node?

When you leave out the -binding, all jobs are allowed to run on any core.



We would then be okay as the spawned children of orted would inherit its 
binding. Just don't tell mpirun to bind the processes and the threads of those 
MPI procs will be able to operate across the provided cores.

Or does SGE only allocate 2 cores/node in that case (i.e., allocate, but no 
-binding given), but doesn't bind the orted to any two specific cores? If so, 
then that would be a problem as the orted would think itself unconstrained. If 
I understand the thread correctly, you're saying that this is what happens 
today - true?

Exactly. It won't apply any binding at all, and the orted would consider itself 
unlimited, i.e. limited only by the number of slots it should use there.


So I guess this is the question I have for Ralph: I thought (and this
might be mixing some of the ideas Jeff and I have been talking
about) that when an RM executes the orted with a bound set of
resources (i.e. cores), the orted would bind the individual
processes to a subset of those bound resources.  Is this not
really the case for the 1.4.x branch?  I believe it is the case for
the trunk based on Jeff's refactoring.


You are absolutely correct, Terry, and the 1.4 release series does 
include the proper code. The point here, though, is that SGE binds the 
orted to a single core, even though other cores are also allocated. So 
the orted detects an external binding of one core, and binds all its 
children to that same core.
I do not think you are right here.  Chris sent the following, which looks 
like OGE (fka SGE) actually did bind the hnp to multiple cores.  However, 
that message I believe is not coming from the processes themselves and 
is actually only shown by the hnp.  I wonder, if Chris adds a 
"-bind-to-core" option, whether we'll see more output from the a.out's before 
they exec unterm?


Sure.   Here's the stderr of a job submitted to my cluster with 'qsub -pe mpi 8 
-binding linear:2 myScript.com'  where myScript.com runs 'mpirun -mca 
ras_gridengine_verbose 100 --report-bindings ./unterm':
 
 [exec4:17384] System has detected external process binding to cores 0022

 [exec4:17384] ras:gridengine: JOB_ID: 59352
 [exec4:17384] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec4/active_jobs/59352.1/pe_hostfile
 [exec4:17384] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=2
 [exec4:17384] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1




--td
What I had suggested to Reuti was to not include the -bindi

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/17/2010 07:41 AM, Chris Jewell wrote:

On 17 Nov 2010, at 11:56, Terry Dontje wrote:

You are absolutely correct, Terry, and the 1.4 release series does include the 
proper code. The point here, though, is that SGE binds the orted to a single 
core, even though other cores are also allocated. So the orted detects an 
external binding of one core, and binds all its children to that same core.

I do not think you are right here.  Chris sent the following which looks like OGE (fka 
SGE) actually did bind the hnp to multiple cores.  However that message I believe is not 
coming from the processes themselves and actually is only shown by the hnp.  I wonder if 
Chris adds a "-bind-to-core" option  we'll see more output from the a.out's 
before they exec unterm?

As requested using

$ qsub -pe mpi 8 -binding linear:2 myScript.com'

and

'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core 
-bind-to-core ./unterm'

[exec5:06671] System has detected external process binding to cores 0028
[exec5:06671] ras:gridengine: JOB_ID: 59434
[exec5:06671] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
[exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1

No more info.  I note that the external binding is slightly different to what I 
had before, but our cluster is busier today :-)


I would have expected more output.

--td

Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778









--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/17/2010 09:32 AM, Ralph Castain wrote:
Chris' output is coming solely from the HNP, which is correct given the 
way things were executed. My comment was from another email where he 
did what I asked, which was to include the flags:


--report-bindings --leave-session-attached

so we could see the output from each orted. In that email, it was 
clear that while mpirun was bound to multiple cores, the orteds are 
being bound to a -single- core.


Hence the problem.

Hmm, I see Ralph's comment on 11/15 but I don't see any output that 
shows what Ralph says above.  The only report-bindings output I see is 
when he runs without OGE binding.   Can someone give me the date and 
time of Chris' email with the --report-bindings and 
--leave-session-attached?  Or a rerun of the below with the 
--leave-session-attached option would also help.


I find it confusing that --leave-session-attached is not required when 
the OGE binding argument is not given.


--td

HTH
Ralph


On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje <terry.don...@oracle.com> wrote:


On 11/17/2010 07:41 AM, Chris Jewell wrote:

    On 17 Nov 2010, at 11:56, Terry Dontje wrote:

You are absolutely correct, Terry, and the 1.4 release series does include 
the proper code. The point here, though, is that SGE binds the orted to a 
single core, even though other cores are also allocated. So the orted detects 
an external binding of one core, and binds all its children to that same core.

I do not think you are right here.  Chris sent the following which looks like OGE 
(fka SGE) actually did bind the hnp to multiple cores.  However that message I believe is 
not coming from the processes themselves and actually is only shown by the hnp.  I wonder 
if Chris adds a "-bind-to-core" option  we'll see more output from the a.out's 
before they exec unterm?

As requested using

$ qsub -pe mpi 8 -binding linear:2 myScript.com'

and

'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core 
-bind-to-core ./unterm'

[exec5:06671] System has detected external process binding to cores 0028
[exec5:06671] ras:gridengine: JOB_ID: 59434
[exec5:06671] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
[exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1

No more info.  I note that the external binding is slightly different to 
what I had before, but our cluster is busier today :-)


I would have expected more output.

--td


Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778









-- 
Oracle

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>










--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/17/2010 10:00 AM, Ralph Castain wrote:
--leave-session-attached is always required if you want to see output 
from the daemons. Otherwise, the launcher closes the ssh session (or 
qrsh session, in this case) as part of its normal operating procedure, 
thus terminating the stdout/err channel.



I believe you but isn't it weird that without the --binding option to 
qsub we saw -report-bindings output from the orteds?


Do you have the date of the email that has the info you talked about 
below?  I really am not trying to be an a-hole about this, but there has 
been so much data and email flying around that it would be nice to actually 
see the output you mention.


--td

On Wed, Nov 17, 2010 at 7:51 AM, Terry Dontje <terry.don...@oracle.com> wrote:


On 11/17/2010 09:32 AM, Ralph Castain wrote:

Cris' output is coming solely from the HNP, which is correct
given the way things were executed. My comment was from another
email where he did what I asked, which was to include the flags:

--report-bindings --leave-session-attached

so we could see the output from each orted. In that email, it was
clear that while mpirun was bound to multiple cores, the orteds
are being bound to a -single- core.

Hence the problem.


Hmm, I see Ralph's comment on 11/15 but I don't see any output
that shows what Ralph say's above.  The only report-bindings
output I see is when he runs without OGE binding.   Can someone
give me the date and time of Chris' email with the
--report-bindings and --leave-session-attached.  Or a rerun of the
below with the --leave-session-attached option would also help.

I find it confusing that --leave-session-attached is not required
when the OGE binding argument is not given.

--td


HTH
Ralph


    On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje
mailto:terry.don...@oracle.com>> wrote:

On 11/17/2010 07:41 AM, Chris Jewell wrote:

    On 17 Nov 2010, at 11:56, Terry Dontje wrote:

You are absolutely correct, Terry, and the 1.4 release series does 
include the proper code. The point here, though, is that SGE binds the orted to 
a single core, even though other cores are also allocated. So the orted detects 
an external binding of one core, and binds all its children to that same core.

I do not think you are right here.  Chris sent the following which looks like OGE 
(fka SGE) actually did bind the hnp to multiple cores.  However that message I believe is 
not coming from the processes themselves and actually is only shown by the hnp.  I wonder 
if Chris adds a "-bind-to-core" option  we'll see more output from the a.out's 
before they exec unterm?

As requested using

$ qsub -pe mpi 8 -binding linear:2 myScript.com'

and

'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core 
-bind-to-core ./unterm'

[exec5:06671] System has detected external process binding to cores 0028
[exec5:06671] ras:gridengine: JOB_ID: 59434
[exec5:06671] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
[exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE 
shows slots=2
[exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE 
shows slots=2
[exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE 
shows slots=1
[exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE 
shows slots=1
[exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE 
shows slots=1
[exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE 
shows slots=1

No more info.  I note that the external binding is slightly different 
to what I had before, but our cluster is busier today :-)


I would have expected more output.

--td


Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778









-- 
Oracle

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>








Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/17/2010 10:48 AM, Ralph Castain wrote:
No problem at all. I confess that I am lost in all the sometimes 
disjointed emails in this thread. Frankly, now that I search, I can't 
find it either! :-(


I see one email that clearly shows the external binding report from 
mpirun, but not from any daemons. I see another email (after you asked 
if that was all the output) that states "yep", indicating that was 
all the output, and then proceeds to offer additional output that 
wasn't in the original email you asked about!


So I am now as thoroughly confused as you are...

That said, I am confident in the code in ORTE as it has worked 
correctly when I tested it against external bindings in other 
environments. So I really do believe this is an OGE issue where the 
orted isn't getting correctly bound against all allocated cores.


I am confused by your statement above because we don't even know what is 
being bound or not.  We know that it looks like the hnp is bound to 2 
cores, which is what we asked for, but we don't know what any of the 
processes themselves are bound to.   So I personally cannot point to 
ORTE or OGE as the culprit, because I don't think we know whether there 
is an issue.


So, until we are able to get the -report-bindings output from the a.out 
code (note I did not say orted) it is kind of hard to claim there is 
even an issue.  Which brings me back to the output question.  After some 
thinking, the --report-bindings output I am expecting is not from the 
orted itself but from the a.out before it executes the user code.   
Which now makes me wonder if there is some odd OGE/OMPI integration 
issue in which the -bind-to-core and -report-bindings options are not being 
propagated/recognized/honored when qsub is given the -binding option.


Perhaps if someone could run this test again with --report-bindings 
--leave-session-attached and provide -all- output we could verify that 
analysis and clear up the confusion?



Yeah, however I bet you we still won't see output.

--td



On Wed, Nov 17, 2010 at 8:13 AM, Terry Dontje <terry.don...@oracle.com> wrote:


On 11/17/2010 10:00 AM, Ralph Castain wrote:

--leave-session-attached is always required if you want to see
output from the daemons. Otherwise, the launcher closes the ssh
session (or qrsh session, in this case) as part of its normal
operating procedure, thus terminating the stdout/err channel.



I believe you but isn't it weird that without the --binding option
to qsub we saw -report-bindings output from the orteds?

Do you have the date of the email that has the info you talked
about below.  I really am not trying to be an a-hole about this
but there have been so much data and email flying around it would
be nice to actually see the output you mention.

    --td



On Wed, Nov 17, 2010 at 7:51 AM, Terry Dontje
mailto:terry.don...@oracle.com>> wrote:

On 11/17/2010 09:32 AM, Ralph Castain wrote:

Cris' output is coming solely from the HNP, which is correct
given the way things were executed. My comment was from
another email where he did what I asked, which was to
include the flags:

--report-bindings --leave-session-attached

so we could see the output from each orted. In that email,
it was clear that while mpirun was bound to multiple cores,
the orteds are being bound to a -single- core.

Hence the problem.


Hmm, I see Ralph's comment on 11/15 but I don't see any
output that shows what Ralph say's above.  The only
report-bindings output I see is when he runs without OGE
binding.   Can someone give me the date and time of Chris'
email with the --report-bindings and
--leave-session-attached.  Or a rerun of the below with the
--leave-session-attached option would also help.

I find it confusing that --leave-session-attached is not
required when the OGE binding argument is not given.

    --td


HTH
Ralph


On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje
mailto:terry.don...@oracle.com>>
wrote:

    On 11/17/2010 07:41 AM, Chris Jewell wrote:

On 17 Nov 2010, at 11:56, Terry Dontje wrote:

You are absolutely correct, Terry, and the 1.4 release series does 
include the proper code. The point here, though, is that SGE binds the orted to 
a single core, even though other cores are also allocated. So the orted detects 
an external binding of one core, and binds all its children to that same core.

I do not think you are right here.  Chris sent the following which looks like 
OGE (fka SGE) actually did bind the hnp to multiple cores.  However that message I 
believe is not coming from the processes themselves and actually is only shown by the 
hnp.  I wonder 

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-18 Thread Terry Dontje
Yes, I believe this solves the mystery.  In short OGE and ORTE both 
work.  In the linear:1 case the job is exiting because there are not 
enough resources for the orte binding to work, which actually makes 
sense.  In the linear:2 case I think we've proven that we are binding to 
the right amount of resources and to the correct physical resources at 
the process level.


In the case where you do not pass bind-to-core to mpirun with a qsub using 
linear:2, the processes on the same node will actually bind to the same 
two cores.  The only way to determine this is to run something that 
prints out the binding from the system.  There is no way to do this via 
OMPI because it only reports binding when you are requesting mpirun to 
do some type of binding (like -bind-to-core or -bind-to-socket).
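
FWIW, a minimal sketch of such a check (Linux-specific and untested; it is not 
something OMPI provides, it just asks the OS for the calling process's affinity 
mask via sched_getaffinity()) could look like this:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, i, off = 0;
    char buf[1024];
    cpu_set_t mask;

    buf[0] = '\0';
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ask the OS which cores this process is allowed to run on. */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        for (i = 0; i < CPU_SETSIZE && off < (int)sizeof(buf) - 8; i++)
            if (CPU_ISSET(i, &mask))
                off += snprintf(buf + off, sizeof(buf) - off, "%d ", i);
    }
    printf("rank %d bound to cores: %s\n", rank, buf);

    MPI_Finalize();
    return 0;
}

Running that under the same qsub/mpirun combinations would show directly 
whether the MPI processes on a node ended up on the same two cores.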


In the linear:1 case with no binding I think you are having the 
processes on the same node run on the same core.   Which is exactly what 
you are asking for I believe.


So I believe we understand what is going on with the binding, and it 
makes sense to me.  As far as the allocation issue of slots vs. cores 
and trying not to overallocate cores, I believe the new allocation rule 
makes sense, but I'll let you hash that out with Daniel.


In summary, I don't believe there are any OMPI bugs related to what we've 
seen, and the OGE issue is just the allocation issue, right?


--td


On 11/18/2010 01:32 AM, Chris Jewell wrote:

Perhaps if someone could run this test again with --report-bindings 
--leave-session-attached and provide -all- output we could verify that analysis 
and clear up the confusion?


Yeah, however I bet you we still won't see output.

Actually, it seems we do get more output!  Results of 'qsub -pe mpi 8 -binding 
linear:2 myScript.com'

with

'mpirun -mca ras_gridengine_verbose 100 -report-bindings 
--leave-session-attached -bycore -bind-to-core ./unterm'

[exec1:06504] System has detected external process binding to cores 0028
[exec1:06504] ras:gridengine: JOB_ID: 59467
[exec1:06504] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec1/active_jobs/59467.1/pe_hostfile
[exec1:06504] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec1:06504] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] [[59608,0],0] odls:default:fork binding child [[59608,1],0] to 
cpus 0008
[exec1:06504] [[59608,0],0] odls:default:fork binding child [[59608,1],1] to 
cpus 0020
[exec3:20248] [[59608,0],1] odls:default:fork binding child [[59608,1],2] to 
cpus 0008
[exec4:26792] [[59608,0],4] odls:default:fork binding child [[59608,1],5] to 
cpus 0001
[exec2:32462] [[59608,0],2] odls:default:fork binding child [[59608,1],3] to 
cpus 0001
[exec7:09833] [[59608,0],3] odls:default:fork binding child [[59608,1],4] to 
cpus 0002
[exec5:10834] [[59608,0],5] odls:default:fork binding child [[59608,1],6] to 
cpus 0001
[exec6:04230] [[59608,0],6] odls:default:fork binding child [[59608,1],7] to 
cpus 0001

AHHA!  Now I get the following if I use 'qsub -pe mpi 8 -binding linear:1 
myScript.com' with the above mpirun command:

[exec1:06552] System has detected external process binding to cores 0020
[exec1:06552] ras:gridengine: JOB_ID: 59468
[exec1:06552] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec1/active_jobs/59468.1/pe_hostfile
[exec1:06552] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec1:06552] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
--
mpirun was unable to start the specified application as it encountered an error:

Error name: Unknown error: 1
Node: exec1

when attempting to start process rank 0.
--
[exec1:06552] [[59432,0],0] odls:default:fork binding child [[59432,1],0] to 
cpus 0020
--
Not enough processors were found on the local host to meet the requested
binding action:

   Local host:exec1
   Action requested:  bind-to-core
   Application name:  ./unterm

Re: [OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Terry Dontje
You're going to have to use a protocol that can route through a machine; 
OFED user verbs (i.e. openib) does not do this.  The only way I know of to 
do this via OMPI is with the tcp btl.
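
If there is an IP network that reaches all of the nodes (a management 
ethernet, say), a sketch of that (the interface name eth0 is only an 
assumption about your setup) would be something like:

mpirun -host A,B,C --mca btl tcp,self --mca btl_tcp_if_include eth0 /mnt/shared/apps/myapp

The btl_tcp_if_include parameter restricts the tcp btl to the listed 
interfaces, so the MPI traffic stays on a network over which B and C can 
actually reach each other.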


--td

On 11/22/2010 09:28 AM, Paul Monday (Parallel Scientific) wrote:
We've been using OpenMPI in a switched environment with success, but 
we've moved to a point to point environment to do some work.  Some of 
the nodes cannot talk directly to one another, sort of like this with 
computers A,B, C with A having two ports:


A(1)(opensm)-->B
A(2)(opensm)-->C

B is not connected to C in any way.

When we try to run our OpenMPI program we are receiving:
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[1581,1],5]) is on host: pg-B
  Process 2 ([[1581,1],0]) is on host: pg-C
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.


I hope I'm not being overly naive, but is there a way to join the 
subnets at the MPI layer?  It seems like IP over IB would be too high 
up the stack.


Paul Monday



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Prioritization of --mca btl openib,tcp,self

2010-11-23 Thread Terry Dontje

On 11/22/2010 08:18 PM, Paul Monday (Parallel Scientific) wrote:

This is a follow-up to an earlier question; I'm trying to understand how --mca 
btl prioritizes its choice for connectivity.  Going back to my original 
network, there are actually two networks running around: a point to point 
Infiniband network that looks like this (with two fabrics):

A(port 1)(opensm)-->B
A(port 2)(opensm)-->C

The original question queried whether there was a way to avoid the problem of B 
and C not being able to talk to each other if I were to run

mpirun  -host A,B,C --mca btl openib,self -d /mnt/shared/apps/myapp

"At least one pair of MPI processes are unable to reach each other for
MPI communications." ...

There is an additional network, though: I have an ethernet management network 
that connects to all nodes.  If our program retrieves the ranks from the nodes 
using TCP and then can shift to openib, that would be interesting and, in fact, 
if I run

mpirun  -host A,B,C --mca btl openib,tcp,self -d /mnt/shared/apps/myapp

The program does, in fact, run cleanly.

But, the question I have now is does MPI "choose" to use tcp when it can find 
all nodes and then always use tcp, or will it fall back to openib if it can?
For MPI communications (as opposed to the ORTE communications) the 
library will try and pick out the most performant protocol to use for 
communications between two nodes.  So in your case A-B and A-C should 
use the openib btl and B-C should use the tcp btl.

So ... more succinctly:
Given a list of btls, such as openib,tcp,self, where a program can only broadcast 
over tcp but individual operations can occur over openib between nodes, will 
mpirun use the first interconnect that works for each operation, or, once it 
finds one that the broadcast phase works on, will it use that one permanently?
If by broadcast you mean MPI_Bcast, this is actually done using point to 
point algorithms so the communications will happen over a mixture of IB 
and TCP.


If you mean something else by broadcast you'll need to clarify what you 
mean because there really isn't a direct use of protocol broadcasts in 
MPI or even ORTE to my knowledge.

And, as a follow-up, can I turn off the attempt to broadcast to touch all nodes?

See above.

Paul Monday



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] cannot build Open MPI 1.5 on Linux x86_64 with Oracle/Sun C 5.11

2010-11-29 Thread Terry Dontje
This is ticket 2632: https://svn.open-mpi.org/trac/ompi/ticket/2632.  A 
fix was put into the trunk last week.  We should be able to CMR 
this fix to the 1.5 and 1.4 branches later this week.  The ticket 
actually has a workaround for the 1.5 branch.


--td
On 11/29/2010 09:46 AM, Siegmar Gross wrote:

Hi,

in the meantime we have installed gcc-4.5.1 and now I get a different error,
when I try to build OpenMPI-1.5 with Oracle Studio 12 Update 2 on Linux.

linpc4 openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1 121 head -18 config.log
...
   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
LDFLAGS=-m32 CC=cc CXX=CC F77=f77 FC=f95 CFLAGS=-m32 CXXFLAGS=-m32
FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32 CPPFLAGS= C_INCL_PATH=
C_INCLUDE_PATH= CPLUS_INCLUDE_PATH= OBJC_INCLUDE_PATH= MPICHHOME=
--without-udapl --without-openib --enable-mpi-f90
--with-mpi-f90-size=small --enable-heterogeneous
--enable-cxx-exceptions --enable-shared
--enable-orterun-prefix-by-default --with-threads=posix
--enable-mpi-threads --disable-progress-threads

## - ##
## Platform. ##
## - ##

hostname = linpc4
uname -m = x86_64
uname -r = 2.6.31.14-0.4-desktop
uname -s = Linux
uname -v = #1 SMP PREEMPT 2010-10-25 08:45:30 +0200



linpc4 openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1 122 tail -20
   log.make.Linux.x86_64.32_cc
../../../../openmpi-1.5/ompi/mpi/f90/scripts/mpi_wtick_f90.f90.sh
/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90>
mpi_wtick_f90.f90
   FC mpi_wtick_f90.lo
../../../../openmpi-1.5/ompi/mpi/f90/scripts/mpi_wtime_f90.f90.sh
/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90>
mpi_wtime_f90.f90
   FC mpi_wtime_f90.lo
   FCLD   libmpi_f90.la
f90: Warning: Option -path passed to ld, if ld is invoked, ignored otherwise
f90: Warning: Option -path passed to ld, if ld is invoked, ignored otherwise
f90: Warning: Option -path passed to ld, if ld is invoked, ignored otherwise
f90: Warning: Option -soname passed to ld, if ld is invoked, ignored otherwise
/usr/bin/ld: unrecognized option '-path'
/usr/bin/ld: use the --help option for usage information
make[4]: *** [libmpi_f90.la] Error 2
make[4]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90'
make[2]: *** [all] Error 2
make[2]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi'
make: *** [all-recursive] Error 1
linpc4 openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1 123


In my opinion it is still a strange behaviour that building OpenMPI with
"cc" depends on the installed version of "gcc". Has anybody successfully
built OpenMPI-1.5 with Oracle Studio C on Linux? Which command line
options did you use? I get the same error if I try to build a 64-bit
version. I can build and install OpenMPI-1.5 in a 32- and 64-bit version
without Fortran support, if I replace
"--enable-mpi-f90 --with-mpi-f90-size=small" with
"--disable-mpi-f77 --disable-mpi-f90" in the above "configure"-command.

"make check" delivers "PASSED" for all tests in the 64-bit and one
"FAILED" in the 32-bit version.

...
make  check-TESTS
make[3]: Entering directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1_without_f90/te
st/util'
  Failure : Mismatch: input "/home/fd1026", expected:1 got:0

SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 13 failed)
FAIL: opal_path_nfs

1 of 1 test failed
Please report to http://www.open-mpi.org/community/help/

make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gc
c-4.5.1_without_f90/test/util'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gc
c-4.5.1_without_f90/test/util'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gc
c-4.5.1_without_f90/test'
make: *** [check-recursive] Error 1
linpc4 openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1_without_f90 131



I can also successfully build and run my small C test programs which I
mentioned in my earlier email with this OpenMPI package. Any ideas how
I can build Fortran support? Thank you very much for any suggestions in
advance.


Kind regards

Siegmar



   Sorry, but can you give us the config line, the config.log and the
full output of make, preferably with make V=1?

--td
On 10/29/2010 04:30 AM, Siegmar Gross wrote:

Hi,

I tried to build Open MPI 1.5 on Solaris X86 and x86_64 with Oracle
Studio 12.2. I can compile Open MP

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje

On 11/29/2010 05:41 PM, Nehemiah Dacres wrote:

thanks.
FYI: it's openmpi-1.4.2 from a tarball, as you assumed.
I changed this line
 *Sun\ F* | *Sun*Fortran*)
  # Sun Fortran 8.3 passes all unrecognized flags to the linker
  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
  _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
  _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '

 unfortunately my autoconf tool is out of date (2.59 , it says it 
wants 2.60+ )


The build page (http://www.open-mpi.org/svn/building.php) shows the 
versions of the tools you need to build OMPI.  Sorry, unfortunately in 
order for this workaround to work you need to re-run autogen.sh; no way 
around that.


On Mon, Nov 29, 2010 at 4:11 PM, Rolf vandeVaart <rolf.vandeva...@oracle.com> wrote:


No, I do not believe so.  First, I assume you are trying to build
either 1.4 or 1.5, not the trunk.
Secondly, I assume you are building from a tarfile that you have
downloaded.  Assuming these
two things are true, then (as stated in the bug report), prior to
running configure, you want to
make the following edits to config/libtool.m4 in all the places
you see it. ( I think just one place)

FROM:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)=''
   ;;

TO:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '
   ;;



Note the difference in the lt_prog_compiler_wl line.

I ran ./configure anyway, but I don't think it did anything
It didn't, the change to libtool.m4 only affects the build system when 
you do an autogen.sh.


--td



Then, you need to run ./autogen.sh.  Then, redo your configure but
you do not need to do anything
with LDFLAGS.  Just use your original flags.  I think this should
work, but I am only reading
what is in the ticket.

Rolf



On 11/29/10 16:26, Nehemiah Dacres wrote:

that looks about right. So the suggestion:

./configure LDFLAGS="-notpath ... ... ..."

-notpath should be replaced by whatever the proper flag should be, in my case 
-L  ?

   


On Mon, Nov 29, 2010 at 3:16 PM, Rolf vandeVaart
mailto:rolf.vandeva...@oracle.com>>
wrote:

This problem looks a lot like a thread from earlier today. 
Can you look at this

ticket and see if it helps?  It has a workaround documented
in it.

https://svn.open-mpi.org/trac/ompi/ticket/2632

Rolf


On 11/29/10 16:13, Prentice Bisbal wrote:

No, it looks like ld is being called with the option -path, and your
linker doesn't use that switch. Grep your Makefile(s) for the string
"-path". It's probably in a statement defining LDFLAGS somewhere.

When you find it, replace it with the equivalent switch for your
compiler. You may be able to override it's value on the configure
command-line, which is usually easiest/best:

./configure LDFLAGS="-notpath ... ... ..."

--
Prentice


Nehemiah Dacres wrote:
   

it may have been that  I didn't set ld_library_path

On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah Dacresmailto:dacre...@slu.edu>
>  wrote:

 thank you, you have been doubly helpful, but I am having linking
 errors and I do not know what the solaris studio compiler's
 preferred linker is. The

 the configure statement was

 ./configure --prefix=/state/partition1/apps/sunmpi/
 --enable-mpi-threads --with-sge --enable-static
 --enable-sparse-groups CC=/opt/oracle/solstudio12.2/bin/suncc
 CXX=/opt/oracle/solstudio12.2/bin/sunCC
 F77=/opt/oracle/solstudio12.2/bin/sunf77
 FC=/opt/oracle/solstudio12.2/bin/sunf90

compile statement was

 make all install 2>errors


 error below is

 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -soname passed to ld, if ld is invoked, 
ignored
 otherwise
 /usr/bin/ld: unrecognized opti

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje
Actually there is a way to modify the configure file that does not 
require autogen.sh to be re-run.
If you go into configure and search for "Sun F", a few lines down will be 
one of three assignments:

lt_prog_compiler_wl
lt_prog_compiler_wl_F77
lt_prog_compiler_wl_FC

If you change them all to '-Qoption ld' and then do the configure things 
should work.


Good luck,

--td

On 11/30/2010 06:19 AM, Terry Dontje wrote:

On 11/29/2010 05:41 PM, Nehemiah Dacres wrote:

thanks.
FYI: its openmpi-1.4.2 from a tarball like you assume
I changed this line
 *Sun\ F* | *Sun*Fortran*)
  # Sun Fortran 8.3 passes all unrecognized flags to the linker
  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
  _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
  _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '

 unfortunately my autoconf tool is out of date (2.59 , it says it 
wants 2.60+ )


The build page (http://www.open-mpi.org/svn/building.php) show's the 
versions of the tools you need to build OMPI.  Sorry, unfortunately in 
order for this workaround to work you need to re-autogen.sh no way 
around that.


On Mon, Nov 29, 2010 at 4:11 PM, Rolf vandeVaart 
mailto:rolf.vandeva...@oracle.com>> wrote:


No, I do not believe so.  First, I assume you are trying to build
either 1.4 or 1.5, not the trunk.
Secondly, I assume you are building from a tarfile that you have
downloaded.  Assuming these
two things are true, then (as stated in the bug report), prior to
running configure, you want to
make the following edits to config/libtool.m4 in all the places
you see it. ( I think just one place)

FROM:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)=''
   ;;

TO:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '
   ;;



Note the difference in the lt_prog_compiler_wl line.

I ran ./configure anyway, but I don't think it did anything
It didn't, the change to libtool.m4 only affects the build system when 
you do an autogen.sh.


--td



Then, you need to run ./autogen.sh.  Then, redo your configure
but you do not need to do anything
with LDFLAGS.  Just use your original flags.  I think this should
work, but I am only reading
what is in the ticket.

Rolf



On 11/29/10 16:26, Nehemiah Dacres wrote:

that looks about right. So the suggestion:

./configure LDFLAGS="-notpath ... ... ..."

-notpath should be replaced by whatever the proper flag should be, in my case 
-L  ?

   


On Mon, Nov 29, 2010 at 3:16 PM, Rolf vandeVaart
mailto:rolf.vandeva...@oracle.com>>
wrote:

This problem looks a lot like a thread from earlier today. 
Can you look at this

ticket and see if it helps?  It has a workaround documented
in it.

https://svn.open-mpi.org/trac/ompi/ticket/2632

Rolf


On 11/29/10 16:13, Prentice Bisbal wrote:

No, it looks like ld is being called with the option -path, and your
linker doesn't use that switch. Grep you Makefile(s) for the string
"-path". It's probably in a statement defining LDFLAGS somewhere.

When you find it, replace it with the equivalent switch for your
compiler. You may be able to override it's value on the configure
command-line, which is usually easiest/best:

./configure LDFLAGS="-notpath ... ... ..."

--
Prentice


Nehemiah Dacres wrote:
   

it may have been that  I didn't set ld_library_path

On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah Dacresmailto:dacre...@slu.edu>
<mailto:dacre...@slu.edu>>  wrote:

 thank you, you have been doubly helpful, but I am having linking
 errors and I do not know what the solaris studio compiler's
 preferred linker is. The

 the configure statement was

 ./configure --prefix=/state/partition1/apps/sunmpi/
 --enable-mpi-threads --with-sge --enable-static
 --enable-sparse-groups CC=/opt/oracle/solstudio12.2/bin/suncc
 CXX=/opt/oracle/solstudio12.2/bin/sunCC
 F77=/opt/oracle/solstudio12.2/bin/sunf77
 FC=/opt/oracle/solstudio12.2/bin/sunf90

compile statement was

  

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje
A slight note for the below: there should be a space between "ld" and the 
ending single quote mark, so it should be '-Qoption ld ' not '-Qoption ld'.


--td
On 11/30/2010 06:31 AM, Terry Dontje wrote:
Actually there is a way to modify the configure file that will not 
require the autogen.sh to be ran.
If you go into configure and search for "Sun F" a few lines down will 
be one of three assignments:

lt_prog_compiler_wl
lt_prog_compiler_wl_F77
lt_prog_compiler_wl_FC

If you change them all to '-Qoption ld' and then do the configure 
things should work.


Good luck,

--td

On 11/30/2010 06:19 AM, Terry Dontje wrote:

On 11/29/2010 05:41 PM, Nehemiah Dacres wrote:

thanks.
FYI: its openmpi-1.4.2 from a tarball like you assume
I changed this line
 *Sun\ F* | *Sun*Fortran*)
  # Sun Fortran 8.3 passes all unrecognized flags to the linker
  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
  _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
  _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '

 unfortunately my autoconf tool is out of date (2.59 , it says it 
wants 2.60+ )


The build page (http://www.open-mpi.org/svn/building.php) show's the 
versions of the tools you need to build OMPI.  Sorry, unfortunately 
in order for this workaround to work you need to re-autogen.sh no way 
around that.


On Mon, Nov 29, 2010 at 4:11 PM, Rolf vandeVaart 
mailto:rolf.vandeva...@oracle.com>> wrote:


No, I do not believe so.  First, I assume you are trying to
build either 1.4 or 1.5, not the trunk.
Secondly, I assume you are building from a tarfile that you have
downloaded.  Assuming these
two things are true, then (as stated in the bug report), prior
to running configure, you want to
make the following edits to config/libtool.m4 in all the places
you see it. ( I think just one place)

FROM:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)=''
   ;;

TO:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '
   ;;



Note the difference in the lt_prog_compiler_wl line.

I ran ./configure anyway, but I don't think it did anything
It didn't, the change to libtool.m4 only affects the build system 
when you do an autogen.sh.


--td



Then, you need to run ./autogen.sh.  Then, redo your configure
but you do not need to do anything
with LDFLAGS.  Just use your original flags.  I think this
should work, but I am only reading
what is in the ticket.

Rolf



On 11/29/10 16:26, Nehemiah Dacres wrote:

that looks about right. So the suggestion:

./configure LDFLAGS="-notpath ... ... ..."

-notpath should be replaced by whatever the proper flag should be, in my case 
-L  ?

   


On Mon, Nov 29, 2010 at 3:16 PM, Rolf vandeVaart
mailto:rolf.vandeva...@oracle.com>> wrote:

This problem looks a lot like a thread from earlier today. 
Can you look at this

ticket and see if it helps?  It has a workaround documented
in it.

https://svn.open-mpi.org/trac/ompi/ticket/2632

Rolf


On 11/29/10 16:13, Prentice Bisbal wrote:

No, it looks like ld is being called with the option -path, and your
linker doesn't use that switch. Grep you Makefile(s) for the string
"-path". It's probably in a statement defining LDFLAGS somewhere.

When you find it, replace it with the equivalent switch for your
compiler. You may be able to override it's value on the configure
command-line, which is usually easiest/best:

./configure LDFLAGS="-notpath ... ... ..."

--
Prentice


Nehemiah Dacres wrote:
   

it may have been that  I didn't set ld_library_path

On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah Dacresmailto:dacre...@slu.edu>
<mailto:dacre...@slu.edu>>  wrote:

 thank you, you have been doubly helpful, but I am having linking
 errors and I do not know what the solaris studio compiler's
 preferred linker is. The

 the configure statement was

 ./configure --prefix=/state/partition1/apps/sunmpi/
 --enable-mpi-threads --with-sge --enable-static
 --enable-sparse-groups CC=/opt/oracle/solstud

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje

Ticket 2632 really spells out what the issue is.

On 11/30/2010 10:23 AM, Prentice Bisbal wrote:

Nehemiah Dacres wrote:

that looks about right. So the suggestion:

./configure LDFLAGS="-notpath ... ... ..."

-notpath should be replaced by whatever the proper flag should be, in my case 
-L  ?

Yes, that's exactly what I meant. I should have chosen something better
than "-notpath" to say "put a value there that was not '-path'".
I don't think the above will fix the problem because it has to do with 
how one passes the --rpath option to the linker.  Prior to Studio 
12.2 the --rpath option was passed through to the linker blindly (with a 
warning).  In Studio 12.2 the compiler recognizes -r as a compiler 
option, and now "-path" is blindly passed to the linker, which has no idea 
what that means.  So one really needs to preface "--rpath" with either 
"-Wl," or "-Qoption ld ".  I don't believe changing the LDFLAGS will 
actually change the problem.


--td

Not sure if my suggestion will help, given the bug report below. If
you're really determined, you can always try editing all the makefiles
after configure. Something like this might work:

find . -name Makefile -exec sed -i.bak s/-path/-L/g \{\} \;

Use that at your own risk. You might change instances of the string
'-path' that are actually correct.

Prentice



On Mon, Nov 29, 2010 at 3:16 PM, Rolf vandeVaart
mailto:rolf.vandeva...@oracle.com>>  wrote:

 This problem looks a lot like a thread from earlier today.  Can you
 look at this
 ticket and see if it helps?  It has a workaround documented in it.

 https://svn.open-mpi.org/trac/ompi/ticket/2632

 Rolf


 On 11/29/10 16:13, Prentice Bisbal wrote:

 No, it looks like ld is being called with the option -path, and your
 linker doesn't use that switch. Grep you Makefile(s) for the string
 "-path". It's probably in a statement defining LDFLAGS somewhere.

 When you find it, replace it with the equivalent switch for your
 compiler. You may be able to override it's value on the configure
 command-line, which is usually easiest/best:

 ./configure LDFLAGS="-notpath ... ... ..."

 --
 Prentice


 Nehemiah Dacres wrote:


 it may have been that  I didn't set ld_library_path

 On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah 
Dacresmailto:dacre...@slu.edu>
 >  wrote:

 thank you, you have been doubly helpful, but I am having linking
 errors and I do not know what the solaris studio compiler's
 preferred linker is. The

 the configure statement was

 ./configure --prefix=/state/partition1/apps/sunmpi/
 --enable-mpi-threads --with-sge --enable-static
 --enable-sparse-groups CC=/opt/oracle/solstudio12.2/bin/suncc
 CXX=/opt/oracle/solstudio12.2/bin/sunCC
 F77=/opt/oracle/solstudio12.2/bin/sunf77
 FC=/opt/oracle/solstudio12.2/bin/sunf90

compile statement was

 make all install 2>errors


 error below is

 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -soname passed to ld, if ld is invoked, ignored
 otherwise
 /usr/bin/ld: unrecognized option '-path'
 /usr/bin/ld: use the --help option for usage information
 make[4]: *** [libmpi_f90.la  
] Error 2
 make[3]: *** [all-recursive] Error 1
 make[2]: *** [all] Error 2
 make[1]: *** [all-recursive] Error 1
 make: *** [all-recursive] Error 1

 am I doing this wrong? are any of those configure flags unnecessary
 or inappropriate



 On Mon, Nov 29, 2010 at 2:06 PM, Gus 
Correamailto:g...@ldeo.columbia.edu>
 >  wrote:

 Nehemiah Dacres wrote:

 I want to compile openmpi to work with the solaris studio
 express  or
 solaris studio. This is a different version than is installed 
on
 rockscluster 5.2  and would like to know if there any
 gotchas or configure
 flags I should use to get it working or portable to nodes on
 the cluster.
 Software-wise,  it is a fairly homogeneous environment with
 only slight
 variations on the hardware side which could be isolated
 (machinefile flag
 and what-not)
 Please advise


 Hi Nehemiah
 I just answered your email to the OpenMPI list.
 I want to add that if you bui

Re: [OMPI users] Segmentation fault in mca_pml_ob1.so

2010-12-07 Thread Terry Dontje
I am not sure this has anything to do with your problem but if you look 
at the stack entry for PMPI_Recv I noticed the buf has a value of 0.  
Shouldn't that be an address?


Does your code fail if the MPI library is built with -g?  If it does 
fail the same way, the next step I would do would be to walk up the 
stack and try and figure out where the sendreq address is coming from 
because supposedly it is that address that is not mapped according to 
the original stack.
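
For example (just a sketch; the frame number and the sendreq symbol come from 
the backtrace quoted below and will differ from run to run), after attaching 
gdb you could do something like:

(gdb) frame 0
(gdb) print sendreq
(gdb) print *sendreq
(gdb) up
(gdb) info locals

to see whether sendreq itself is a bogus pointer, or whether it points at a 
request that has already been freed or reused, and where the caller got it from.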


--td

On 12/07/2010 08:29 AM, Grzegorz Maj wrote:

Some update on this issue. I've attached gdb to the crashing
application and I got:

-
Program received signal SIGSEGV, Segmentation fault.
mca_pml_ob1_send_request_put (sendreq=0x130c480, btl=0xc49850,
hdr=0xd10e60) at pml_ob1_sendreq.c:1231
1231pml_ob1_sendreq.c: No such file or directory.
in pml_ob1_sendreq.c
(gdb) bt
#0  mca_pml_ob1_send_request_put (sendreq=0x130c480, btl=0xc49850,
hdr=0xd10e60) at pml_ob1_sendreq.c:1231
#1  0x7fc55bf31693 in mca_btl_tcp_endpoint_recv_handler (sd=, flags=, user=) at btl_tcp_endpoint.c:718
#2  0x7fc55fff7de4 in event_process_active (base=0xc1daf0,
flags=2) at event.c:651
#3  opal_event_base_loop (base=0xc1daf0, flags=2) at event.c:823
#4  0x7fc55ffe9ff1 in opal_progress () at runtime/opal_progress.c:189
#5  0x7fc55c9d7115 in opal_condition_wait (addr=, count=, datatype=,
src=, tag=,
 comm=, status=0xcc6100) at
../../../../opal/threads/condition.h:99
#6  ompi_request_wait_completion (addr=,
count=, datatype=,
src=, tag=,
 comm=, status=0xcc6100) at
../../../../ompi/request/request.h:375
#7  mca_pml_ob1_recv (addr=, count=, datatype=, src=, tag=, comm=,
 status=0xcc6100) at pml_ob1_irecv.c:104
#8  0x7fc560511260 in PMPI_Recv (buf=0x0, count=12884048,
type=0xd10410, source=-1, tag=0, comm=0xd0daa0, status=0xcc6100) at
precv.c:75
#9  0x0049cc43 in BI_Srecv ()
#10 0x0049c555 in BI_IdringBR ()
#11 0x00495ba1 in ilp64_Cdgebr2d ()
#12 0x0047ffa0 in Cdgebr2d ()
#13 0x7fc5621da8e1 in PB_CInV2 () from
/home/gmaj/lib/intel_mkl/current/lib/em64t/libmkl_scalapack_ilp64.so
#14 0x7fc56220289c in PB_CpgemmAB () from
/home/gmaj/lib/intel_mkl/current/lib/em64t/libmkl_scalapack_ilp64.so
#15 0x7fc5622b28fd in pdgemm_ () from
/home/gmaj/lib/intel_mkl/current/lib/em64t/libmkl_scalapack_ilp64.so
-

So this looks like the line responsible for segmentation fault is:
mca_bml_base_endpoint_t *bml_endpoint = sendreq->req_endpoint;

I repeated it several times: always crashes in the same line.

I have no idea what to do with this. Again, any help would be appreciated.

Thanks,
Grzegorz Maj



2010/12/6 Grzegorz Maj:

Hi,
I'm using mkl scalapack in my project. Recently, I was trying to run
my application on new set of nodes. Unfortunately, when I try to
execute more than about 20 processes, I get segmentation fault.

[compn7:03552] *** Process received signal ***
[compn7:03552] Signal: Segmentation fault (11)
[compn7:03552] Signal code: Address not mapped (1)
[compn7:03552] Failing at address: 0x20b2e68
[compn7:03552] [ 0] /lib64/libpthread.so.0(+0xf3c0) [0x7f46e0fc33c0]
[compn7:03552] [ 1]
/home/gmaj/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0xd577)
[0x7f46dd093577]
[compn7:03552] [ 2]
/home/gmaj/lib/openmpi/lib/openmpi/mca_btl_tcp.so(+0x5b4c)
[0x7f46dc5edb4c]
[compn7:03552] [ 3]
/home/gmaj/lib/openmpi/lib/libopen-pal.so.0(+0x1dbe8) [0x7f46e0679be8]
[compn7:03552] [ 4]
(home/gmaj/lib/openmpi/lib/libopen-pal.so.0(opal_progress+0xa1)
[0x7f46e066dbf1]
[compn7:03552] [ 5]
/home/gmaj/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x5945)
[0x7f46dd08b945]
[compn7:03552] [ 6]
/home/gmaj/lib/openmpi/lib/libmpi.so.0(MPI_Send+0x6a) [0x7f46e0b4f10a]
[compn7:03552] [ 7] /home/gmaj/matrix/matrix(BI_Ssend+0x21) [0x49cc11]
[compn7:03552] [ 8] /home/gmaj/matrix/matrix(BI_IdringBR+0x79) [0x49c579]
[compn7:03552] [ 9] /home/gmaj/matrix/matrix(ilp64_Cdgebr2d+0x221) [0x495bb1]
[compn7:03552] [10] /home/gmaj/matrix/matrix(Cdgebr2d+0xd0) [0x47ffb0]
[compn7:03552] [11]
/home/gmaj/lib/intel_mkl/current/lib/em64t/libmkl_scalapack_ilp64.so(PB_CInV2+0x1304)
[0x7f46e27f5124]
[compn7:03552] *** End of error message ***

This error appears during some scalapack computation. My processes do
some mpi communication before this error appears.

I found out, that by modifying btl_tcp_eager_limit and
btl_tcp_max_send_size parameters, I can run more processes - the
smaller those values are, the more processes I can run. Unfortunately,
by this method I've succeeded to run up to 30 processes, which is
still far to small.

Some clue may be what valgrind says:

==3894== Syscall param writev(vector[...]) points to uninitialised byte(s)
==3894==at 0x82D009B: writev (in /lib64/libc-2.12.90.so)
==3894==by 0xBA2136D: mca_btl_tcp_frag_send (in
/home/gmaj/lib/openmpi/lib/openmpi/mca_btl_tcp.so)
==3894==by 0xBA203D0: mca_btl_tcp_endpoint_send (in
/home/gmaj/lib/openmpi/lib/openmpi/mca_btl_tcp.so)
==3894==by 0xB003583: mca_pml_

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Terry Dontje
A more portable way of doing what you want below is to gather each 
process's processor name, given by MPI_Get_processor_name, have the root 
who gets this data assign unique numbers to each name, and then scatter 
that info to the processes and have them use it as the color in an 
MPI_Comm_split call.  Once you've done that you can do an MPI_Comm_size 
to find how many are on the node, and be able to send to all the other 
processes on that node using the new communicator.
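
As a rough illustration (untested, and a slight variant of the above: it uses 
an MPI_Allgather of the processor names so every rank can compute its own 
color, instead of an explicit gather/scatter through the root; all the names 
in it are just for the sketch):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, len, i, color, local_rank, local_size;
    char name[MPI_MAX_PROCESSOR_NAME];
    char *all;
    MPI_Comm node_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    memset(name, 0, sizeof(name));
    MPI_Get_processor_name(name, &len);

    /* Collect every rank's processor name. */
    all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Allgather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                  all,  MPI_MAX_PROCESSOR_NAME, MPI_CHAR, MPI_COMM_WORLD);

    /* Color = lowest world rank that reported the same name as mine. */
    color = rank;
    for (i = 0; i < size; i++) {
        if (strcmp(all + (size_t)i * MPI_MAX_PROCESSOR_NAME, name) == 0) {
            color = i;
            break;
        }
    }

    /* Ranks on the same node share a color, so they land in one communicator. */
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_size(node_comm, &local_size);

    printf("world rank %d of %d on %s: local rank %d of %d\n",
           rank, size, name, local_rank, local_size);

    MPI_Comm_free(&node_comm);
    free(all);
    MPI_Finalize();
    return 0;
}

The rank within node_comm is then the "local rank" (0..local_size-1) David is 
after, as long as MPI_Get_processor_name returns the same string for every 
process on a host; see the caveat discussed later in the thread.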


Good luck,

--td
On 12/09/2010 08:18 PM, Ralph Castain wrote:

The answer is yes - sort of...

In OpenMPI, every process has information about not only its own local rank, 
but the local rank of all its peers regardless of what node they are on. We use 
that info internally for a variety of things.

Now the "sort of". That info isn't exposed via an MPI API at this time. If that 
doesn't matter, then I can tell you how to get it - it's pretty trivial to do.


On Dec 9, 2010, at 6:14 PM, David Mathog wrote:


Is it possible through MPI for a worker to determine:

  1. how many MPI processes are running on the local machine
  2. within that set its own "local rank"

?

For instance, a quad core with 4 processes might be hosting ranks 10,
14, 15, 20, in which case the "local ranks" would be 1,2,3,4.  The idea
being to use this information so that a program could selectively access
different local resources.  Simple example: on this 4 worker machine
reside telephone directories for Los Angeles, San Diego, San Jose, and
Sacramento.  Each worker is to open one database and search it when the
master sends a request.  With the "local rank" number this would be as
easy as naming the databases file1, file2, file3, and file4.  Without it
the 4 processes would have to communicate with each other somehow to
sort out which is to use which database.  And that could get ugly fast,
especially if they don't all start at the same time.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Terry Dontje

On 12/10/2010 09:19 AM, Richard Treumann wrote:


It seems to me the MPI_Get_processor_name description is too ambiguous 
to make this 100% portable.  I assume most MPI implementations simply 
use the hostname so all processes on the same host will return the 
same string.  The suggestion would work then.


However, it would also be reasonable for an MPI that did processor 
binding to return "hostname.socket#.core#" so every rank would have a 
unique processor name.
Fair enough.  However, I think it is a lot more stable than grabbing 
information from the bowels of the runtime environment.  Of course one 
could just call the appropriate system call to get the hostname, if you 
are on the right type of OS/architecture :-).


The extension idea is a bit at odds with the idea that MPI is an 
architecture independent API.  That does not rule out the option if 
there is a good use case but it does raise the bar just a bit.


Yeah, that is kind of the rub, isn't it.  There are enough architectural 
differences out there that it might be difficult to come to an agreement 
on the elements of locality you should focus on.  It would be nice if 
there were some sort of distance value assigned to each peer a process 
has.  Of course then you still have the problem of trying to figure out 
what distance you really want to base your grouping on.


--td

Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363



From:   Ralph Castain 
To: Open MPI Users 
Date:   12/10/2010 08:00 AM
Subject: 	Re: [OMPI users] Method for worker to determine its "rank" 
on a single machine?

Sent by:users-boun...@open-mpi.org






Ick - I agree that's portable, but truly ugly.

Would it make sense to implement this as an MPI extension, and then 
perhaps propose something to the Forum for this purpose?


Just hate to see such a complex, time-consuming method when the info 
is already available on every process.


On Dec 10, 2010, at 3:36 AM, Terry Dontje wrote:

A more portable way of doing what you want below is to gather each 
process's processor name, given by MPI_Get_processor_name, have the 
root that gathers this data assign a unique number to each distinct name, 
and then scatter those numbers to the processes and have them use that as 
the color in an MPI_Comm_split call.  Once you've done that you can do a 
MPI_Comm_size to find how many are on the node and be able to send to 
all the other processes on that node using the new communicator.


Good luck,

--td
On 12/09/2010 08:18 PM, Ralph Castain wrote:
The answer is yes - sort of...

In OpenMPI, every process has information about not only its own local 
rank, but the local rank of all its peers regardless of what node they 
are on. We use that info internally for a variety of things.


Now the "sort of". That info isn't exposed via an MPI API at this 
time. If that doesn't matter, then I can tell you how to get it - it's 
pretty trivial to do.



On Dec 9, 2010, at 6:14 PM, David Mathog wrote:


Is it possible through MPI for a worker to determine:

1. how many MPI processes are running on the local machine
2. within that set its own "local rank"

?

For instance, a quad core with 4 processes might be hosting ranks 10,
14, 15, 20, in which case the "local ranks" would be 1,2,3,4.  The idea
being to use this information so that a program could selectively access
different local resources.  Simple example: on this 4 worker machine
reside telephone directories for Los Angeles, San Diego, San Jose, and
Sacramento.  Each worker is to open one database and search it when the
master sends a request.  With the "local rank" number this would be as
easy as naming the databases file1, file2, file3, and file4.  Without it
the 4 processes would have to communicate with each other somehow to
sort out which is to use which database.  And that could get ugly fast,
especially if they don't all start at the same time.

Thanks,

David Mathog
mathog@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com




Re: [OMPI users] Guaranteed run rank 0 on a given machine?

2010-12-10 Thread Terry Dontje

On 12/10/2010 01:46 PM, David Mathog wrote:

The master is commonly very different from the workers, so I expected
there would be something like

   --rank0-on

but there doesn't seem to be a single switch on mpirun to do that.

If "mastermachine" is the first entry in the hostfile, or the first
machine in a -hosts list, will rank 0 always run there?  If so, will it
always run in the first slot on the first machine listed?  That seems to
be the case in practice, but is it guaranteed?  Even if -loadbalance is
used?


For Open MPI the above is correct, though I am hesitant to say it is guaranteed.

Otherwise, there is the rankfile method.  In the situation where the
master must run on a specific node, but there is no preference for the
workers, would a rank file like this be sufficient?

rank 0=mastermachine slot=0
I thought you might have had to list all of the ranks, but empirically it 
looks like you can get away with a partial rankfile.

The mpirun man page gives an example where all nodes/slots are
specified, but it doesn't say explicitly what happens if the
configuration is only partially specified, or how it interacts with the
-np parameter.  Modifying the man page example:

cat myrankfile
rank 0=aa slot=1:0-2
rank 1=bb slot=0:0,1
rank 2=cc slot=1-2
mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out

Rank 0 runs on node aa, bound to socket 1, cores 0-2.
Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
Rank 2 runs on node cc, bound to cores 1 and 2.

Rank 3 runs where?  not at all, or on dd, aa:slot=0, or ...?
From my empirical runs it looks to me like rank 3 would end up on aa, 
possibly in slot=0.
In other words, once you run out of entries in the rankfile, it looks like 
the remaining processes start again from the beginning of the host list.


--td

Also, in my limited testing --host and -hostfile seem to be mutually
exclusive.  That is reasonable, but it isn't clear that it is intended.
  Example, with a hostfile containing one entry for "monkey02.cluster
slots=1":

mpirun  --host monkey01   --mca plm_rsh_agent rsh  hostname
monkey01.cluster
mpirun  --host monkey02   --mca plm_rsh_agent rsh  hostname
monkey02.cluster
mpirun  -hostfile /usr/common/etc/openmpi.machines.test1 \
--mca plm_rsh_agent rsh  hostname
monkey02.cluster
mpirun  --host monkey01  \
   -hostfile /usr/commom/etc/openmpi.machines.test1 \
   --mca plm_rsh_agent rsh  hostname
--
There are no allocated resources for the application
   hostname
that match the requested mapping:


Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--




Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Terry Dontje

On 12/10/2010 03:24 PM, David Mathog wrote:

Ashley Pittman wrote:


For a much simpler approach you could also use these two environment

variables, this is on my current system which is 1.5 based, YMMV of course.

OMPI_COMM_WORLD_LOCAL_RANK
OMPI_COMM_WORLD_LOCAL_SIZE
However, that doesn't really tell you which MPI_COMM_WORLD ranks are on 
the same node as you, I believe.
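
For what it's worth, a minimal sketch of reading those variables inside a
process (they are Open MPI specific and may be absent under other launchers,
hence the fallback):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Open MPI exports these in the environment of every launched process */
    const char *lrank = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    const char *lsize = getenv("OMPI_COMM_WORLD_LOCAL_SIZE");

    printf("local rank %s of %s on this node\n",
           lrank ? lrank : "(unset)", lsize ? lsize : "(unset)");
    return 0;
}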


--td

That is simpler.  It works on OMPI 1.4.3 too:

cat>/usr/common/bin/dumpev.sh





Re: [OMPI users] Newbie question

2011-01-11 Thread Terry Dontje
So are you trying to start an MPI job in which one process is one executable 
and the other process(es) are something else?  If so, you probably want 
to use a multiple app context.  If you look at FAQ question 7, "How do I 
run an MPMD MPI job?", at http://www.open-mpi.org/faq/?category=running, 
it should answer your question below, I believe.
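
For example, a two-executable (MPMD) launch looks something like the
following; the executable names here are made up:

mpirun -np 1 ./master : -np 4 ./worker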


--td

On 01/11/2011 01:06 AM, Tena Sakai wrote:

Hi,

Thanks for your reply.

I am afraid your terse response doesn’t shed much light.  What I need 
is the “hosts” parameter I can pass to the mpi.spawn.Rslaves() function.  
Can you explain, or better yet give an example of, how I can get this 
via mpirun?

Looking at the mpirun man page, I found an example:
  mpirun –H aa,aa,bb  ./a.out
and similar ones.  But they all execute a program (like a.out above). 
 That’s not
what I want.  What I want is to spawn a bunch of R slaves to other 
machines on
the network.  I can spawn R slaves, as many as I like, to the local 
machine, but
I don’t know how to do this with machines on the network.  That’s what 
“hosts”
parameter of mpi.spawn.Rslaves() enables me to do, I think.  If I can 
do that, then

Rmpi has function(s) to send command to each of the spawned slaves.

My question is how can I get open MPI to give me those “hosts” parameters.

Can you please help me?

Thank you in advance.

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/10/11 8:14 PM, "pooja varshneya"  wrote:

You can use mpirun.

On Mon, Jan 10, 2011 at 8:04 PM, Tena Sakai
 wrote:

Hi,

I am an mpi newbie.  My open MPI is v 1.4.3, which I compiled
on a linux machine.

I am using a language called R, which has an mpi
interface/package.
It appears that it is happy, on the surface, with the open MPI
I installed.

There is an R function called mpi.spawn.Rslaves().  An argument to
this function is nslaves.  I can issue, for example,
  mpi.spawn.Rslaves( nslaves=20 )
And it spawns 20 slave processes.  The trouble is that it is
all on the
same node as that of the master.  I want, instead, these 20
(or more)
slaves spawned on other machines on the network.

It so happens the mpi.spawn.Rslaves() has an extra argument called
hosts.  Here’s the definition of hosts from the api document:
“NULL or
LAM node numbers to specify where R slaves to be spawned.”  I have
no idea what a LAM node is, but there is a function called
lamhosts(),
which returns a somewhat verbose message:

  It seems that there is no lamd running on the host
compute-0-0.local.

  This indicates that the LAM/MPI runtime environment is not
operating.
  The LAM/MPI runtime environment is necessary for the
"lamnodes" command.

  Please run the "lamboot" command the start the LAM/MPI runtime
  environment.  See the LAM/MPI documentation for how to invoke
  "lamboot" across multiple machines.

Here’s my question.  Is there such a command as lamboot in Open
MPI 1.4.3?
Or am I using the wrong MPI software?  In a FAQ I read that
there are other
MPI implementations (FT-MPI, LA-MPI, LAM/MPI), but I had the notion that
Open MPI
is meant to have the functionality of all of them.  Is this a wrong impression?

Thank you for your help.

Tena Sakai
tsa...@gallo.ucsf.edu 

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?

2011-01-25 Thread Terry Dontje

On 01/25/2011 02:17 AM, Will Glover wrote:

Hi all,
I tried a google/mailing list search for this but came up with nothing, so here 
goes:

Is there any level of automation between open mpi's dynamic process management 
and the SGE queue manager?
In particular, can I make a call to mpi_comm_spawn and have SGE dynamically 
increase the number of slots?
This seems a little far-fetched, but it would be really useful if it is 
possible.  My application is 'restricted' to coarse-grain task parallelism and 
involves a workload that varies significantly during runtime (between 1 and 
~100 parallel tasks).  Dynamic process management would maintain an optimal 
number of processors and reduce idling.

Many thanks,
This is an interesting idea but no integration has been done that would 
allow an MPI job to request more slots.


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?

2011-02-02 Thread Terry Dontje

On 02/01/2011 07:34 PM, Jeff Squyres wrote:

On Feb 1, 2011, at 5:02 PM, Jeffrey A Cummings wrote:


I'm getting a lot of push back from the SysAdmin folks claiming that OpenMPI is 
closely intertwined with the specific version of the operating system and/or 
other system software (i.e., Rocks on the clusters).

I wouldn't say that this is true.  We test across a wide variety of OS's and 
compilers.  I'm sure that there are particular platforms/environments that can 
trip up some kind of problem (it's happened before), but in general, Open MPI 
is pretty portable.


To state my question another way:  Apparently each release of Linux and/or 
Rocks comes with some version of OpenMPI bundled in.  Is it dangerous in some 
way to upgrade to a newer version of OpenMPI?

Not at all.  Others have said it, but I'm one of the developers and I'll 
reinforce their answers: I regularly have about a dozen different installations 
of Open MPI on my cluster at any given time (all in different stages of 
development -- all installed to different prefixes).  I switch between them 
quite easily by changing my PATH and LD_LIBRARY_PATH (both locally and on 
remote nodes).
Not to be a lone dissenting opinion, but here is my experience in doing the 
above.


First, if you are always recompiling your application with a specific 
version of OMPI, then I would agree with everything Jeff said above.  
That is, you can build many versions of OMPI on many Linux versions and 
have them run.


But there are definite pitfalls once you start trying to keep one set of 
executables and OMPI binaries across different Linux versions.


1.  You may see executables unable to use OMPI libraries that 
differ in the first dot-number release (e.g. the 1.3 vs 1.4 or 1.5 branches).  
We in the community try to avoid these incompatibilities as much as 
possible, but it happens on occasion (I think 1.3 to 1.4 is one such 
occasion).


2.  The system libraries on different Linux versions are not always the 
same.  At Oracle we build a binary distribution of OMPI that we test out 
on several different versions of Linux.  The key here is building on a 
machine that is essentially the lowest common denominator of all the 
system software that exists on the machines one will be running on.  
This is essentially why Oracle states a bounded set of OS versions a 
distribution runs on.  An example of this is a component in 
OMPI that was relying on a version of libbfd that changed significantly 
between Linux versions.  Once we got rid of the usage of that library we 
were ok.  There are not "a lot" of these instances, but the number is not 
zero.


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-09 Thread Terry Dontje
This sounds like something I ran into some time ago that involved the 
compiler omitting frame pointers.  You may want to try compiling your 
code with -fno-omit-frame-pointer.  I am unsure whether you need to do 
the same while building MPI though.
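
i.e., something along these lines (the program name is just an example):

mpicc -g -fno-omit-frame-pointer -o myprog myprog.c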


--td

On 02/09/2011 02:49 PM, Dennis McRitchie wrote:

Hi,

I'm encountering a strange problem and can't find it having been discussed on 
this mailing list.

When building and running my parallel program using any recent Intel compiler and OpenMPI 1.2.8, 
TotalView behaves entirely correctly, displaying the "Process mpirun is a parallel job. Do you 
want to stop the job now?" dialog box, and stopping at the start of the program. The code 
displayed is the source code of my program's function main, and the stack trace window shows that 
we are stopped in the poll function many levels "up" from my main function's call to 
MPI_Init. I can then set breakpoints, single step, etc., and the code runs appropriately.

But when building and running using Intel compilers with OpenMPI 1.3.x or 1.4.x, 
TotalView displays the usual dialog box, and stops at the start of the program; but my 
main program's source code is *not* displayed. The stack trace window again shows that we 
are stopped in the poll function several levels "up" from my main function's 
call to MPI_Init; but this time, the code displayed is the assembler code for the poll 
function itself.

If I click on 'main' in the stack trace window, the source code for my 
program's function main is then displayed, and I can now set breakpoints, 
single step, etc. as usual.

So why is the program's source code not displayed when using 1.3.x and 1.4.x, 
but is displayed when using 1.2.8. This change in behavior is fairly confusing 
to our users, and it would be nice to have it work as it used to, if possible.

Thanks,
Dennis

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-11 Thread Terry Dontje
Sorry, I have to ask this: did you build your latest OMPI version, not 
just the application, with the -g flag too?


IIRC, when I ran into this issue I was actually able to do stepi's and 
eventually pop up the stack; however, that is really no way to debug a 
program :-).


Unless OMPI is somehow trashing the stack, I don't see what OMPI could be 
doing to cause this type of issue.  Again, when I ran into this issue 
known working programs still worked; I just was unable to get a full 
stack.  So it was definitely an interfacing issue between TotalView and 
the executable (or the result of how the executable and libraries were 
compiled).  Another thing I noticed was that when using Solaris Studio dbx I 
was able to see the full stack where I could not when using 
TotalView.  I am not sure if gdb could also see the full stack or not, but 
it might be worth a try to attach gdb to a running program and see if 
you get a full stack.


--td


On 02/09/2011 05:35 PM, Dennis McRitchie wrote:


Thanks Terry.

Unfortunately, -fno-omit-frame-pointer is the default for the Intel 
compiler when -g is used, which I am using since it is necessary for 
source-level debugging. So the compiler kindly tells me that it is 
ignoring your suggested option when I specify it. :-)


Also, since I can reproduce this problem by simply changing the 
OpenMPI version, without changing the compiler version, it strikes me 
as being more likely to be an OpenMPI-related issue: 1.2.8 works, but 
anything later does not (as described below).


I have tried different versions of TotalView from 8.1 to 8.9, but all 
behave the same.


I was wondering if a change to the openmpi-totalview.tcl script might 
be needed?


Dennis

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
On Behalf Of Terry Dontje

Sent: Wednesday, February 09, 2011 5:02 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Totalview not showing main program on 
startup with OpenMPI 1.3.x and 1.4.x


This sounds like something I ran into some time ago that involved the 
compiler omitting frame pointers.  You may want to try to compile your 
code with -fno-omit-frame-pointer.  I am unsure if you may need to do 
the same while building MPI though.


--td

On 02/09/2011 02:49 PM, Dennis McRitchie wrote:

Hi,
  
I'm encountering a strange problem and can't find it having been discussed on this mailing list.
  
When building and running my parallel program using any recent Intel compiler and OpenMPI 1.2.8, TotalView behaves entirely correctly, displaying the "Process mpirun is a parallel job. Do you want to stop the job now?" dialog box, and stopping at the start of the program. The code displayed is the source code of my program's function main, and the stack trace window shows that we are stopped in the poll function many levels "up" from my main function's call to MPI_Init. I can then set breakpoints, single step, etc., and the code runs appropriately.
  
But when building and running using Intel compilers with OpenMPI 1.3.x or 1.4.x, TotalView displays the usual dialog box, and stops at the start of the program; but my main program's source code is *not* displayed. The stack trace window again shows that we are stopped in the poll function several levels "up" from my main function's call to MPI_Init; but this time, the code displayed is the assembler code for the poll function itself.
  
If I click on 'main' in the stack trace window, the source code for my program's function main is then displayed, and I can now set breakpoints, single step, etc. as usual.
  
So why is the program's source code not displayed when using 1.3.x and 1.4.x, but is displayed when using 1.2.8. This change in behavior is fairly confusing to our users, and it would be nice to have it work as it used to, if possible.
  
Thanks,

Dennis
  
Dennis McRitchie

Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University
  
  
___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] Error in Binding MPI Process to a socket

2011-03-18 Thread Terry Dontje

On 03/17/2011 03:31 PM, vaibhav dutt wrote:

Hi,

Thanks for your reply. I first tried to execute a process by using

mpirun -machinefile hostfile.txt  --slot-list 0:1   -np 1

but it gives the same error as mentioned previously.

Then, I created a rankfile with the contents:

rank 0=t1.tools.xxx  slot=0:0
rank 1=t1.tools.xxx  slot=1:0.

and then used the command

mpirun -machinefile hostfile.txt --rankfile my_rankfile.txt   -np 2

but ended up getting the same error. Is there any patch that I can 
install on my system to make it

topology aware?


You may want to check that you have numa turned on.

If you look in your /etc/grub.conf file, does the kernel line have 
"numa=on" in it?  If not, I would suggest making a new boot line and 
appending numa=on at the end.  That way, if the new boot line doesn't 
work you'll be able to go back to the old one.  Anyway, my boot line 
that turns on numa looks like the following:


title Red Hat Enterprise Linux AS-up (2.6.9-67.EL)
root (hd0,0)
kernel /vmlinuz-2.6.9-67.EL ro root=LABEL=/ console=tty0 
console=ttyS0,9600 rhgb quiet numa=on


And of course once you've saved the changes you'll need to reboot and 
select the new boot line at the grub menu.


--td


Thanks


On Thu, Mar 17, 2011 at 2:05 PM, Ralph Castain > wrote:


The error is telling you that your OS doesn't support queries
telling us what cores are on which sockets, so we can't perform a
"bind to socket" operation. You can probably still "bind to core",
so if you know what cores are in which sockets, then you could use
the rank_file mapper to assign processes to groups of cores in a
socket.

It's just that we can't do it automatically because the OS won't
give us the required info.

See "mpirun -h" for more info on slot lists.

On Mar 17, 2011, at 11:26 AM, vaibhav dutt wrote:

> Hi,
>
> I am trying to perform an experiment in which I can spawn 2 MPI
processes, one on each socket in a 4 core node
> having 2 dual cores. I used the option  "bind to socket" which
mpirun for that but I am getting an error like:
>
> An attempt was made to bind a process to a specific hardware
topology
> mapping (e.g., binding to a socket) but the operating system
does not
> support such topology-aware actions.  Talk to your local system
> administrator to find out if your system can support topology-aware
> functionality (e.g., Linux Kernels newer than v2.6.18).
>
> Systems that do not support processor topology-aware
functionality cannot
> use "bind to socket" and other related functionality.
>
>
> Can anybody please tell me what is this error about. Is there
any other option than "bind to socket"
> that I can use.
>
> Thanks.
> ___
> users mailing list
> us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 






Re: [OMPI users] 1.5.3 and SGE integration?

2011-03-21 Thread Terry Dontje

Dave what version of Grid Engine are you using?
The plm checks for the following environment variables to determine if you are 
running Grid Engine.

SGE_ROOT
ARC
PE_HOSTFILE
JOB_ID

If these are not set in the session in which mpirun is executed, then 
it will resort to ssh; a quick check is shown below.
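
A quick way to check, from inside the job (or an interactive qrsh session),
is something like:

env | egrep '^(SGE_ROOT|ARC|PE_HOSTFILE|JOB_ID)='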


--td


On 03/21/2011 08:24 AM, Dave Love wrote:

I've just tried 1.5.3 under SGE with tight integration, which seems to
be broken.  I built and ran in the same way as for 1.4.{1,3}, which
works, and ompi_info reports the same gridengine parameters for 1.5 as
for 1.4.

The symptoms are that it reports a failure to communicate using ssh,
whereas it should be using the SGE builtin method via qrsh.

There doesn't seem to be a relevant bug report, but before I
investigate, has anyone else succeeded/failed with it, or have any
hints?

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] mpi problems,

2011-04-04 Thread Terry Dontje
libfui.so is a library that is part of the Solaris Studio Fortran tools.  It 
should be located under lib in the directory where your Solaris Studio 
compilers are installed.  So one question is whether you actually have Studio 
Fortran installed on all of your nodes; one way to check is shown below.
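
Something like the following on each node; the paths follow your earlier
output and may differ on your installation:

ldd ./ring2 | grep libfui
ls -l /opt/sun/sunstudio12.1/lib/libfui.so.1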


--td

On 04/04/2011 04:02 PM, Ralph Castain wrote:
Well, where is libfui located? Is that location in your ld path? Is 
the lib present on all nodes in your hostfile?



On Apr 4, 2011, at 1:58 PM, Nehemiah Dacres wrote:


[jian@therock ~]$ echo $LD_LIBRARY_PATH
/opt/sun/sunstudio12.1/lib:/opt/vtk/lib:/opt/gridengine/lib/lx26-amd64:/opt/gridengine/lib/lx26-amd64:/home/jian/.crlibs:/home/jian/.crlibs32
[jian@therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/mpirun  -np 4 
-hostfile list ring2
ring2: error while loading shared libraries: libfui.so.1: cannot open 
shared object file: No such file or directory
ring2: error while loading shared libraries: libfui.so.1: cannot open 
shared object file: No such file or directory
ring2: error while loading shared libraries: libfui.so.1: cannot open 
shared object file: No such file or directory

mpirun: killing job...

--
mpirun noticed that process rank 1 with PID 31763 on node compute-0-1 
exited on signal 0 (Unknown signal 0).

--
mpirun: clean termination accomplished

I really don't know what's wrong here. I was sure that would work

On Mon, Apr 4, 2011 at 2:43 PM, Samuel K. Gutierrez > wrote:


Hi,

Try prepending the path to your compiler libraries.

Example (bash-like):

export
LD_LIBRARY_PATH=/compiler/prefix/lib:/ompi/prefix/lib:$LD_LIBRARY_PATH

--
Samuel K. Gutierrez
Los Alamos National Laboratory


On Apr 4, 2011, at 1:33 PM, Nehemiah Dacres wrote:


altering LD_LIBRARY_PATH alters the process's path to MPI's
libraries; how do I alter its path to compiler libs like
libfui.so.1? It needs to find them because it was compiled by a
Sun compiler

On Mon, Apr 4, 2011 at 10:06 AM, Nehemiah Dacres
mailto:dacre...@slu.edu>> wrote:


As Ralph indicated, he'll add the hostname to the error
message (but that might be tricky; that error message is
coming from rsh/ssh...).

In the meantime, you might try (csh style):

foreach host (`cat list`)
   echo $host
   ls -l /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
end


that's what the tentakel line was referring to, or ...


On Apr 4, 2011, at 10:24 AM, Nehemiah Dacres wrote:

> I have installed it via a symlink on all of the nodes,
I can go 'tentakel which mpirun ' and it finds it' I'll
check the library paths but isn't there a way to find
out which nodes are returning the error?

I found it misslinked on a couple nodes. thank you

-- 
Nehemiah I. Dacres

System Administrator
Advanced Technology Group Saint Louis University




-- 
Nehemiah I. Dacres

System Administrator
Advanced Technology Group Saint Louis University

___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Nehemiah I. Dacres
System Administrator
Advanced Technology Group Saint Louis University

___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Terry Dontje

On 04/05/2011 05:11 AM, SLIM H.A. wrote:

After an upgrade of our system I receive the following error message
(openmpi 1.4.2 with gridengine):


quote


--
Sorry!  You were supposed to get help about:
 orte-odls-default:execv-error
But I couldn't open the help file:
 ...path/1.4.2/share/openmpi/help-odls-default.txt: Cannot send after
transport endpoint shut
down.  Sorry!

end quote

and this is this is the section in the text file
...path/1.4.2/share/openmpi/help-odls-default.txt that refers to
orte-odls-default:execv-error





quote

[orte-odls-default:execv-error]
Could not execute the executable "%s": %s

This could mean that your PATH or executable name is wrong, or that you
do not
have the necessary permissions.  Please ensure that the executable is
able to be
found and executed."

end quote

Does the execv-error mean that the file
...path/1.4.2/share/openmpi/help-odls-default.txt was not accessible or
is there a different reason?

No, it thinks it cannot find some executable that it was asked to run.  
Do you have the exact mpirun command line that was being run?  
Can you first try to run without gridengine?

The error message continues with


quote


--
[cn004:00591] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_grpcomm_basic: file not found (ignored)
[cn004:00586] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)
[cn004:00585] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)

--
Sorry!  You were supposed to get help about:
 find-available:none-found
But I couldn't open the help file:
 ...path/1.4.2/share/openmpi/help-mca-base.txt: Cannot send after
transport endpoint shutdown
.  Sorry!

--
[cn004:00586] PML ob1 cannot be selected

end quote

but there are .so and .la libraries in the directory
...path/1.4.2/lib/openmpi
Are those the ones not found?
I've seen this when either OPAL_PREFIX or LD_LIBRARY_PATH is not set 
up correctly.

Thanks

Henk

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Terry Dontje

On 04/05/2011 07:37 AM, SLIM H.A. wrote:


Hi Terry

I think the problem may have been caused now by our lustre file system 
being sick, so I'll wait until that is fixed.


It worked outside gridengine but I think I did not include --mca btl 
self,sm,ib or the corresponding environment variables with gridengine, 
although it usually finds the fastest interconnect.


>I've seen this when either OPAL_PREFIX or LD_LIBRARY_PATH not being 
set up correctly.


LD_LIBRARY_PATH is set correctly but where is OPAL_PREFIX set?

OPAL_PREFIX should be set to the base directory where OMPI is 
installed.  In theory it should not need to be set if configure's prefix 
option points to the same place you installed OMPI.  I think it is only 
needed when you've moved the OMPI installation bits somewhere that doesn't 
correspond to the configure prefix option.


Of course the same is similarly true of LD_LIBRARY_PATH: you 
really shouldn't need to set it in your scripts/shell if you've 
compiled the programs such that the rpath is correctly passed to the linker.
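
For example (bash-like; the path is only a placeholder for wherever the
installation was moved to):

export OPAL_PREFIX=/new/location/openmpi-1.4.2
export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH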


--td


Thanks

Henk

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
On Behalf Of Terry Dontje

Sent: 05 April 2011 11:21
To: us...@open-mpi.org
Subject: Re: [OMPI users] orte-odls-default:execv-error

On 04/05/2011 05:11 AM, SLIM H.A. wrote:

After an upgrade of our system I receive the following error message
(openmpi 1.4.2 with gridengine):
  


quote


--
Sorry!  You were supposed to get help about:
 orte-odls-default:execv-error
But I couldn't open the help file:
 ...path/1.4.2/share/openmpi/help-odls-default.txt: Cannot send after
transport endpoint shut
down.  Sorry!

end quote

  
and this is this is the section in the text file

...path/1.4.2/share/openmpi/help-odls-default.txt that refers to
orte-odls-default:execv-error
  
  




quote

[orte-odls-default:execv-error]
Could not execute the executable "%s": %s
  
This could mean that your PATH or executable name is wrong, or that you

do not
have the necessary permissions.  Please ensure that the executable is
able to be
found and executed."

end quote

  
Does the execv-error mean that the file

...path/1.4.2/share/openmpi/help-odls-default.txt was not accessible or
is there a different reason?
  

No, it thinks it cannot find some executable that it was asked to 
run.  Do you have the exact mpirun command line that was being 
run?  Can you first try to run without gridengine?


The error message continues with
  


quote


--
[cn004:00591] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_grpcomm_basic: file not found (ignored)
[cn004:00586] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)
[cn004:00585] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)

--
Sorry!  You were supposed to get help about:
 find-available:none-found
But I couldn't open the help file:
 ...path/1.4.2/share/openmpi/help-mca-base.txt: Cannot send after
transport endpoint shutdown
.  Sorry!

--
[cn004:00586] PML ob1 cannot be selected

end quote

  
but there are .so and .la libraries in the directory

...path/1.4.2/lib/openmpi
Are those the ones not found?

I've seen this when either OPAL_PREFIX or LD_LIBRARY_PATH is not being 
set up correctly.


  
Thanks
  
Henk
  
___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Terry Dontje
It was asked during the community concall whether the issue below might be 
related to ticket #2722 (https://svn.open-mpi.org/trac/ompi/ticket/2722)?


--td

On 04/04/2011 10:17 PM, David Zhang wrote:
Any error messages?  Maybe the nodes ran out of memory?  I know MPI 
implements some kind of buffering under the hood, so even though you're 
sending arrays over 2^26 in size, it may require more than that for 
MPI to actually send them.


On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico 
mailto:mdidomeni...@gmail.com>> wrote:


Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
messages over 2^26 in size?

For a reason I have not determined just yet, machines on my cluster
(OpenMPI v1.5 and QLogic stack/QDR IB adapters) are failing to send
arrays over 2^26 in size via the AllToAll collective (user code).

Further testing seems to indicate that an MPI message over 2^26 fails
(tested with IMB-MPI)

Running the same test on a different, older IB-connected cluster seems
to work, which would seem to indicate a problem with the InfiniBand
drivers of some sort rather than OpenMPI (but I'm not sure).

Any thoughts, directions, or tests?
___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
David Zhang
University of California, San Diego


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Not pointing to correct libraries

2011-04-05 Thread Terry Dontje
I am not sure Fedora comes with Open MPI installed on it by default (at 
least my FC13 did not).  You may want to look at installing 
Open MPI from yum or some other package manager.  Or you can download 
the source tarball from http://www.open-mpi.org/software/ompi/v1.4/, 
then build and install it yourself.
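
A typical source build looks something like this (the prefix is just an
example); afterwards put $HOME/openmpi/bin on your PATH and
$HOME/openmpi/lib on your LD_LIBRARY_PATH:

tar xjf openmpi-1.4.3.tar.bz2
cd openmpi-1.4.3
./configure --prefix=$HOME/openmpi
make all install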


--td

On 04/05/2011 11:01 AM, Warnett, Jason wrote:


Hello

I am running on Linux, latest version of mpi built but I've run into a 
few issues with a program which I am trying to run. It is a widely 
used open source application called LIGGGHTS so I know the code works 
and should compile, so I obviously have a setting wrong with MPI. I 
saw a similar problem in a previous post (2007), but couldn't see how 
to resolve it as I am quite new to the terminal environment in Unix 
(always been windows... until now).


So the issue I am getting is the following error...

[Jay@Jay chute_wear]$ mpirun -np 1 lmp_fedora < in.chute_wear
lmp_fedora: error while loading shared libraries: libmpi_cxx.so.0: 
cannot open shared object file: No such file or directory


So I checked where stuff was pointing using the ldd command as in that 
post and found the following:

linux-gate.so.1 =>  (0x00d1)
libmpi_cxx.so.0 => not found
libmpi.so.0 => not found
libopen-rte.so.0 => not found
libopen-pal.so.0 => not found
libdl.so.2 => /lib/libdl.so.2 (0x00cbe000)
libnsl.so.1 => /lib/libnsl.so.1 (0x007e6000)
libutil.so.1 => /lib/libutil.so.1 (0x009fa000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x04a02000)
libm.so.6 => /lib/libm.so.6 (0x008a4000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0011)
libpthread.so.0 => /lib/libpthread.so.0 (0x0055)
libc.so.6 => /lib/libc.so.6 (0x003b3000)
/lib/ld-linux.so.2 (0x00bfa000)

so it is the Open MPI files it isn't linking to. How can I sort this? 
I shouldn't need to edit the code of the LIGGGHTS executable I've 
compiled, as I know other people are using the same thing, so I guess it 
is to do with the way I installed OpenMPI. I did a system search and 
couldn't find a file called libmpi* anywhere... so my guess is that 
I've incorrectly installed it. I have tried several ways, but could you 
tell me how to fix it / install correctly? (embarrassing if it is to do 
with a correct install...)


Thanks

Jay


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] mpi problems,

2011-04-06 Thread Terry Dontje
Something looks fishy about your numbers.  The first two sets of numbers 
look the same, and the last set does look better for the most part.  Your 
mpirun command line looks weird to me with the "-mca 
orte_base_help_aggregate btl,openib,self,"; did something get chopped off 
in the text copy?  You should have had "-mca btl openib,self".  Can 
you do a run with "-mca btl tcp,self"?  It should be slower (see the 
example runs below).
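
i.e., something like the following, reusing the machinefile and test program
from your earlier runs:

mpirun -machinefile list --mca btl openib,self /opt/iba/src/mpi_apps/mpi_stress/mpi_stress
mpirun -machinefile list --mca btl tcp,self    /opt/iba/src/mpi_apps/mpi_stress/mpi_stress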


I really wouldn't have expected another compiler over IB to perform 
that dramatically worse.


--td


On 04/06/2011 12:40 PM, Nehemiah Dacres wrote:
also, I'm not sure if I'm reading the results right. According to the 
last run, did using the sun compilers (update 1 )  result in higher 
performance with sunct?


On Wed, Apr 6, 2011 at 11:38 AM, Nehemiah Dacres > wrote:


some tests I did. I hope this isn't an abuse of the list. please
tell me if it is but thanks to all those who helped me.

this goes to show that the Sun MPI works with programs not
compiled with Sun’s compilers.
This first test was run as a base case to see if MPI works; the
second run is to see the speed-up that using OpenIB provides
jian@therock ~]$ mpirun -machinefile list
/opt/iba/src/mpi_apps/mpi_stress/mpi_stress
Start mpi_stress at Wed Apr  6 10:56:29 2011

  Size (bytes)   TxMessages   TxMillionBytes/s   TxMessages/s
            32            1               2.77       86485.67
            64            1               5.76       90049.42
           128            1              11.00       85923.85
           256            1              18.78       73344.43
           512            1              34.47       67331.98
          1024            1              34.81       33998.09
          2048            1              17.31        8454.27
          4096            1              18.34        4476.61
          8192            1              25.43        3104.28
         16384            1              15.56         949.50
         32768            1              13.95         425.74
         65536            1               9.88         150.79
        131072         8192              11.05          84.31
        262144         4096              13.12          50.04
        524288         2048              16.54          31.55
       1048576         1024              19.92          18.99
       2097152          512              22.54          10.75
       4194304          256              25.46           6.07

Iteration 0 : errors = 0, total = 0 (495 secs, Wed Apr  6 11:04:44 2011)
After 1 iteration(s), 8 mins and 15 secs, total errors = 0

here is the infiniband run

[jian@therock ~]$ mpirun -mca orte_base_help_aggregate
btl,openib,self, -machinefile list
/opt/iba/src/mpi_apps/mpi_stress/mpi_stress
Start mpi_stress at Wed Apr  6 11:07:06 2011

  Size (bytes)   TxMessages   TxMillionBytes/s   TxMessages/s
            32            1               2.72       84907.69
            64            1               5.83       91097.94
           128            1              10.75       83959.63
           256            1              18.53       72384.48
           512            1              34.96       68285.00
          1024            1              11.40       11133.10
          2048            1              20.88       10196.34
          4096            1              10.13        2472.13
          8192            1              19.32        2358.25
         16384            1              14.58         890.10
         32768            1              15.85         483.61
         65536            1               9.04         137.95
        131072         8192              10.90          83.12
        262144         4096              13.57          51.76
        524288         2048              16.82          32.08
       1048576         1024              19.10          18.21
       2097152          512              22.13          10.55
       4194304          256              21.66           5.16

Iteration 0 : errors = 0, total = 0 (511 secs, Wed Apr  6 11:15:37 2011)
After 1 iteration(s), 8 mins and 31 secs, total errors = 0
compiled with the sun compilers i think
[jian@therock ~]$ mpirun -mca orte_base_help_aggregate
btl,openib,self, -machinefile list sunMp

Re: [OMPI users] mpi problems,

2011-04-07 Thread Terry Dontje

On 04/06/2011 03:38 PM, Nehemiah Dacres wrote:
I am also trying to get netlib's HPL to run via Sun Cluster Tools, so I 
am trying to compile it and am having trouble. Which is the proper MPI 
library to give?

naturally this isn't going to work

MPdir= /opt/SUNWhpc/HPC8.2.1c/sun/
MPinc= -I$(MPdir)/include
*MPlib= $(MPdir)/lib/libmpi.a*
Is there a reason you are trying to link with a static libmpi?  You 
really want to link with libmpi.so.  It also seems like whatever 
Makefile you are using is not using mpicc; is that true?  The reason 
that is important is that mpicc would pick up the right libs you need.  
Which brings me to Ralph's comment: if you really want to go around the 
mpicc way of compiling, use mpicc --showme, copy the compile line shown 
in that command's output, and insert your files accordingly (see the 
example below).
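
For example (the exact output depends on your installation):

mpicc --showme           # the full compile/link line the wrapper would use
mpicc --showme:compile   # just the preprocessor/compile flags
mpicc --showme:link      # just the link flags and libraries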


--td


because that doesn't exist
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libotf.a  
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.fmpi.a  
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.omp.a
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.a   
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.mpi.a   
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.ompi.a


is what I get when listing *.a in the lib directory. None of those 
are equivalent, because they are all linked with VampirTrace, if I am 
reading the names right. I've already tried putting 
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.mpi.a for this and it didn't 
work, giving errors like


On Wed, Apr 6, 2011 at 12:42 PM, Terry Dontje <mailto:terry.don...@oracle.com>> wrote:


Something looks fishy about your numbers.  The first two sets of
numbers look the same and the last set do look better for the most
part.  Your mpirun command line looks weird to me with the "-mca
orte_base_help_aggregate btl,openib,self," did something get
chopped off with the text copy?  You should have had a "-mca btl
openib,self".  Can you do a run with "-mca btl tcp,self", it
should be slower.

I really wouldn't have expected another compiler over IB to be
that dramatically lower performing.

--td



On 04/06/2011 12:40 PM, Nehemiah Dacres wrote:

also, I'm not sure if I'm reading the results right. According to
the last run, did using the sun compilers (update 1 )  result in
higher performance with sunct?

On Wed, Apr 6, 2011 at 11:38 AM, Nehemiah Dacres
mailto:dacre...@slu.edu>> wrote:

some tests I did. I hope this isn't an abuse of the list.
please tell me if it is but thanks to all those who helped me.

this  goes to say that the sun MPI works with programs not
compiled with sun’s compilers.
this first test was run as a base case to see if MPI works.,
the sedcond run is to see the speed up using OpenIB provides
jian@therock ~]$ mpirun -machinefile list
/opt/iba/src/mpi_apps/mpi_stress/mpi_stress
Start mpi_stress at Wed Apr  6 10:56:29 2011

  Size (bytes) TxMessages  TxMillionBytes/s  
TxMessages/s

32  1  2.77
  86485.67
64  1  5.76
  90049.42
   128  1 11.00
  85923.85
   256  1 18.78
  73344.43
   512  1 34.47
  67331.98
  1024  1 34.81
  33998.09
  2048  1 17.31
   8454.27
  4096  1 18.34
   4476.61
  8192  1 25.43
   3104.28
 16384  1 15.56
949.50
 32768  1 13.95
425.74

 65536  1  9.88
150.79
131072   8192 11.05
 84.31
262144   4096 13.12
 50.04
524288   2048 16.54
 31.55
   1048576   1024 19.92
 18.99
   2097152512 22.54
 10.75
   4194304256 25.46
  6.07

Iteration 0 : errors = 0, total = 0 (495 secs, Wed Apr  6
11:04:44 2011)
After 1 iteration(s), 8 mins and 15 secs, total errors = 0

here is the infiniband run

[jian@therock ~]$ mpirun -mca orte_ba

Re: [OMPI users] mpi problems,

2011-04-07 Thread Terry Dontje

Nehemiah,
I took a look at an old version of an HPL Makefile I have.  I think what 
you really want to do is not set the MP* variables to anything and, near 
the end of the Makefile, set CC and LINKER to mpicc.  You may also need to 
change the CFLAGS and LINKERFLAGS variables to match the 
compiler/arch you are using; a sketch is below.
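
Something like the following in the HPL Make.<arch> file is what I mean; this
is only a sketch, the variable names follow the stock HPL template and the
flags will depend on your compiler:

MPdir  =
MPinc  =
MPlib  =
# let the wrapper supply the MPI include and library paths
CC     = mpicc
LINKER = mpicc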


--td
On 04/07/2011 06:20 AM, Terry Dontje wrote:

On 04/06/2011 03:38 PM, Nehemiah Dacres wrote:
I am also trying to get netlib's hpl to run via sun cluster tools so 
i am trying to compile it and am having trouble. Which is the proper 
mpi library to give?

naturally this isn't going to work

MPdir= /opt/SUNWhpc/HPC8.2.1c/sun/
MPinc= -I$(MPdir)/include
*MPlib= $(MPdir)/lib/libmpi.a*
Is there a reason you are trying to link with a static libmpi.  You 
really want to link with libmpi.so.  It also seems like whatever 
Makefile you are using is not using mpicc, is that true.  The reason 
that is important is mpicc would pick up the right libs you needed.  
Which brings me to Ralph's comment, if you really want to go around 
the mpicc way of compiling use mpicc --showme, copy the compile line 
shown in that commands output and insert your files accordingly.


--td


because that doesn't exist
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libotf.a  
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.fmpi.a  
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.omp.a
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.a   
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.mpi.a   
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.ompi.a


is what I have for listing *.a  in the lib directory. none of those 
are equivilant becasue they are all linked with vampire trace if I am 
reading the names right. I've already tried putting  
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.mpi.a for this and it didnt 
work giving errors like


On Wed, Apr 6, 2011 at 12:42 PM, Terry Dontje 
mailto:terry.don...@oracle.com>> wrote:


Something looks fishy about your numbers.  The first two sets of
numbers look the same and the last set do look better for the
most part.  Your mpirun command line looks weird to me with the
"-mca orte_base_help_aggregate btl,openib,self," did something
get chopped off with the text copy?  You should have had a "-mca
btl openib,self".  Can you do a run with "-mca btl tcp,self", it
should be slower.

I really wouldn't have expected another compiler over IB to be
that dramatically lower performing.

--td



On 04/06/2011 12:40 PM, Nehemiah Dacres wrote:

also, I'm not sure if I'm reading the results right. According
to the last run, did using the sun compilers (update 1 )  result
in higher performance with sunct?

On Wed, Apr 6, 2011 at 11:38 AM, Nehemiah Dacres
mailto:dacre...@slu.edu>> wrote:

Some tests I did. I hope this isn't an abuse of the list;
please tell me if it is, but thanks to all those who helped me.

This goes to show that the Sun MPI works with programs not
compiled with Sun's compilers.
This first test was run as a base case to see if MPI works;
the second run is to see the speed-up that using OpenIB provides.
[jian@therock ~]$ mpirun -machinefile list
/opt/iba/src/mpi_apps/mpi_stress/mpi_stress
Start mpi_stress at Wed Apr  6 10:56:29 2011

  Size (bytes)   TxMessages   TxMillionBytes/s   TxMessages/s
            32            1               2.77       86485.67
            64            1               5.76       90049.42
           128            1              11.00       85923.85
           256            1              18.78       73344.43
           512            1              34.47       67331.98
          1024            1              34.81       33998.09
          2048            1              17.31        8454.27
          4096            1              18.34        4476.61
          8192            1              25.43        3104.28
         16384            1              15.56         949.50
         32768            1              13.95         425.74
         65536            1               9.88         150.79
        131072         8192              11.05          84.31
        262144         4096              13.12          50.04
        524288         2048              16.54          31.55
       1048576         1024              19.92          18.99
       2097152          512

Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-07 Thread Terry Dontje

On 04/07/2011 06:16 AM, Paul Kapinos wrote:

Dear OpenMPI developers,

We tried to build OpenMPI 1.5.3 including support for Platform LSF 
using Sun Studio (= Oracle Solaris Studio now) 12.2, and the 
configure stage failed.


1. Used flags:

./configure --with-lsf --with-openib --with-devel-headers 
--enable-contrib-no-build=vt --enable-mpi-threads CFLAGS="-fast 
-xtarget=nehalem -m64"   CXXFLAGS="-fast -xtarget=nehalem -m64" 
FFLAGS="-fast -xtarget=nehalem" -m64   FCFLAGS="-fast -xtarget=nehalem 
-m64"   F77=f95 LDFLAGS="-fast -xtarget=nehalem -m64" 
--prefix=//openmpi-1.5.3mt/linux64/studio


(note the support for LSF enabled by --with-lsf). The compiler envvars 
are set as follows:

$ echo $CC $FC $CXX
cc f95 CC

The compiler info: (cc -V, CC -V)
cc: Sun C 5.11 Linux_i386 2010/08/13
CC: Sun C++ 5.11 Linux_i386 2010/08/13


2. The configure error was:
##
checking for lsb_launch in -lbat... no
configure: WARNING: LSF support requested (via --with-lsf) but not found.
configure: error: Aborting.
##


3. In the config.log (see the config.log.error) there is more info 
about the problem. The crucial info is:

##
/opt/lsf/8.0/linux2.6-glibc2.3-x86_64/lib/libbat.so: undefined 
reference to `ceil'

##

4. Googling for `ceil' leads e.g. to 
http://www.cplusplus.com/reference/clibrary/cmath/ceil/


so, the attached ceil.c example file *can* be compiled by "CC" (the 
Studio C++ compiler), but *cannot* be compiled using "cc" (the Studio 
C compiler).

$ CC ceil.c
$ cc ceil.c
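
(The ceil.c attachment is not reproduced in the archive; a minimal test
file along these lines would show the behavior described, though its
exact contents are a guess:)

    /* ceil.c: calls ceil() so the link either needs -lm or an
     * inlined / implicitly linked math library. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x = 1.2;
        printf("ceil(%f) = %f\n", x, ceil(x));
        return 0;
    }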

Did you try to link in the math library -lm?  When I did this your test 
program worked for me and that actually is the first test that the 
configure does.


5. Looking into configure.log and searching for `ceil': there 
was a check for the availability of `ceil' for the C compiler (see 
config.log.ceil). This check says `ceil' is *available* for the "cc" 
compiler, which is *wrong*, cf. (4).

See above, it actually is right when you link in the math lib.


So, is there an error in the configure stage? Or do the checks in 
config.log.ceil not correctly test the availability of the `ceil' function 
in the C compiler?

It looks to me like the -lbat configure test is not linking in the math lib.

Best wishes,
Paul Kapinos






P.S. Note that in the past we built many older versions of OpenMPI with 
no support for LSF and had no such problems.







___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-07 Thread Terry Dontje

On 04/07/2011 08:36 AM, Paul Kapinos wrote:

Hi Terry,


so, the attached ceil.c example file *can* be compiled by "CC" (the 
Studio C++ compiler), but *cannot* be compiled using "cc" (the 
Studio C compiler).

$ CC ceil.c
$ cc ceil.c

Did you try to link in the math library -lm?  When I did this your 
test program worked for me and that actually is the first test that 
the configure does.


5. Looking into configure.log and searching on `ceil' results: there 
was a check for the availability of `ceil' for the C compiler (see 
config.log.ceil). This check says `ceil' is *available* for the "cc" 
Compiler, which is *wrong*, cf. (4).

See above, it actually is right when you link in the math lib.


Thanks for the tip! Yes, when using -lm the Studio C compiler "cc" 
also works fine for ceil.c:


$ cc ceil.c -lm



So, is there an error in the configure stage? Or do the checks 
in config.log.ceil not correctly test the availability of the `ceil' 
function in the C compiler?
It looks to me like the -lbat configure test is not linking in the 
math lib. 


Yes, there is no -lm in the configure:84213 line.

Note the checks for ceil again, in config.log.ceil. As far as I understood 
these logs, the checks for ceil and for the need of -lm deliver wrong 
results:



configure:55000: checking if we need -lm for ceil

configure:55104: result: no

configure:55115: checking for ceil

configure:55115: result: yes


So, configure assumes "ceil" is available for the "cc" compiler 
without the need for the -lm flag - and this is *wrong*; "cc" needs -lm.


Interesting.  I've looked at some of my x86, Studio, Linux builds of 
the OMPI 1.5 branch and I see configure results for ceil that 
correctly identify the need for -lm.  Yours definitely does not come up 
with the right answer.  Are you using the "official" OMPI 1.5.3 tarball?

It seems to me to be a configure issue.


Certainly does.

--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-08 Thread Terry Dontje
Paul and I have been talking about the below issue and I thought it 
would be useful to update the list just in case someone else runs into 
this problem and ends up searching the email list before we actually fix 
the issue.


The problem is that OMPI's configure tests whether -lm is needed to get 
math library functions (e.g. ceil, sqrt...).  In the case that one is 
using the Solaris Studio compilers (from Oracle) and one passes in the 
-fast option via CFLAGS, the -lm test in configure will turn out false 
because -fast sets the -xlibmopt flag, which provides inline versions of 
some of the math library functions.  Because of that, OMPI decides it 
doesn't need to set -lm for linking.


The above is problematic when configuring with --with-lsf because the LSF 
library libbat.so has an undefined reference to ceil that needs to be 
resolved (so it needs -lm in the case of the Studio compilers).  Without 
the -lm, the configure check for LSF fails.


There are several workarounds:

1.  Put LIBS="-lm" on the configure line.  The compiler will still 
inline the math functions compiled into OMPI, but linking of the OMPI libs 
with the LSF libs will work because of the -lm (see the example configure 
line after this list).


2.  In the CFLAGS add -xnolibmopt in addition to -fast.  This will turn 
off the inlining and cause OMPI's configure script to insert -lm.


3.  Don't use -fast.
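
As an example of workaround 1, the configure line from the original report 
could be re-run with LIBS added (a sketch only; keep your own flags, paths 
and prefix, which are abbreviated here):

    ./configure --with-lsf --with-openib CFLAGS="-fast -xtarget=nehalem -m64" \
        LIBS="-lm" --prefix=<your prefix>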

--td
On 04/07/2011 08:36 AM, Paul Kapinos wrote:

Hi Terry,


so, the attached ceil.c example file *can* be compiled by "CC" (the 
Studio C++ compiler), but *cannot* be compiled using "cc" (the 
Studio C compiler).

$ CC ceil.c
$ cc ceil.c

Did you try to link in the math library -lm?  When I did this your 
test program worked for me and that actually is the first test that 
the configure does.


5. Looking into configure.log and searching on `ceil' results: there 
was a check for the availability of `ceil' for the C compiler (see 
config.log.ceil). This check says `ceil' is *available* for the "cc" 
Compiler, which is *wrong*, cf. (4).

See above, it actually is right when you link in the math lib.


Thanks for the tip! Yes, when using -lm the Studio C compiler "cc" 
also works fine for ceil.c:


$ cc ceil.c -lm



So, is there an error in the configure stage? Or do the checks 
in config.log.ceil not correctly test the availability of the `ceil' 
function in the C compiler?
It looks to me like the -lbat configure test is not linking in the 
math lib. 


Yes, there is no -lm in the configure:84213 line.

Note the checks for ceil again, in config.log.ceil. As far as I understood 
these logs, the checks for ceil and for the need of -lm deliver wrong 
results:



configure:55000: checking if we need -lm for ceil

configure:55104: result: no

configure:55115: checking for ceil

configure:55115: result: yes


So, configure assumes "ceil" is available for the "cc" compiler 
without the need for the -lm flag - and this is *wrong*; "cc" needs -lm.


It seems to me to be a configure issue.

Greetings

Paul



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] OMPI vs. network socket communcation

2011-05-02 Thread Terry Dontje

On 04/30/2011 08:52 PM, Jack Bryan wrote:

Hi, All:

What is the relationship between MPI communication and socket 
communication ?


MPI may use sockets to do the communication between two 
processes.  Aside from that, they are used for different purposes.

Is the network socket programming better than MPI ?
Depends on what you are trying to do.  If you are writing a parallel 
program that may run in multiple environments, with different kinds of 
high-performance protocols available for its use, then MPI is probably 
better.  If you are looking to do simple client/server type programming, 
then socket programming might have an advantage.


I am a newbie of network socket programming.

I do not know which one is better for parallel/distributed computing ?

IMO MPI.


I know that network socket is unix-based file communication between 
server and client.


If they can also be used for parallel computing, how MPI can work 
better than them ?
There is a lot of stuff that MPI does behind the curtain to make a 
parallel application's life a lot easier.  As far as performance goes, MPI 
will not perform better than sockets if it is using sockets as the 
underlying model.  However, the performance difference should be 
negligible, which makes all the other stuff MPI does for you a big win.


I know MPI is for homogeneous cluster system and network socket is 
based on internet TCP/IP.
What do you mean by homogeneous cluster?  There are some MPIs that can 
work among different platforms and even different OSes (though some 
initial setup may be necessary).


Hope this helps,


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] OMPI vs. network socket communcation

2011-05-02 Thread Terry Dontje

On 05/02/2011 11:30 AM, Jack Bryan wrote:

Thanks for your reply.

MPI is for academic purpose. How about business applications ?

There are quite a few non-academic MPI applications.  For example, 
there are quite a few simulation codes from different vendors that 
support MPI (Nastran is one common one).
What kinds of parallel/distributed computing environments do the 
financial institutions
use for their high-frequency trading?
I personally know of a private trading shop that uses MPI, but that's as 
much as I can say.  I am not sure how common it is; however, the direct 
communication to the trading servers is still via sockets or something 
similar as opposed to MPI.


--td



Any help is really appreciated.

Thanks,


Date: Mon, 2 May 2011 08:34:33 -0400
From: terry.don...@oracle.com
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI vs. network socket communcation

On 04/30/2011 08:52 PM, Jack Bryan wrote:

Hi, All:

What is the relationship between MPI communication and socket
communication ?

MPI may use sockets to do the communication between two 
processes.  Aside from that, they are used for different purposes.


Is the network socket programming better than MPI ?

Depends on what you are trying to do.  If you are writing a parallel 
program that may run in multiple environments, with different kinds of 
high-performance protocols available for its use, then MPI is probably 
better.  If you are looking to do simple client/server type 
programming, then socket programming might have an advantage.



I am a newbie of network socket programming.

I do not know which one is better for parallel/distributed
computing ?

IMO MPI.


I know that network socket is unix-based file communication
between server and client.

If they can also be used for parallel computing, how MPI can work
better than them ?

There is a lot of stuff that MPI does behind the curtain to make a 
parallel application's life a lot easier.  As far as performance goes, MPI 
will not perform better than sockets if it is using sockets as the 
underlying model.  However, the performance difference should be 
negligible, which makes all the other stuff MPI does for you a big win.



I know MPI is for homogeneous cluster system and network socket is
based on internet TCP/IP.

What do you mean by homogeneous cluster?  There are some MPIs that can 
work among different platforms and even different OSes (though some 
initial setup may be necessary).


Hope this helps,


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 




___ users mailing list 
us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-02 Thread Terry Dontje

On 05/02/2011 01:27 PM, Robert Walters wrote:


Open-MPI Users,

I've been using OpenMPI for a while now and am very pleased with it. I 
use the OpenMPI system across eight Red Hat Linux nodes (8 cores each) 
on 1 Gbps Ethernet behind a dedicated switch. After working out kinks 
in the beginning, we've been using it periodically anywhere from 8 
cores to 64 cores. We use a finite element software named LS-DYNA. We 
do not have source code for this program, it is compiled to work with 
OpenMPI 1.4.1 (I use 1.4.2) and we cannot make changes or request code 
to see how it performs certain functions.


From time to time, I will be simulating a particular "job" in LS-DYNA 
and for some reason it will quit, with OpenMPI issuing an MPI_ABORT and 
stating "connect to address xx.xxx.xxx.xxx port xxx: Connection 
refused; trying normal rsh (/usr/bin/rsh)." This error comes after 
running for hours, which means that connections to the node it's 
citing have already been made previously. The particular node it names 
is random and changes from simulation to simulation. We use SSH to 
communicate and we have the ports open for node-to-node communications 
on any port.


I am curious what makes you think the connections to the node it's citing 
have been made.  Are you sure the connection between the two processes has 
been made?


Does any user have experience with this error where a connection is 
established, and used for several hours, but after a seemingly random 
period of time the program dies stating it can't make a connection?


Have you tried running the code giving mpirun the "-mca 
mpi_preconnect_mpi 1" option?  This will try (it isn't complete but 
close) to establish all connections at the start of the job.
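
For example (illustrative only; the host file and application name here 
are placeholders, not taken from the original report):

    mpirun -machinefile hosts -np 64 --mca mpi_preconnect_mpi 1 ./your_app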


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-02 Thread Terry Dontje

On 05/02/2011 02:04 PM, Robert Walters wrote:


Terry,

I was under the impression that all connections are made because of 
the nature of the program that OpenMPI is invoking. LS-DYNA is a 
finite element solver and for any given simulation I run, the cores on 
each node must constantly communicate with one another to check for 
various occurrences (contact with various pieces/parts, updating nodal 
coordinates, etc...).


You might be right, the connections might have been established but the 
error message you state (connection refused) seems out of place if the 
connection was already established.


Were there more error messages from OMPI other than "connection 
refused"?  If so, could you possibly provide that output to us?  Maybe it 
will give us a hint where in the library things are messing up.


I've run the program using --mca mpi_preconnect_mpi 1 and the 
simulation has started itself up successfully which I think means that 
the mpi_preconnect passed since all of the child processes have 
started up on each individual node. Thanks for the suggestion though, 
it's a good place to start.



Yeah, it possibly could be telling if things do work with this setting.


I've been worried (though I have no basis for it) that messages may be 
getting queued up and hitting some kind of ceiling or timeout. As a 
finite element code, I think the communication occurs on a large 
scale. Lots of very small packets going back and forth quickly. A few 
studies have been done by the High Performance Computing Advisory 
Council 
(http://www.hpcadvisorycouncil.com/pdf/LS-DYNA%20_analysis.pdf) and 
they've suggested that LS-DYNA communicates at very, very high rates 
(Not sure but from pg.15 of that document they're suggesting hundreds 
of millions of messages in only a few hours). Is there any kind of 
buffer or queue that OpenMPI develops if messages are created too 
quickly? Does it dispatch them immediately or does it attempt to apply 
some kind of traffic flow control?


The queuing really depends on what type of calls the application is 
making.  If it is doing blocking sends then I wouldn't expect too much 
queuing happening using the tcp btl.  As far as traffic flow control is 
concerned I believe the tcp btl doesn't do any for the most part and 
lets tcp handle that.  Maybe someone else on the list could chime in if 
I am wrong here.


In the past I have seen cases where lots of traffic on the network and to a 
particular node has caused some connections not to be established.  But I 
don't know of any outstanding problem like that right now.


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-03 Thread Terry Dontje

A little more clarification:

1.  Simulations that fail always seem to fail?
2.  Does the same simulation always fail between the same processes (how 
about nodes)?  I thought you said no previously.
3.  Did the mpi_preconnect_mpi help any?
4.  Are there any informational messages in the /var/log/messages file 
around or before the abort?
5.  Have you tried netstat -s 1 while the program is running on one of 
the nodes that fail, to see if you are getting any of the failure-type 
events spiking?

The error code coming back from MPI_Abort seems really odd.  I am 
curious whether the connection refused is a result of the abort or what?


--td
On 05/02/2011 03:40 PM, Robert Walters wrote:


I've attached the typical error message I've been getting. This is 
from a run I initiated this morning. The first few lines or so are 
related to the LS-DYNA program and are just there to let you know its 
running successfully for an hour and a half.


What's interesting is this doesn't happen on every job I run, and will 
recur for the same simulation. For instance, Simulation A will run for 
40 hours, and complete successfully. Simulation B will run for 6 
hours, and die from an error. Any further attempts to run simulation B 
will always end from an error. This makes me think there is some kind 
of bad calculation happening that OpenMPI doesn't know how to handle, 
or LS-DYNA doesn't know how to pass to OpenMPI. On the other hand, 
this particular simulation is one of those "benchmarks" and everyone 
runs it. I should not be getting errors from the FE code itself. 
Odd... I think I'll try this as an SMP job as well as an MPP job over 
a single node and see if the issue continues. That way I can figure 
out if its OpenMPI related or FE code related, but as I mentioned, I 
don't think it is FE code related since others have successfully run 
this particular benchmarking simulation.


*_Error Message:_*

 Parallel execution with 56 MPP proc

 NLQ used/max   152/   152

 Start time   05/02/2011 10:02:20

 End time 05/02/2011 11:24:46

 Elapsed time    4946 seconds (  1 hours 22 min. 26 sec.) for    9293 
cycles


 E r r o r   t e r m i n a t i o n

--

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD

with errorcode -1525207032.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.

You may or may not see output from other processes, depending on

exactly when Open MPI kills them.

--

connect to address xx.xxx.xx.xxx port 544: Connection refused

connect to address xx.xxx.xx.xxx port 544: Connection refused

trying normal rsh (/usr/bin/rsh)

--

mpirun has exited due to process rank 0 with PID 24488 on

node allision exiting without calling "finalize". This may

have caused other processes in the application to be

terminated by signals sent by mpirun (as reported here).

--

Regards,

Robert Walters



*From:*users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
*On Behalf Of *Terry Dontje

*Sent:* Monday, May 02, 2011 2:50 PM
*To:* us...@open-mpi.org
*Subject:* Re: [OMPI users] OpenMPI LS-DYNA Connection refused

On 05/02/2011 02:04 PM, Robert Walters wrote:

Terry,

I was under the impression that all connections are made because of 
the nature of the program that OpenMPI is invoking. LS-DYNA is a 
finite element solver and for any given simulation I run, the cores on 
each node must constantly communicate with one another to check for 
various occurrences (contact with various pieces/parts, updating nodal 
coordinates, etc...).


You might be right, the connections might have been established but 
the error message you state (connection refused) seems out of place if 
the connection was already established.


Were there more error messages from OMPI other than "connection 
refused"?  If so, could you possibly provide that output to us?  Maybe 
it will give us a hint where in the library things are messing up.


I've run the program using --mca mpi_preconnect_mpi 1 and the 
simulation has started itself up successfully which I think means that 
the mpi_preconnect passed since all of the child processes have 
started up on each individual node. Thanks for the suggestion though, 
it's a good place to start.


Yeah, it possibly could be telling if things do work with this setting.

I've been worried (though I have no basis for it) that messages may be 
getting queued up and hitting some kind of ceiling or timeout. As a 
finite element code, I think the communication occurs on a large 
scale. Lots of very small packets going ba

Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-03 Thread Terry Dontje
Looking at your output more, the below "Connect to address" doesn't match 
any messages I see in the source code.  Also, "trying normal 
/usr/bin/rsh" looks odd to me.


You may want to set the mca parameter mpi_abort_delay and attach a 
debugger to the abortive process and dump out a stack trace.  That 
should give a better idea where the failure is being triggered.  You can 
look at http://www.open-mpi.org/faq/?category=debugging question 4 for 
more info on the parameter.
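
A sketch of what that might look like (the delay value and the way the 
process ID is obtained are illustrative; check the FAQ entry for the 
exact semantics of mpi_abort_delay):

    mpirun --mca mpi_abort_delay 300 <your usual mpirun arguments>
    gdb --pid=<pid of the aborting rank>    # then "bt" for a stack trace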


--td

On 05/02/2011 03:40 PM, Robert Walters wrote:


I've attached the typical error message I've been getting. This is 
from a run I initiated this morning. The first few lines or so are 
related to the LS-DYNA program and are just there to let you know its 
running successfully for an hour and a half.


What's interesting is this doesn't happen on every job I run, and will 
recur for the same simulation. For instance, Simulation A will run for 
40 hours, and complete successfully. Simulation B will run for 6 
hours, and die from an error. Any further attempts to run simulation B 
will always end from an error. This makes me think there is some kind 
of bad calculation happening that OpenMPI doesn't know how to handle, 
or LS-DYNA doesn't know how to pass to OpenMPI. On the other hand, 
this particular simulation is one of those "benchmarks" and everyone 
runs it. I should not be getting errors from the FE code itself. 
Odd... I think I'll try this as an SMP job as well as an MPP job over 
a single node and see if the issue continues. That way I can figure 
out if its OpenMPI related or FE code related, but as I mentioned, I 
don't think it is FE code related since others have successfully run 
this particular benchmarking simulation.


*_Error Message:_*

 Parallel execution with 56 MPP proc

 NLQ used/max   152/   152

 Start time   05/02/2011 10:02:20

 End time 05/02/2011 11:24:46

 Elapsed time    4946 seconds (  1 hours 22 min. 26 sec.) for    9293 
cycles


 E r r o r   t e r m i n a t i o n

--

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD

with errorcode -1525207032.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.

You may or may not see output from other processes, depending on

exactly when Open MPI kills them.

--

connect to address xx.xxx.xx.xxx port 544: Connection refused

connect to address xx.xxx.xx.xxx port 544: Connection refused

trying normal rsh (/usr/bin/rsh)

--

mpirun has exited due to process rank 0 with PID 24488 on

node allision exiting without calling "finalize". This may

have caused other processes in the application to be

terminated by signals sent by mpirun (as reported here).

--

Regards,

Robert Walters



*From:*users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
*On Behalf Of *Terry Dontje

*Sent:* Monday, May 02, 2011 2:50 PM
*To:* us...@open-mpi.org
*Subject:* Re: [OMPI users] OpenMPI LS-DYNA Connection refused

On 05/02/2011 02:04 PM, Robert Walters wrote:

Terry,

I was under the impression that all connections are made because of 
the nature of the program that OpenMPI is invoking. LS-DYNA is a 
finite element solver and for any given simulation I run, the cores on 
each node must constantly communicate with one another to check for 
various occurrences (contact with various pieces/parts, updating nodal 
coordinates, etc...).


You might be right, the connections might have been established but 
the error message you state (connection refused) seems out of place if 
the connection was already established.


Were there more error messages from OMPI other than "connection 
refused"?  If so, could you possibly provide that output to us?  Maybe 
it will give us a hint where in the library things are messing up.


I've run the program using --mca mpi_preconnect_mpi 1 and the 
simulation has started itself up successfully which I think means that 
the mpi_preconnect passed since all of the child processes have 
started up on each individual node. Thanks for the suggestion though, 
it's a good place to start.


Yeah, it possibly could be telling if things do work with this setting.

I've been worried (though I have no basis for it) that messages may be 
getting queued up and hitting some kind of ceiling or timeout. As a 
finite element code, I think the communication occurs on a large 
scale. Lots of very small packets going back and forth quickly. A few 
studies have been done by the High Performance Computing Advisory 
Council 
(http://www.hpcadvisorycouncil.co

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje

Mudassar,

You can do what you are asking.  The receiver uses MPI_ANY_SOURCE for 
the source rank value, and when you receive a message, 
status.MPI_SOURCE will contain the rank of the actual sender, not the 
receiver's rank.  If you are not seeing that, then there is a bug somewhere.
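
For reference, a minimal sketch of that pattern (the payload, tag and the 
pairing of ranks here are made up for illustration; compile with mpicc and 
run with at least two ranks):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int payload;
            MPI_Status status;
            /* Receive from whichever rank sends; tag 0 is arbitrary. */
            MPI_Recv(&payload, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &status);
            /* status.MPI_SOURCE holds the actual sender's rank. */
            printf("rank 0 got %d from rank %d\n",
                   payload, status.MPI_SOURCE);
        } else if (rank == 1) {
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }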


--td

On 7/14/2011 9:54 PM, Mudassar Majeed wrote:

Friend,
 I can not specify the rank of the sender. Because only 
the sender knows to which receiver the message is to be sent. The 
receiver does not know from which sender the message will come. I am 
trying to do a research work on load balancing in MPI application 
where load is redistributed, so in that I require a receiver to 
receive a load value from a sender that it does not know. On the other 
hand, the sender actually calculates, to which receiver this load 
value should be sent. So for this, I want sender to send a message 
containing the load to a receiver, but receiver does not know from 
which sender the message will come. See, it is like send receiver in 
DATAGRAM sockets. The receiver, receives the message on the IP and 
port, the message which was directed for it. I want to have same 
behavior. But it seems that it is not possible in MPI. Isn't it?


regards,
Mudassar


*From:* Jeff Squyres 
*To:* Mudassar Majeed 
*Cc:* Open MPI Users 
*Sent:* Friday, July 15, 2011 3:30 AM
*Subject:* Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

Right.  I thought you were asking about receiving *another* message 
from whomever you just received from via ANY_SOURCE.


If you want to receive from a specific sender, you just specify the 
rank you want to receive from -- not ANY_SOURCE.


You will always only receive messages that were sent to *you*.  
There's no MPI_SEND_TO_ANYONE_WHO_IS_LISTENING functionality, for 
example.  So your last statement: "But when it captures with .. 
MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message 
(even not targetted for it)" is incorrect.


I guess I still don't understand your question...?


On Jul 14, 2011, at 9:17 PM, Mudassar Majeed wrote:

>
> I know this, but when I compare status.MPI_SOURCE with myid, they 
are different. I guess you need to reconsider my question. The 
MPI_Recv function seems to capture message from the queue with some 
search parameters like source, tag etc. So in case the receiver does 
not know the sender and wants to receive only that message which was 
sent for this receiver. But when it captures with source as 
MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message 
(even not targetted for it).

>
> regards,
> Mudassar
>
>
> From: Jeff Squyres mailto:jsquy...@cisco.com>>
> To: Mudassar Majeed >; Open MPI Users >

> Sent: Friday, July 15, 2011 1:58 AM
> Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
>
> When you use MPI_ANY_SOURCE in a receive, the rank of the actual 
sender is passed back to you in the status.MPI_SOURCE.

>
> On Jul 14, 2011, at 7:55 PM, Mudassar Majeed wrote:
>
> > Hello people,
> >I am trapped in the following problem plz 
help me. Suppose a process A sends a message to process B. The process 
B will receive the message with MPI_Recv with MPI_ANY_SOURCE in the 
source argument. Let say process B does not know that A is the sender. 
But I want B to receive message from process A (the one who actually 
sends the message to process B). But if I use MPI_ANY_SOURCE, then any 
message from any source is captured by process B (let say there are 
other processes sending messages). Instead of MPI_ANY_SOURCE I cannot 
use A in the source argument as B does not know about the sender. What 
should I do in this situation ?

> >
> > regards,
> > Mudassar Majeed
> > ___
> > users mailing list
> > us...@open-mpi.org 
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com 
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>


--
Jeff Squyres
jsquy...@cisco.com 
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Open MPI & Grid Engine/Grid Scheduler thread binding

2011-07-15 Thread Terry Dontje
Here's, hopefully, more useful info.  Note reading the job2core.pdf 
presentation, that was  mentioned earlier, more closely will also 
clarify a couple points (I've put those points inline below).


On 7/15/2011 12:01 AM, Ralph Castain wrote:

On Jul 14, 2011, at 5:46 PM, Jeff Squyres wrote:


Looping in the users mailing list so that Ralph and Oracle can comment...

Not entirely sure what I can contribute here, but I'll try - see below for some 
clarifications. I think the discussion here is based on some misunderstanding 
of how OMPI works.



On Jul 14, 2011, at 2:34 PM, Rayson Ho wrote:


(CC'ing Jeff from the Open-MPI project...)

On Thu, Jul 14, 2011 at 1:35 PM, Tad Kollar  wrote:

As I thought more about it, I was afraid that might be the case, but hoped
sge_shepherd would do some magic for tightly-integrated jobs.

To SGE, if each of the tasks is not started by sge_shepherd, then the
only option is to set the binding mask to the allocation, which in
your original case, was the whole system (48 CPUs).



We're running OpenMPI 1.5.3 if that makes a difference. Do you know of
anyone using an MVAPICH2 1.6 pe that can handle binding?

OMPI uses its own binding scheme - we stick within the overall binding envelope 
given to us, but we don't use external bindings of individual procs. Reason is 
simple: SGE has no visibility into the MPI procs we spawn. All SGE sees is 
mpirun and the daemons (called orteds) we launch on each node, and so it can't 
provide a binding scheme for the MPI procs (it actually has no idea how many 
procs are on each node as OMPI's mapper can support a multitude of algorithms, 
all invisible to SGE).


However, if one reads the job2core.pdf presentation, on page 14 it talks 
about how SGE will pass a rankfile to Open MPI, which is how SGE drives 
the binding it wants for an Open MPI job.
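
(For reference, an Open MPI rankfile is just a text file with one line per 
rank and is handed to mpirun with the -rf/--rankfile option; the hostnames 
and slots below are made up for illustration.)

    rank 0=host_a slot=0
    rank 1=host_a slot=1
    rank 2=host_b slot=0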

I just downloaded Open MPI 1.5.4a and grep'ed the source, looks like
it is not looking at the SGE_BINDING env variable that is set by SGE.

No, we don't. However, the orteds do check to see if they have been bound, and 
if so, to what processors. Those bindings are then used as an envelope limiting 
the processors we use to bind the procs we spawn.

I believe SGE_BINDING is an env-var to SGE that tells it what binding to 
use for the job and SGE will then, as mentioned above, generate a 
rankfile to be used by Open MPI.



The serial case worked (its affinity list was '0' instead of '0-47'), so at
least we know that's in good shape :-)

Please also submit a few more jobs and see if the new hwloc code is
able to handle multiple jobs running on your AMD MC server.



My ultimate goal is for affinity support to be enabled and scheduled
automatically for all MPI users, i.e. without them having to do any more
than they would for a no-affinity job (otherwise I have a feeling most of
them would just ignore it). What do you think it will take to get to that
point?

We tried to do this once - I set a default param to auto-bind processes. Major 
error. I was lynched by the MPI user community until we removed that param.

Reason is simple: suppose you have MPI processes that launch threads. Remember, 
there is no thread-level binding out there - all the OS will let you do is bind 
at the process level. So now you bind someone's MPI process to some core(s), 
which forces all the threads from that process to stay within that 
binding, thereby potentially creating a horrendous thread-contention problem.

It doesn't take threading to cause problems - some applications just don't work 
as well when bound. It's true that the benchmarks generally do, but they aren't 
representative of real applications.

Bottom line: defaulting to binding processes was something the MPI community 
appears to have rejected, with reason. Might argue about whether or not they 
are correct, but that appears to be the consensus, and it is the position OMPI 
has adopted. User ignorance of when to bind and when not to bind is not a valid 
reason to impact everyone.



That's my goal since 2008...

I started a mail thread, "processor affinity -- OpenMPI / batchsystem
integration" to the Open MPI list in 2008. And in 2009, the conclusion
was that Sun was saying that the binding info is set in the
environment and Open MPI would perform the binding itself (so I
assumed that was done):

It is done - we just use OMPI's binding schemes and not the ones provided 
natively by SGE. Like I said above, SGE doesn't see the MPI procs and can't 
provide a binding pattern for them - so looking at the SUNW_MP_BIND envar is 
pointless.

Note SUNW_MP_BIND has *nothing* to do with  Open MPI but is a way that 
SGE feeds binding to OpenMP (note no "I") applications.  So Ralph is 
right that this env-var is pointless from an Open MPI perspective.



http://www.open-mpi.org/community/lists/users/2009/10/10938.php

Revisiting the presentation (see: job2core.pdf link at the above URL),
Sun's variable name is $SUNW_MP_BIND, so it is most likely Sun Cluster
Toolki

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje
Well, MPI_Recv does give you only messages that were sent specifically to 
the rank calling it by any of the processes in the communicator.  If you 
think the message you received should have gone to another rank, then 
there is a bug somewhere.  I would start by either adding debugging 
printf's to your code to trace the messages, or narrowing the code down 
to a small kernel, such that you can prove to yourself that MPI is 
working the way it should; if not, you can show us where it is going 
wrong.


--td

On 7/15/2011 6:51 AM, Mudassar Majeed wrote:
I get the sender's rank in status.MPI_SOURCE, but it is different than 
expected. I need to receive that message which was sent to me, not any 
message.


regards,

Date: Fri, 15 Jul 2011 06:33:41 -0400
From: Terry Dontje <terry.don...@oracle.com>
Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
To: us...@open-mpi.org
Message-ID: <4e201785.6010...@oracle.com>

Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Mudassar,

You can do what you are asking.  The receiver uses MPI_ANY_SOURCE for
the source rank value and when you receive a message the
status.MPI_SOURCE will contain the rank of the actual sender not the
receiver's rank.  If you are not seeing that then there is a bug 
somewhere.


--td

On 7/14/2011 9:54 PM, Mudassar Majeed wrote:
> Friend,
>  I can not specify the rank of the sender. Because only
> the sender knows to which receiver the message is to be sent. The
> receiver does not know from which sender the message will come. I am
> trying to do a research work on load balancing in MPI application
> where load is redistributed, so in that I require a receiver to
> receive a load value from a sender that it does not know. On the other
> hand, the sender actually calculates, to which receiver this load
> value should be sent. So for this, I want sender to send a message
> containing the load to a receiver, but receiver does not know from
> which sender the message will come. See, it is like send receiver in
> DATAGRAM sockets. The receiver, receives the message on the IP and
> port, the message which was directed for it. I want to have same
> behavior. But it seems that it is not possible in MPI. Isn't it?
>
> regards,
> Mudassar
>
> 
> *From:* Jeff Squyres mailto:jsquy...@cisco.com>>
> *To:* Mudassar Majeed <mailto:mudassar...@yahoo.com>>

> *Cc:* Open MPI Users mailto:us...@open-mpi.org>>
> *Sent:* Friday, July 15, 2011 3:30 AM
> *Subject:* Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
>
> Right.  I thought you were asking about receiving *another* message
> from whomever you just received from via ANY_SOURCE.
>
> If you want to receive from a specific sender, you just specify the
> rank you want to receive from -- not ANY_SOURCE.
>
> You will always only receive messages that were sent to *you*.
> There's no MPI_SEND_TO_ANYONE_WHO_IS_LISTENING functionality, for
> example.  So your last statement: "But when it captures with ..
> MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message
> (even not targetted for it)" is incorrect.
>
> I guess I still don't understand your question...?
>
>
> On Jul 14, 2011, at 9:17 PM, Mudassar Majeed wrote:
>
> >
> > I know this, but when I compare status.MPI_SOURCE with myid, they
> are different. I guess you need to reconsider my question. The
> MPI_Recv function seems to capture message from the queue with some
> search parameters like source, tag etc. So in case the receiver does
> not know the sender and wants to receive only that message which was
> sent for this receiver. But when it captures with source as
> MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message
> (even not targetted for it).
> >
> > regards,
> > Mudassar
> >
> >
> > From: Jeff Squyres mailto:jsquy...@cisco.com> 
<mailto:jsquy...@cisco.com <mailto:jsquy...@cisco.com>>>
> > To: Mudassar Majeed <mailto:mudassar...@yahoo.com>
> <mailto:mudassar...@yahoo.com <mailto:mudassar...@yahoo.com>>>; Open 
MPI Users mailto:us...@open-mpi.org>

> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>>>
> > Sent: Friday, July 15, 2011 1:58 AM
> > Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
> >
> > When you use MPI_ANY_SOURCE in a receive, the rank of the actual
> sender is passed back to you in the status.MPI_SOURCE.
> >
> > On Jul 14, 2011, at 7:55 PM, Mudassar Majeed wrote:
> >
> >

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje



On 7/15/2011 12:49 PM, Mudassar Majeed wrote:


Yes, processes receive messages that were not sent to them. I am 
receiving the message with the following call


MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, 
comm, &status);


and that was sent using the following call,

MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);

What problem it can have ?. All the parameters are correct, I have 
seen them by printf.  What I am thinking is that, the receive is done 
with MPI_ANY_SOURCE, so the process is getting any message (from any 
source). What should be done so that only that message is captured 
that had the destination as this process.


By virtue of MPI the MPI_Recv call should only return messages destined 
for that rank.  What makes you think that is not happening?  Can you 
make some sort of kernel of code that proves your theory that your 
MPI_Recv is receiving another rank's message?  If you can and then post 
that code maybe we'll be able to figure out what the issue is.


Right now, it seems we are at a deadlock of you claiming something is 
happening that really cannot be happening.  So unless we have more than 
a broad description of the problem it is going to be nearly impossible 
for us to tell you what is wrong.
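
For what it's worth, a kernel along these lines would demonstrate whether 
MPI_Recv with MPI_ANY_SOURCE ever returns a message aimed at another rank 
(a sketch only; the tag value and the even/odd pairing of ranks are made up):

    #include <mpi.h>
    #include <stdio.h>

    #define TAG_LOAD 17  /* arbitrary tag for this test */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Even ranks send to rank+1; odd ranks receive from anybody. */
        if (rank % 2 == 0 && rank + 1 < size) {
            int packet = rank;               /* embed the sender's rank */
            MPI_Ssend(&packet, 1, MPI_INT, rank + 1, TAG_LOAD,
                      MPI_COMM_WORLD);
        } else if (rank % 2 == 1) {
            int packet;
            MPI_Status status;
            MPI_Recv(&packet, 1, MPI_INT, MPI_ANY_SOURCE, TAG_LOAD,
                     MPI_COMM_WORLD, &status);
            /* The embedded rank and status.MPI_SOURCE must agree. */
            printf("P%d received from P%d, packet contains rank %d%s\n",
                   rank, status.MPI_SOURCE, packet,
                   packet == status.MPI_SOURCE ? "" : "  <-- MISMATCH");
        }

        MPI_Finalize();
        return 0;
    }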


--td

regards,
Mudassar

Date: Fri, 15 Jul 2011 07:04:34 -0400
From: Terry Dontje <terry.don...@oracle.com>
Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
To: us...@open-mpi.org
Message-ID: <4e201ec2@oracle.com>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Well MPI_Recv does give you the message that was sent specifically to
the rank calling it by any of the processes in the communicator.  If you
think the message you received should have gone to another rank then
there is a bug somewhere.  I would start by either adding debugging
printf's to your code to trace the messages.  Or narrowing down the
code to a small kernel such that you can prove to yourself that MPI is
working the way it should and if not you can show us where it is going
wrong.

--td

On 7/15/2011 6:51 AM, Mudassar Majeed wrote:
> I get the sender's rank in status.MPI_SOURCE, but it is different than
> expected. I need to receive that message which was sent to me, not any
> message.
>
> regards,
>
> Date: Fri, 15 Jul 2011 06:33:41 -0400
> From: Terry Dontje <mailto:terry.don...@oracle.com>

> <mailto:terry.don...@oracle.com <mailto:terry.don...@oracle.com>>>
> Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
> To: us...@open-mpi.org <mailto:us...@open-mpi.org> 
<mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>>
> Message-ID: <4e201785.6010...@oracle.com 
<mailto:4e201785.6010...@oracle.com>
> <mailto:4e201785.6010...@oracle.com 
<mailto:4e201785.6010...@oracle.com>>>

> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> Mudassar,
>
> You can do what you are asking.  The receiver uses MPI_ANY_SOURCE for
> the source rank value and when you receive a message the
> status.MPI_SOURCE will contain the rank of the actual sender not the
> receiver's rank.  If you are not seeing that then there is a bug
> somewhere.
>
> --td
>
> On 7/14/2011 9:54 PM, Mudassar Majeed wrote:
> > Friend,
> >  I can not specify the rank of the sender. Because only
> > the sender knows to which receiver the message is to be sent. The
> > receiver does not know from which sender the message will come. I am
> > trying to do a research work on load balancing in MPI application
> > where load is redistributed, so in that I require a receiver to
> > receive a load value from a sender that it does not know. On the other
> > hand, the sender actually calculates, to which receiver this load
> > value should be sent. So for this, I want sender to send a message
> > containing the load to a receiver, but receiver does not know from
> > which sender the message will come. See, it is like send receiver in
> > DATAGRAM sockets. The receiver, receives the message on the IP and
> > port, the message which was directed for it. I want to have same
> > behavior. But it seems that it is not possible in MPI. Isn't it?
> >
> > regards,
> > Mudassar
> >
> > 

> > *From:* Jeff Squyres <mailto:jsquy...@cisco.com> <mailto:jsquy...@cisco.com 
<mailto:jsquy...@cisco.com>>>
> > *To:* Mudassar Majeed <mailto:mudassar...@yahoo.com>

> <mailto:mudassar...@yahoo.com <mailto:mudassar...@

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje



On 7/15/2011 2:35 PM, Mudassar Majeed wrote:


Here is the code

if( (is_receiver == 1) && (is_sender != 1) )
{
printf("\nP%d >> Receiver only ...!!", myid);
printf("\n");
MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, 
MPI_TAG_LOAD, comm, &status);

printf("\nP%d >> Received from P%d", myid, status.MPI_SOURCE);
printf("\n");
}
else if( (is_sender == 1) && (is_receiver != 1) )
{
load_packet.rank = *id;
load_packet.ld = load;
printf("\nP%d >> Sender only ...!! P%d", myid, rec_rank);
printf("\n");
MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, 
MPI_TAG_LOAD, comm);

}
else if( (is_receiver == 1) && (is_sender == 1) )
{
load_packet.rank = *id;
load_packet.ld = load;
printf("\nP%d >> Both ...!! P%d", myid, rec_rank);
printf("\n");
MPI_Sendrecv(&load_packet, 1, loadDatatype, rec_rank, 
MPI_TAG_LOAD,
&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, comm, 
&status);

printf("\nP%d >> Received from P%d", myid, status.MPI_SOURCE);
printf("\n");
}

A process can be a message sender, or receiver or both. There are 16 
ranks. "rec_rank" contains the rank of the receiver. It is displayed 
before the message is sent.
Every sender displays this "rec_rank" and it should correctly. But on 
the receiver sides, status.MPI_SOURCE is displayed (after receiving 
message), but the value

is not matching with the expected sender's rank.
Sorry, but I still don't see how you are detecting the mismatch.  I 
assume load_packet.rank somehow relates to load_packet.  But why are 
you setting it to *id instead of myid?  Also, on the receive side I see 
no place where you pull out the rank from the recv_packet to compare 
with status.MPI_SOURCE.


I did not understand about kernel that you were talking about.

A "kernel" that I am talking about is a small piece of code someone can 
build and run to see the problem.
See the code is very clear and it sends the message to "rec_rank" that 
was displayed before sending the message. But on the receiver side the 
MPI_SOURCE comes to be wrong.
This shows to me that messages on the receiving sides are captured on 
the basis of MPI_ANY_SOURCE, that seems like it does not see the 
destination of message while capturing it from message queue of the 
MPI system.


regards,
Mudassar


*From:* Terry Dontje 
*To:* Mudassar Majeed 
*Cc:* "us...@open-mpi.org" 
*Sent:* Friday, July 15, 2011 7:10 PM
*Subject:* Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.



On 7/15/2011 12:49 PM, Mudassar Majeed wrote:


Yes, processes receive messages that were not sent to them. I am 
receiving the message with the following call


MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, 
comm, &status);


and that was sent using the following call,

MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);

What problem it can have ?. All the parameters are correct, I have 
seen them by printf.  What I am thinking is that, the receive is done 
with MPI_ANY_SOURCE, so the process is getting any message (from any 
source). What should be done so that only that message is captured 
that had the destination as this process.


By virtue of MPI the MPI_Recv call should only return messages 
destined for that rank.  What makes you think that is not happening?  
Can you make some sort of kernel of code that proves your theory that 
your MPI_Recv is receiving another rank's message?  If you can and 
then post that code maybe we'll be able to figure out what the issue is.


Right now, it seems we are at a deadlock of you claiming something is 
happening that really cannot be happening.  So unless we have more 
than a broad description of the problem it is going to be nearly 
impossible for us to tell you what is wrong.


--td

regards,
Mudassar

Date: Fri, 15 Jul 2011 07:04:34 -0400
From: Terry Dontje <terry.don...@oracle.com>
Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
To: us...@open-mpi.org
Message-ID: <4e201ec2@oracle.com>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Well MPI_Recv does give you the message that was sent specifically to
the rank calling it by any of the processes in the communicator.  If you
think the message you received should have gone to another rank then
there is a bug somewhere.  I would start by either adding debugging
printf's to your code to trace the messages.  Or na

Re: [OMPI users] Does Oracle Cluster Tools aka Sun's MPI work with LDAP?

2011-07-15 Thread Terry Dontje



On 7/15/2011 1:46 PM, Paul Kapinos wrote:

Hi OpenMPI volks (and Oracle/Sun experts),

we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of 
our cluster. In the part of the cluster where LDAP is activated, 
mpiexec does not try to spawn tasks on remote nodes at all, but exits 
with an error message like the one below. If I 'strace -f' the mpiexec, 
no exec of "ssh" can be found at all. Strangely, mpiexec tries to look 
into /etc/passwd (where the user is not listed, because LDAP is used!).



Note this is an area that should be no different from stock Open MPI.
I would suspect that the message might be coming from ssh.  I wouldn't 
expect mpiexec to be looking into /etc/passwd at all; why would it 
need to?  It should just be using ssh.  Can you manually ssh to the same 
node?
On the old part of the cluster, where NIS is used as the 
authentication method, Sun MPI runs very fine.


So, is Sun's MPI compatible with the LDAP authentication method at all?


Insofar as whatever launcher you use is compatible with LDAP.

Best wishes,

Paul


P.S. In both parts of the cluster, I (login marked as x here) can 
log in to any node by ssh without needing to type the password.




-- 


The user (x) is unknown to the system (i.e. there is no corresponding
entry in the password file). Please contact your system administrator
for a fix.
-- 

[cluster-beta.rz.RWTH-Aachen.DE:31535] [[57885,0],0] ORTE_ERROR_LOG: 
Fatal in file plm_rsh_module.c at line 1058
-- 





___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje
I have to agree with Jeff, we really need a complete program to really 
debug this.  Note, without really seeing what the structures look like 
it is hard to determine if maybe there is some type of structure 
mismatch going between recv_packet and load_packet.  Also the output you 
show seems incomplete in that not all data transfers are being shown so 
it is kind of hard to determine if packets are possibly being dropped or 
what.


I agree the output looks suspicious but it still leaves a lot to 
interpretation without really seeing a complete code.


Sorry,

--td

On 7/15/2011 3:44 PM, Jeff Squyres wrote:

Can you write this up in a small, complete program that shows the problem, and 
that we can compile and run?


On Jul 15, 2011, at 3:36 PM, Mudassar Majeed wrote:


*id is same as myid

I am comparing the results by seeing the printed messages, given by the 
printfs

the recv_packet.rank is the rank of the sender that should be equal to 
status.MPI_SOURCE but it is not.

I have updated the code a little bit, here is it.

if( (is_receiver == 1)&&  (is_sender != 1) )
 {
 printf("\nP%d>>  Receiver only ...!!", myid);
 printf("\n");
 MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, 
comm,&status);
 printf("\nP%d>>  Received from P%d, packet contains rank: %d", myid, 
status.MPI_SOURCE, recv_packet.rank);
 printf("\n");
 }
 else if( (is_sender == 1)&&  (is_receiver != 1) )
 {
 load_packet.rank = myid;
 load_packet.ld = load;
 printf("\nP%d>>  Sender only ...!! P%d", myid, rec_rank);
 printf("\n");
 MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);
 }
 else if( (is_receiver == 1)&&  (is_sender == 1) )
 {
 load_packet.rank = myid;
 load_packet.ld = load;
 printf("\nP%d>>  Both ...!! P%d", myid, rec_rank);
 printf("\n");
 MPI_Sendrecv(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD,
  &recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, 
comm,&status);
 printf("\nP%d>>  Received from P%d, packet contains rank: %d", myid, 
status.MPI_SOURCE, recv_packet.rank);
 printf("\n");
 }

here is the output

P11>>  Sender only ...!! P2

P14>>  Sender only ...!! P6

P15>>  Neither ...!!

P15>>  I could reach here ...!!

P8>>  Neither ...!!

P8>>  I could reach here ...!!

P1>>  Receiver only ...!!

P9>>  Sender only ...!! P0

P2>>  Receiver only ...!!


P10>>  Sender only ...!! P1

P3>>  Receiver only ...!!

P3>>  Received from P13, packet contains rank: 14


P0>>  Receiver only ...!!

P0>>  Received from P3, packet contains rank: 9

P4>>  Receiver only ...!!

P12>>  Neither ...!!

P12>>  I could reach here ...!!

P5>>  Both ...!! P3

P13>>  Sender only ...!! P4

P13>>  I could reach here ...!!

P6>>  Both ...!! P5

P7>>  Neither ...!!

P7>>  I could reach here ...!!

P14>>  I could reach here ...!!

P1>>  Received from P7, packet contains rank: 11

P1>>  I could reach here ...!!

P9>>  I could reach here ...!!
P2>>  Received from P11, packet contains rank: 13

P2>>  I could reach here ...!!

P0>>  I could reach here ...!!

P11>>  I could reach here ...!!
P3>>  I could reach here ...!!


regards,
Mudassar

From: Terry Dontje
To: Mudassar Majeed
Cc: "us...@open-mpi.org"
Sent: Friday, July 15, 2011 9:06 PM
Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.



On 7/15/2011 2:35 PM, Mudassar Majeed wrote:

Here is the code

 if( (is_receiver == 1) && (is_sender != 1) )
 {
     printf("\nP%d>>  Receiver only ...!!", myid);
     printf("\n");
     MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, comm, &status);
     printf("\nP%d>>  Received from P%d", myid, status.MPI_SOURCE);
     printf("\n");
 }
 else if( (is_sender == 1) && (is_receiver != 1) )
 {
     load_packet.rank = *id;
     load_packet.ld = load;
     printf("\nP%d>>  Sender only ...!! P%d", myid, rec_rank);
     printf("\n");
     MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);
 }
 else if( (is_receiver == 1) && (is_sender == 1) )
 {
     load_packet.rank = *id;
     load_packet.ld = load;
     printf("\nP%d>>  Both ...!! P%d", myid, rec_rank);
     printf("\n");
     MPI_Sendrecv(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD,
                  &recv_p

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-17 Thread Terry Dontje
Terrific!!! Glad you found it, otherwise it would have been quite a 
puzzling issue to debug in OMPI.


--td

On 7/16/2011 9:36 AM, Mudassar Majeed wrote:


Thanks to all of you, friends.  It was my mistake; the problem was 
actually somewhere else entirely.  When I made a separate file for you, 
it worked fine, so I realized the problem was elsewhere, and then I found 
it.  Thanks again to all of you.


regards,
Mudassar


*From:* Terry Dontje 
*To:* Jeff Squyres 
*Cc:* Mudassar Majeed ; Open MPI Users 


*Sent:* Saturday, July 16, 2011 5:25 AM
*Subject:* Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

I have to agree with Jeff: we really need a complete program to debug 
this.  Note that without seeing what the structures look like, it is hard 
to determine whether there is some kind of structure mismatch between 
recv_packet and load_packet.  Also, the output you show seems incomplete 
in that not all data transfers appear, so it is hard to tell whether 
packets are being dropped.


I agree the output looks suspicious, but it still leaves a lot open to 
interpretation without seeing the complete code.


Sorry,

--td


Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE

Some more info would be nice, such as:
-What version of OMPI are you using?
-What type of machine and OS are you running on?
-What does the machine file look like?
-Is there a stack trace left behind by the PID that seg faulted?
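
When nothing is left behind, one rough way to get a trace out of the 
failing process is to install a signal handler that dumps the call stack. 
Open MPI often prints such a trace on its own (as seen later in this 
thread); the handler below is only a glibc-specific sketch for codes run 
without that support.

/* Sketch only: glibc-specific backtrace on SIGSEGV, for cases where
   no core file or trace is otherwise available.  fprintf is not strictly
   async-signal-safe, but this is usually good enough as a last-resort
   diagnostic. */
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void segv_handler(int sig)
{
    void *frames[64];
    int   depth = backtrace(frames, 64);

    fprintf(stderr, "Caught signal %d, backtrace:\n", sig);
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    _exit(128 + sig);
}

/* Install early in main(), e.g. right after MPI_Init():
       signal(SIGSEGV, segv_handler);                      */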

--td

On 10/25/2011 8:07 AM, Mouhamad Al-Sayed-Ali wrote:

Hello,

I have tried to run the executable "wrf.exe", using

  mpirun -machinefile /tmp/108388.1.par2/machines -np 4 wrf.exe

but I've got the following error:

-- 

mpirun noticed that process rank 1 with PID 9942 on node 
part031.u-bourgogne.fr exited on signal 11 (Segmentation fault).
-- 


   11.54s real 6.03s user 0.32s system
Starter(9908): Return code=139
Starter end(9908)




Thanks for your help


Mouhamad Alsayed




--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE

Can you run wrf successfully on one node?
Can you run a simple code across your two nodes?  I would try hostname 
first, then some simple MPI program like the ring example.
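
(A minimal ring test along those lines might look like the following 
sketch; it is not the exact example shipped with Open MPI, just something 
small that exercises point-to-point traffic across the nodes.  Run it with 
at least two ranks.)

/* Ring sketch: rank 0 injects a token that travels 0 -> 1 -> ... -> 0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    if (rank == 0) {
        token = 42;
        MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Token made it around a ring of %d ranks\n", size);
    } else {
        MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}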


--td

On 10/25/2011 9:05 AM, Mouhamad Al-Sayed-Ali wrote:

hello,


-What version of ompi are you using

  I am using ompi version 1.4.1-1 compiled with gcc 4.5


-What type of machine and os are you running on

   I'm using a 64-bit Linux machine.


-What does the machine file look like

  part033
  part033
  part031
  part031


-Is there a stack trace left behind by the pid that seg faulted?

  No, there is no stack trace


Thanks for your help

Mouhamad Alsayed


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE

This looks more like a seg fault in wrf, not in OMPI.

Sorry, there is not much I can do here to help you.

--td

On 10/25/2011 9:53 AM, Mouhamad Al-Sayed-Ali wrote:

Hi again,

 This is exactly the error I have:


taskid: 0 hostname: part034.u-bourgogne.fr
[part034:21443] *** Process received signal ***
[part034:21443] Signal: Segmentation fault (11)
[part034:21443] Signal code: Address not mapped (1)
[part034:21443] Failing at address: 0xfffe01eeb340
[part034:21443] [ 0] /lib64/libpthread.so.0 [0x3612c0de70]
[part034:21443] [ 1] wrf.exe(__module_ra_rrtm_MOD_taugb3+0x418) [0x11cc9d8]
[part034:21443] [ 2] wrf.exe(__module_ra_rrtm_MOD_gasabs+0x260) [0x11cfca0]
[part034:21443] [ 3] wrf.exe(__module_ra_rrtm_MOD_rrtm+0xb31) [0x11e6e41]
[part034:21443] [ 4] wrf.exe(__module_ra_rrtm_MOD_rrtmlwrad+0x25ec) [0x11e9bcc]
[part034:21443] [ 5] wrf.exe(__module_radiation_driver_MOD_radiation_driver+0xe573) [0xcc4ed3]
[part034:21443] [ 6] wrf.exe(__module_first_rk_step_part1_MOD_first_rk_step_part1+0x40c5) [0xe0e4f5]
[part034:21443] [ 7] wrf.exe(solve_em_+0x22e58) [0x9b45c8]
[part034:21443] [ 8] wrf.exe(solve_interface_+0x80a) [0x902dda]
[part034:21443] [ 9] wrf.exe(__module_integrate_MOD_integrate+0x236) [0x4b2c4a]
[part034:21443] [10] wrf.exe(__module_wrf_top_MOD_wrf_run+0x24) [0x47a924]
[part034:21443] [11] wrf.exe(main+0x41) [0x4794d1]
[part034:21443] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x361201d8b4]
[part034:21443] [13] wrf.exe [0x4793c9]
[part034:21443] *** End of error message ***
---

Mouhamad


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Changing plm_rsh_agent system wide

2011-10-26 Thread TERRY DONTJE

I am using a prefix configuration, so no, it does not exist in /usr.

--td

On 10/26/2011 10:44 AM, Ralph Castain wrote:

Did the version you are running get installed in /usr? Sounds like you are 
picking up a different version when running a command - i.e., that your PATH is 
finding a different installation than the one in /usr.


On Oct 26, 2011, at 3:11 AM, Patrick Begou wrote:


I need to change, system-wide, how OpenMPI launches jobs on the nodes of my 
cluster.

Setting:
export OMPI_MCA_plm_rsh_agent=oarsh

works fine, but I would like this config to be the default with OpenMPI.  I've 
read several threads (discussions, FAQ) about this, but none of the provided 
solutions seems to work.

I have two files:
/usr/lib/openmpi/1.4-gcc/etc/openmpi-mca-params.conf
/usr/lib64/openmpi/1.4-gcc/etc/openmpi-mca-params.conf

In these files I've set various flavors of the syntax (only one at a time, and 
the same in each file, of course!):
test 1) plm_rsh_agent = oarsh
test 2) pls_rsh_agent = oarsh
test 3) orte_rsh_agent = oarsh

But each time I run "ompi_info --param plm rsh" I get:
MCA plm: parameter "plm_rsh_agent" (current value: "ssh : rsh", data source: default value, synonyms: pls_rsh_agent)
         The command used to launch executables on remote nodes (typically either "ssh" or "rsh")

With the exported variable it works fine.
Any suggestions?

The rpm package of my linux Rocks Cluster provides:
   Package: Open MPI root@build-x86-64 Distribution
   Open MPI: 1.4.3
   Open MPI SVN revision: r23834
   Open MPI release date: Oct 05, 2010

Thanks

Patrick



--
===
|  Equipe M.O.S.T. | http://most.hmg.inpg.fr  |
|  Patrick BEGOU   |      |
|  LEGI| mailto:patrick.be...@hmg.inpg.fr |
|  BP 53 X | Tel 04 76 82 51 35   |
|  38041 GRENOBLE CEDEX| Fax 04 76 82 52 71   |
===



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Changing plm_rsh_agent system wide

2011-10-26 Thread TERRY DONTJE

Sorry, please disregard my reply to this email.

:-)

--td



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread TERRY DONTJE
David, are you saying your jobs consistently leave behind session files 
after the job exits?  They really shouldn't, even in the case when a job 
aborts; I thought mpirun took great pains to clean up after itself.
Can you tell us what version of OMPI you are running?  I could see a 
kill -9 of mpirun and the processes below it leaving stale session files 
behind.


--td

On 11/4/2011 2:37 AM, David Turner wrote:

% df /tmp
Filesystem   1K-blocks    Used Available Use% Mounted on
-             12330084  822848  11507236   7% /
% df /
Filesystem   1K-blocks    Used Available Use% Mounted on
-             12330084  822848  11507236   7% /

That works out to 11GB.  But...

The compute nodes have 24GB.  Freshly booted, about 3.2GB is
consumed by the kernel, various services, and the root file system.
At this time, usage of /tmp is essentially nil.

We set user memory limits to 20GB.

I would imagine that the size of the session directories depends on a
number of factors; perhaps the developers can comment on that.  I have
only seen total sizes in the 10s of MBs on our 8-node, 24GB nodes.

As long as they're removed after each job, they don't really compete
with the application for available memory.

On 11/3/11 8:40 PM, Ed Blosch wrote:

Thanks very much, exactly what I wanted to hear. How big is /tmp?

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of David Turner
Sent: Thursday, November 03, 2011 6:36 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

I'm not a systems guy, but I'll pitch in anyway.  On our cluster,
all the compute nodes are completely diskless.  The root file system,
including /tmp, resides in memory (ramdisk).  OpenMPI puts these
session directories therein.  All our jobs run through a batch
system (torque).  At the conclusion of each batch job, an epilogue
process runs that removes all files belonging to the owner of the
current batch job from /tmp (and also looks for and kills orphan
processes belonging to the user).  This epilogue had to be written
by our systems staff.

I believe this is a fairly common configuration for diskless
clusters.

On 11/3/11 4:09 PM, Blosch, Edwin L wrote:
Thanks for the help.  A couple of follow-up questions; maybe this starts 
to go outside OpenMPI:

What's wrong with using /dev/shm?  I think you said earlier in this 
thread that this was not a safe place.

If the NFS mount point is moved from /tmp to /work, would a /tmp 
magically appear in the filesystem for a stateless node?  How big would 
it be, given that there is no local disk?  That may be something I have 
to ask the vendor, which I've tried, but they don't quite seem to get 
the question.


Thanks




-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 03, 2011 5:22 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage



On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote:

I might be missing something here. Is there a side-effect or performance 
loss if you don't use the sm btl?  Why would it exist if there is a wholly 
equivalent alternative?  What happens to traffic that is intended for 
another process on the same node?


There is a definite performance impact, and we wouldn't recommend doing 
what Eugene suggested if you care about performance.


The correct solution here is to get your sys admin to make /tmp local. 
Making /tmp NFS-mounted across multiple nodes is a major "faux pas" in the 
Linux world - it should never be done, for the reasons stated by Jeff.





Thanks


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Eugene Loh
Sent: Thursday, November 03, 2011 1:23 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage


Right.  Actually "--mca btl ^sm".  (Was missing "btl".)

On 11/3/2011 11:19 AM, Blosch, Edwin L wrote:

I don't tell OpenMPI what BTLs to use. The default uses sm and puts a 
session file on /tmp, which is NFS-mounted and thus not a good choice.

Are you suggesting something like --mca ^sm?


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Eugene Loh
Sent: Thursday, November 03, 2011 12:54 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage


I've not been following closely.  Why must one use shared-memory
communications?  How about using other BTLs in a "loopback" fashion?