On Sun, 13 Nov 2005 17:53:40 -0700, Jeff Squyres <jsquy...@open-mpi.org>
wrote:
I can't believe I missed that, sorry. :-(
None of the btls except "self" can do loopback communication. Hence,
you really can't run "--mca btl foo" if your app ever sends to itself --
you really need to run "--mca btl foo,self" at a minimum.
This is not so much an optimization as it is a software engineering
decision; we didn't have to include the special send-to-self case in
any of the other btl components this way (i.e., less code, less complex
maintenance).
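
As a rough illustration of the "foo,self" rule above, here is a minimal Python sketch of a launcher-side helper that appends "self" to a btl list before it is passed to mpirun. The function names with_self_btl and mca_btl_args are hypothetical and are not part of Open MPI; the sketch assumes a plain comma-separated inclusion list and leaves a list starting with "^" untouched.

```python
def with_self_btl(btl_list):
    """Return a comma-separated btl list that always includes "self".

    Hypothetical helper, not an Open MPI API. Assumes a plain inclusion
    list such as "openib" or "gm,sm"; a list starting with "^" (an
    exclusion list) is returned unchanged.
    """
    if btl_list.strip().startswith("^"):
        return btl_list.strip()
    # Split on commas and drop empty entries and stray whitespace.
    names = [n.strip() for n in btl_list.split(",") if n.strip()]
    if "self" not in names:
        names.append("self")
    return ",".join(names)


def mca_btl_args(btl_list):
    """Build the "--mca btl ..." arguments to append to an mpirun command."""
    return ["--mca", "btl", with_self_btl(btl_list)]


print(" ".join(mca_btl_args("openib")))
```

With this, a wrapper script can accept "openib" from the user and still hand mpirun a list in which send-to-self has a component to handle it.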
On Nov 13, 2005, at 7:12 PM, Brian Barrett wrote:
One other thing I noticed... You specify -mca btl openib. Try
specifying -mca btl openib,self. The self component is used for
"send to self" operations. This could be the cause of your failures.
Brian
On Nov 13, 2005, at 3:02 PM, Jeff Squyres wrote:
Troy --
Were you perchance using multiple processes per node? If so, we
literally just fixed some sm btl bugs that could have been affecting
you (they could have caused hangs). They're fixed in the nightly
snapshots from today (both trunk and v1.0): r8140. If you were using
the sm btl and multiple processes per node, could you try again?
As a matter of fact, yes; one process per CPU, each node having 2 CPUs.
If I change my machinefile to only use one process per node (leaving one
CPU idle), the problem disappears. However, if I use two CPUs per node
(but the same number of overall processes -- meaning half the number of
nodes), I receive the same error:
***
[0,1,0][btl_openib_endpoint.c:136:mca_btl_openib_endpoint_post_send] error
posting send request errno says Resource temporarily unavailable
[0,1,0][btl_openib_component.c:655:mca_btl_openib_component_progress]
error in posting pending send
***
This happens on both RC5 and RC6, with either '-mca btl openib' or
'-mca btl openib,self'.
On a positive note, I've now been able to complete the 'com' Presta
benchmark with GM (which I had previously been unable to do).
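
For anyone collecting these messages across many runs, the following is a small Python sketch that splits an error line like the one quoted above into its parts. The field names (name, file, line, func, message) are my own labels, and the pattern is inferred only from the two messages shown here, so it is an assumption rather than a documented Open MPI log format.

```python
import re

# Pattern inferred from the quoted messages:
#   [a,b,c][source_file:line:function] message text
LOG_PATTERN = re.compile(
    r"\[(?P<name>\d+,\d+,\d+)\]"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"
    r"\s*(?P<message>.*)"
)


def parse_btl_error(text):
    """Return a dict of fields from one error message, or None if no match.

    The message may be wrapped across several lines (as in an email),
    so all whitespace runs are collapsed to single spaces first.
    """
    joined = " ".join(text.split())
    match = LOG_PATTERN.match(joined)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


sample = (
    "[0,1,0][btl_openib_endpoint.c:136:mca_btl_openib_endpoint_post_send] error\n"
    "posting send request errno says Resource temporarily unavailable"
)
print(parse_btl_error(sample))
```

Grouping the parsed results by the func field makes it easy to see whether failures start in the post_send path or only show up later in the progress loop.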
And informationally: I was using MX version 1.0.3. I have just installed
1.1.0, and I'll be checking that out presently.