Re: [OMPI users] mca_pml_ob1_send blocks

2009-09-14 Thread Shaun Jackman
Hi Jeff, Jeff Squyres wrote: On Sep 8, 2009, at 1:06 PM, Shaun Jackman wrote: My INBOX has been a disaster recently. Please ping me repeatedly if you need quicker replies (sorry! :-( ). (btw, should this really be on the devel list, not the user list?) It's tending that way. I'll keep the

Re: [OMPI users] mca_pml_ob1_send blocks

2009-09-12 Thread Jeff Squyres
On Sep 8, 2009, at 1:06 PM, Shaun Jackman wrote: Hi Jeff, My INBOX has been a disaster recently. Please ping me repeatedly if you need quicker replies (sorry! :-( ). (btw, should this really be on the devel list, not the user list?) I can see one sort of ugly scenario unfolding in my h

Re: [OMPI users] mca_pml_ob1_send blocks

2009-09-08 Thread Shaun Jackman
Jeff Squyres wrote: ... Two questions then... 1. If the request has already completed, does it mean that since opal_progress() is not called, no further progress is made? Correct. It's a latency thing; if your request has already completed, we just tell you without further delay (i.e., wit
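(A minimal sketch of the polling pattern under discussion, assuming a two-rank job; the tag and payload below are made up, not from the thread. MPI_Request_get_status reports completion without freeing the request, so a final MPI_Wait is still used to release it.)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 0;
    MPI_Request req;

    if (rank == 0) {
        MPI_Irecv(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);

        int flag = 0;
        MPI_Status status;
        while (!flag) {
            /* Reports completion without freeing the request (unlike MPI_Test).
               Per the discussion above, an already-completed request is reported
               immediately; otherwise this call drives the progress engine. */
            MPI_Request_get_status(req, &flag, &status);
            /* ... overlap other work here between polls ... */
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* release the completed request */
        printf("received %d\n", payload);
    } else if (rank == 1) {
        payload = 42;
        MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}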

Re: [OMPI users] mca_pml_ob1_send blocks

2009-09-01 Thread Jeff Squyres
Sorry for the delay in replying... On Sep 1, 2009, at 1:11 AM, Shaun Jackman wrote: > Looking at the source code of MPI_Request_get_status, it... > calls OPAL_CR_NOOP_PROGRESS() > returns true in *flag if request->req_complete > calls opal_progress() > returns false in *flag Keep in mind th
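(Paraphrasing the steps listed above as a self-contained sketch; the struct, stub function, and names are stand-ins for illustration, not the actual Open MPI 1.3.2 internals.)

#include <stdbool.h>

struct request_sketch {
    bool req_complete;            /* set once the operation has finished */
};

static void opal_progress_stub(void)
{
    /* stand-in for opal_progress(): would drive pending communication */
}

static int request_get_status_sketch(struct request_sketch *req, int *flag)
{
    /* OPAL_CR_NOOP_PROGRESS() would be invoked here (checkpoint/restart hook) */

    if (req->req_complete) {      /* already complete: answer immediately, */
        *flag = 1;                /* skipping the progress engine entirely  */
        return 0;
    }

    opal_progress_stub();         /* otherwise drive progress once ...      */
    *flag = 0;                    /* ... and report "not complete"          */
    return 0;
}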

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-31 Thread Shaun Jackman
Shaun Jackman wrote: Jeff Squyres wrote: On Aug 26, 2009, at 10:38 AM, Jeff Squyres (jsquyres) wrote: Yes, this could cause blocking. Specifically, the receiver may not advance any other senders until the matching Irecv is posted and is able to make progress. I should clarify something else h

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-31 Thread Shaun Jackman
Jeff Squyres wrote: On Aug 26, 2009, at 10:38 AM, Jeff Squyres (jsquyres) wrote: Yes, this could cause blocking. Specifically, the receiver may not advance any other senders until the matching Irecv is posted and is able to make progress. I should clarify something else here -- for long mess

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-27 Thread Shaun Jackman
Jeff Squyres wrote: On Aug 26, 2009, at 10:38 AM, Jeff Squyres (jsquyres) wrote: Yes, this could cause blocking. Specifically, the receiver may not advance any other senders until the matching Irecv is posted and is able to make progress. I should clarify something else here -- for long mess

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-26 Thread Jeff Squyres
On Aug 26, 2009, at 10:38 AM, Jeff Squyres (jsquyres) wrote: Yes, this could cause blocking. Specifically, the receiver may not advance any other senders until the matching Irecv is posted and is able to make progress. I should clarify something else here -- for long messages where the pi

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-26 Thread Jeff Squyres
On Aug 25, 2009, at 6:51 PM, Shaun Jackman wrote: The receiver posts a single MPI_Irecv in advance, and as soon as it's received a message it posts a new MPI_Irecv. However, there are multiple processes sending to the receiver, and only one MPI_Irecv posted. Yes, this could cause blocking
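(A hypothetical sketch, not the poster's code: pre-posting one MPI_Irecv per sender so that a single outstanding receive does not become the bottleneck when several processes send at once. The 3300-byte message size is taken from the thread; the tag is made up.)

#include <mpi.h>
#include <stdlib.h>

#define MSG_BYTES 3300   /* message size mentioned in the thread */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1) {
        int nsenders = size - 1;
        char *bufs = malloc((size_t)nsenders * MSG_BYTES);
        MPI_Request *reqs = malloc(nsenders * sizeof *reqs);

        /* Pre-post one receive per sender. */
        for (int i = 0; i < nsenders; ++i)
            MPI_Irecv(bufs + (size_t)i * MSG_BYTES, MSG_BYTES, MPI_BYTE,
                      i + 1, 0, MPI_COMM_WORLD, &reqs[i]);

        /* Handle each message as it arrives. */
        for (int done = 0; done < nsenders; ++done) {
            int idx;
            MPI_Waitany(nsenders, reqs, &idx, MPI_STATUS_IGNORE);
            /* ... process bufs + idx * MSG_BYTES ... */
        }

        free(reqs);
        free(bufs);
    } else if (rank > 0) {
        char *msg = calloc(1, MSG_BYTES);
        MPI_Send(msg, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        free(msg);
    }

    MPI_Finalize();
    return 0;
}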

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-25 Thread Shaun Jackman
Jeff Squyres wrote: On Aug 24, 2009, at 2:18 PM, Shaun Jackman wrote: I'm seeing MPI_Send block in mca_pml_ob1_send. The packet is shorter than the eager transmit limit for shared memory (3300 bytes < 4096 bytes). I'm trying to determine if MPI_Send is blocking due to a deadlock. Will MPI_Send

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-25 Thread Jeff Squyres
On Aug 24, 2009, at 2:18 PM, Shaun Jackman wrote: I'm seeing MPI_Send block in mca_pml_ob1_send. The packet is shorter than the eager transmit limit for shared memory (3300 bytes < 4096 bytes). I'm trying to determine if MPI_Send is blocking due to a deadlock. Will MPI_Send block even when sendi
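(One way to separate flow-control stalls from a true deadlock while debugging, sketched with made-up sizes: MPI_Bsend with an attached buffer completes locally whether or not the message fits under the transport's eager limit. This is a debugging aid, not a claim about what mca_pml_ob1_send is doing in the poster's run.)

#include <mpi.h>
#include <stdlib.h>

#define MSG_BYTES 3300

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        int bufsize = MSG_BYTES + MPI_BSEND_OVERHEAD;
        char *attach = malloc(bufsize);
        MPI_Buffer_attach(attach, bufsize);

        char *msg = calloc(1, MSG_BYTES);
        /* Completes locally once the message is copied into the attached buffer. */
        MPI_Bsend(msg, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);

        /* Blocks until buffered messages have been transmitted. */
        MPI_Buffer_detach(&attach, &bufsize);
        free(attach);
        free(msg);
    } else if (rank == 0) {
        char *msg = malloc(MSG_BYTES);
        MPI_Recv(msg, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        free(msg);
    }

    MPI_Finalize();
    return 0;
}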

Re: [OMPI users] mca_pml_ob1_send blocks

2009-08-24 Thread Shaun Jackman
I neglected to include some pertinent information: I'm using Open MPI 1.3.2. Here's a backtrace:
#0  0x002a95e6890c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x002a9623a39c in epoll_dispatch () from /home/sjackman/arch/xhost/lib/libopen-pal.so.0
#2  0x002a96238f10 in opal_even