On Thu, Aug 6, 2015 at 2:38 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Thu, Aug 6, 2015 at 10:10 AM, Antonin Houska <a...@cybertec.at> wrote: >> During my experiments with parallel workers I sometimes saw the "master" and >> worker process blocked. The master uses shm queue to send data to the worker, >> both sides nowait==false. I concluded that the following happened: >> >> The worker process set itself as a receiver on the queue after >> shm_mq_wait_internal() has completed its first check of "ptr", so this >> function left sender's procLatch in reset state. But before the procLatch was >> reset, the receiver still managed to read some data and set sender's >> procLatch >> to signal the reading, and eventually called its (receiver's) WaitLatch(). >> >> So sender has effectively missed the receiver's notification and called >> WaitLatch() too (if the receiver already waits on its latch, it does not help >> for sender to call shm_mq_notify_receiver(): receiver won't do anything >> because there's no new data in the queue). >> >> Below is my patch proposal. > > Another good catch. However, I would prefer to fix this without > introducing a "continue" as I think that will make the control flow > clearer. Therefore, I propose the attached variant of your idea.
Err, that doesn't work at all. Have a look at this version instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c index e765cea..0e60dbc 100644 --- a/src/backend/storage/ipc/shm_mq.c +++ b/src/backend/storage/ipc/shm_mq.c @@ -777,33 +777,37 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data, return SHM_MQ_DETACHED; } - if (available == 0) + if (available == 0 && !mqh->mqh_counterparty_attached) { - shm_mq_result res; - /* * The queue is full, so if the receiver isn't yet known to be * attached, we must wait for that to happen. */ - if (!mqh->mqh_counterparty_attached) + if (nowait) { - if (nowait) + if (shm_mq_get_receiver(mq) == NULL) { - if (shm_mq_get_receiver(mq) == NULL) - { - *bytes_written = sent; - return SHM_MQ_WOULD_BLOCK; - } - } - else if (!shm_mq_wait_internal(mq, &mq->mq_receiver, - mqh->mqh_handle)) - { - mq->mq_detached = true; *bytes_written = sent; - return SHM_MQ_DETACHED; + return SHM_MQ_WOULD_BLOCK; } - mqh->mqh_counterparty_attached = true; } + else if (!shm_mq_wait_internal(mq, &mq->mq_receiver, + mqh->mqh_handle)) + { + mq->mq_detached = true; + *bytes_written = sent; + return SHM_MQ_DETACHED; + } + mqh->mqh_counterparty_attached = true; + + /* + * The receiver may have read some data after attaching, so we + * must not wait without rechecking the queue state. + */ + } + else if (available == 0) + { + shm_mq_result res; /* Let the receiver know that we need them to read some data. */ res = shm_mq_notify_receiver(mq);
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers