On Wed, Jan 24, 2018 at 3:37 PM, Robert Haas wrote:
> Well, I've been resisting that approach from the very beginning of
> parallel query. Eventually, I hope that we're going to go in the
> direction of changing our mind about how many workers parallel
> operations use "on the fly". For example, ...
On Wed, Jan 24, 2018 at 5:52 PM, Peter Geoghegan wrote:
>> If we made the Gather node wait only for workers that actually reached
>> the Gather -- either by using a Barrier or by some other technique --
>> then this would be a lot less fragile. And the same kind of technique
>> would work for parallel ...
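
For context, the Barrier machinery Peter mentions already exists in the backend
(storage/barrier.h). A very rough sketch of the counting part of the idea, with
hypothetical function names, and deliberately ignoring the hard question of when
the leader may stop waiting for workers that never forked:

    #include "postgres.h"
    #include "storage/barrier.h"

    /*
     * Leader, before launching workers: the barrier lives in the parallel
     * DSM segment; the leader itself counts as the one initial participant.
     *
     *     BarrierInit(&shared->attach_barrier, 1);
     */

    /* Worker, as soon as it starts up: announce that it really exists. */
    static void
    worker_announce_attach(Barrier *attach_barrier)
    {
        (void) BarrierAttach(attach_barrier);
        /* ... do the parallel work ... */
        (void) BarrierDetach(attach_barrier);
    }

    /*
     * Leader: how many workers have actually shown up so far.  This count is
     * racy by nature; a worker that has not forked yet looks the same as one
     * that never will, which is exactly the gap discussed in this thread.
     */
    static int
    workers_attached_so_far(Barrier *attach_barrier)
    {
        return BarrierParticipants(attach_barrier) - 1;
    }
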
On Wed, Jan 24, 2018 at 12:05 PM, Robert Haas wrote:
> In Thomas's test case, he's using 4 workers, and if even one of those
> manages to start, then it'll probably do so after any fork failures
> have already occurred, and the error will be noticed when that process
> sends its first message to the ...
On Tue, Jan 23, 2018 at 9:45 PM, Amit Kapila wrote:
> Hmm, I think that case will be addressed because tuple queues can
> detect if the leader is not attached. It does in code path
> shm_mq_receive->shm_mq_counterparty_gone. In
> shm_mq_counterparty_gone, it can detect if the worker is gone by using ...
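
For readers without the source at hand, the mechanism Amit is pointing at works
like this: if the leader associates a queue with the worker's
BackgroundWorkerHandle via shm_mq_set_handle(), then shm_mq_receive() can return
SHM_MQ_DETACHED when the counterparty has died (or was never successfully
forked) instead of blocking forever. A minimal leader-side sketch, with the
error wording chosen here only for illustration:

    #include "postgres.h"
    #include "postmaster/bgworker.h"
    #include "storage/shm_mq.h"

    /* Leader side, after registering the worker and attaching to its queue. */
    static void
    read_one_message(shm_mq_handle *mqh, BackgroundWorkerHandle *handle)
    {
        shm_mq_result res;
        Size        nbytes;
        void       *data;

        /*
         * Lets shm_mq_receive() detect a worker that died before attaching;
         * in real code this is done once, right after queue setup.
         */
        shm_mq_set_handle(mqh, handle);

        res = shm_mq_receive(mqh, &nbytes, &data, false);
        if (res == SHM_MQ_DETACHED)
            ereport(ERROR,
                    (errcode(ERRCODE_INTERNAL_ERROR),
                     errmsg("lost connection to parallel worker")));

        /* SHM_MQ_SUCCESS: nbytes bytes of message are available at data. */
    }
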
On Wed, Jan 24, 2018 at 1:57 AM, Thomas Munro
wrote:
> On Wed, Jan 24, 2018 at 5:25 PM, Thomas Munro
> wrote:
>> If there were some way for the postmaster to cause reason
>> PROCSIG_PARALLEL_MESSAGE to be set in the leader process instead of
>> just notification via kill(SIGUSR1) when it fails to fork a parallel
>> worker, we'd get (1) for free in any latch ...
On Wed, Jan 24, 2018 at 5:25 PM, Thomas Munro
wrote:
> If there were some way for the postmaster to cause reason
> PROCSIG_PARALLEL_MESSAGE to be set in the leader process instead of
> just notification via kill(SIGUSR1) when it fails to fork a parallel
> worker, we'd get (1) for free in any latch ...
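
For reference, "any latch wait loop" refers to the standard leader-side wait
pattern shown below, as it looks in the PostgreSQL versions under discussion:
because CHECK_FOR_INTERRUPTS() runs HandleParallelMessages() when
PROCSIG_PARALLEL_MESSAGE is pending, anything that sets both that flag and the
leader's latch gets noticed on the next iteration. This is the generic pattern,
not code from the thread:

    #include "postgres.h"
    #include "miscadmin.h"
    #include "pgstat.h"
    #include "storage/latch.h"

    /* Generic leader-side wait loop: latch plus CHECK_FOR_INTERRUPTS(). */
    static void
    wait_until(volatile bool *done)
    {
        while (!*done)
        {
            /* Services PROCSIG_PARALLEL_MESSAGE via HandleParallelMessages(). */
            CHECK_FOR_INTERRUPTS();

            (void) WaitLatch(MyLatch, WL_LATCH_SET, -1L,
                             WAIT_EVENT_PARALLEL_FINISH);
            ResetLatch(MyLatch);
        }
    }
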
On Tue, Jan 23, 2018 at 9:38 PM, Amit Kapila wrote:
> On Wed, Jan 24, 2018 at 10:38 AM, Peter Geoghegan wrote:
>> The leader can go ahead and sort before calling something like a new
>> WaitForParallelWorkersToAttach() function (or even
>> WaitForParallelWorkersToFinish()). If we did add a
>> WaitForParallelWorkersToAttach() ...
On Wed, Jan 24, 2018 at 10:38 AM, Peter Geoghegan wrote:
> On Tue, Jan 23, 2018 at 9:02 PM, Thomas Munro
> wrote:
>>> Yes, this is what I am trying to explain on parallel create index
>>> thread. I think there we need to either use
>>> WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a
>>> new API as proposed in that thread) ...
On Tue, Jan 23, 2018 at 9:02 PM, Thomas Munro
wrote:
>> Yes, this is what I am trying to explain on parallel create index
>> thread. I think there we need to either use
>> WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a
>> new API as proposed in that thread) if we don't want to ...
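
In outline, what that buys parallel CREATE INDEX on the leader side is the flow
below. The parallel-context setup and the sort itself are elided, and
WaitForParallelWorkersToAttach is the new API being proposed here, which did
later land in core:

    #include "postgres.h"
    #include "access/parallel.h"

    static void
    leader_sort_outline(ParallelContext *pcxt)
    {
        LaunchParallelWorkers(pcxt);

        /* The leader can do its own share of the sort first ... */
        /* do_leader_share_of_sort(); */

        /*
         * ... but before trusting nworkers_launched (for example, to decide
         * how many worker runs to wait for), wait until every launched worker
         * has either attached to its error queue or is known to have failed
         * to start, so a fork failure cannot cause an indefinite wait.
         */
        WaitForParallelWorkersToAttach(pcxt);
    }
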
On Wed, Jan 24, 2018 at 5:43 PM, Amit Kapila wrote:
>> Hmm. Yeah. I can't seem to reach a stuck case and was probably just
>> confused and managed to confuse Robert too. If you make
>> fork_process() fail randomly (see attached), I see that there are a
>> couple of easily reachable failure modes ...
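
Thomas's attached patch is not reproduced in this listing, but a fault-injection
hack of the same general flavor, dropped into fork_process()
(src/backend/postmaster/fork_process.c), might look like this; the
FORK_FAILURE_TESTING guard is made up for illustration:

    /*
     * Inside fork_process(), before the real fork() call: pretend that fork()
     * failed some of the time so that every caller's fork-failure path gets
     * exercised.  Test hack only, never something to commit.
     */
    #ifdef FORK_FAILURE_TESTING
        if (random() % 4 == 0)
        {
            errno = EAGAIN;         /* what a real fork() failure would set */
            return (pid_t) -1;      /* the usual "fork failed" return value */
        }
    #endif
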
On Wed, Jan 24, 2018 at 9:55 AM, Thomas Munro
wrote:
> On Wed, Jan 24, 2018 at 3:45 PM, Amit Kapila wrote:
>> On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas wrote:
>>> In the case of Gather, for example, we won't
>>> WaitForParallelWorkersToFinish() until we've read all the tuples. If
>>> there's a tuple queue that does not yet have a sender, then we'll ...
On Wed, Jan 24, 2018 at 10:03 AM, Peter Geoghegan wrote:
> On Tue, Jan 23, 2018 at 8:25 PM, Thomas Munro
> wrote:
>>> Hmm, I think that case will be addressed because tuple queues can
>>> detect if the leader is not attached. It does in code path
>>> shm_mq_receive->shm_mq_counterparty_gone. In ...
On Tue, Jan 23, 2018 at 8:25 PM, Thomas Munro
wrote:
>> Hmm, I think that case will be addressed because tuple queues can
>> detect if the leader is not attached. It does in code path
>> shm_mq_receive->shm_mq_counterparty_gone. In
>> shm_mq_counterparty_gone, it can detect if the worker is gone ...
On Wed, Jan 24, 2018 at 3:45 PM, Amit Kapila wrote:
> On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas wrote:
>> In the case of Gather, for example, we won't
>> WaitForParallelWorkersToFinish() until we've read all the tuples. If
>> there's a tuple queue that does not yet have a sender, then we'll just ...
On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas wrote:
> On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas wrote:
>> On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila
>> wrote:
>>
>> OK. I've committed this patch and back-patched it to 9.6.
>> Back-patching to 9.5 didn't look simple because nworkers_launched ...
On Tue, Jan 23, 2018 at 1:05 PM, Robert Haas wrote:
> I was just talking to Thomas, and one or the other of us realized that
> this doesn't completely fix the problem. I think that if a worker
> fails, even by some unorthodox mechanism like an abrupt proc_exit(1),
> after the point where it attaches ...
On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas wrote:
> On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila wrote:
>> It does turn such cases into error but only at the end when someone
>> calls WaitForParallelWorkersToFinish. If one tries to rely on it at
>> any other time, it won't work as I think is the case for Parallel
>> Create Index ...
On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila wrote:
> It does turn such cases into error but only at the end when someone
> calls WaitForParallelWorkersToFinish. If one tries to rely on it at
> any other time, it won't work as I think is the case for Parallel
> Create Index where Peter G. is trying ...
On Mon, Jan 22, 2018 at 7:16 PM, Robert Haas wrote:
> On Fri, Jan 19, 2018 at 10:59 PM, Amit Kapila wrote:
>> The patch doesn't apply cleanly on the head, but after rebasing it, I
>> have reviewed and tested it and it seems to be working fine. Apart
>> from this specific issue, I think we should consider making
>> nworkers_launched reliable ...
On Fri, Jan 19, 2018 at 10:59 PM, Amit Kapila wrote:
> The patch doesn't apply cleanly on the head, but after rebasing it, I
> have reviewed and tested it and it seems to be working fine. Apart
> from this specific issue, I think we should consider making
> nworkers_launched reliable (at the very least ...
On Thu, Jan 18, 2018 at 8:53 PM, Robert Haas wrote:
> On Thu, Dec 21, 2017 at 9:13 AM, Amit Kapila wrote:
>> I am not against using the way specific to parallel context layer as
>> described by you above. However, I was trying to see if there is
>> some general purpose solution as the low-impact way is not very
>> straightforward ...
On Thu, Dec 21, 2017 at 9:13 AM, Amit Kapila wrote:
> I am not against using the way specific to parallel context layer as
> described by you above. However, I was trying to see if there is
> some general purpose solution as the low-impact way is not very
> straightforward. I think you can go ahead ...
On Thu, Dec 21, 2017 at 6:26 PM, Robert Haas wrote:
> On Thu, Dec 21, 2017 at 6:46 AM, Amit Kapila wrote:
>> What if we don't allow to reuse such slots till the backend/session
>> that has registered it performs unregister? Currently, we don't seem
>> to have an API corresponding to Register*BackgroundWorker() ...
On Thu, Dec 21, 2017 at 6:46 AM, Amit Kapila wrote:
> What if we don't allow to reuse such slots till the backend/session
> that has registered it performs unregister? Currently, we don't seem
> to have an API corresponding to Register*BackgroundWorker() which can
> be used to unregister, but maybe ...
On Thu, Dec 21, 2017 at 2:39 AM, Robert Haas wrote:
> On Tue, Dec 19, 2017 at 11:28 PM, Amit Kapila wrote:
>> That is not the main point I am bothered about. I am concerned that
>> the patch proposed by you can lead to hang if there is a crash (abrupt
>> failure like proc_exit(1)) after attaching to the error queue ...
On Tue, Dec 19, 2017 at 11:28 PM, Amit Kapila wrote:
> That is not the main point I am bothered about. I am concerned that
> the patch proposed by you can lead to hang if there is a crash (abrupt
> failure like proc_exit(1)) after attaching to the error queue. This is
> explained in my email up thread ...
On Tue, Dec 19, 2017 at 9:01 PM, Robert Haas wrote:
> On Tue, Dec 19, 2017 at 5:01 AM, Amit Kapila wrote:
>> I think it would have been much easier to fix this problem if we would
>> have some way to differentiate whether the worker has stopped
>> gracefully or not. Do you think it makes sense to ...
On Tue, Dec 19, 2017 at 5:01 AM, Amit Kapila wrote:
> I think it would have been much easier to fix this problem if we would
> have some way to differentiate whether the worker has stopped
> gracefully or not. Do you think it makes sense to introduce such a
> state in the background worker machinery ...
On Thu, Dec 14, 2017 at 3:05 AM, Robert Haas wrote:
> On Wed, Dec 13, 2017 at 1:41 AM, Amit Kapila wrote:
>
>> This also doesn't appear to be completely safe. If we add
>> proc_exit(1) after attaching to error queue (say after
>> pq_set_parallel_master) in the worker, then it will lead to *hang* ...
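
Concretely, the reproduction being described amounts to injecting an abrupt exit
into ParallelWorkerMain() right after the worker's error queue has been attached
and redirected. The PARALLEL_WORKER_EXIT_TESTING guard below is invented for
illustration, and the function names are the pre-v15 ones:

    /* In ParallelWorkerMain(), just after the error queue is set up: */
        pq_redirect_to_shm_mq(seg, mqh);
        pq_set_parallel_master(fps->parallel_master_pid,
                               fps->parallel_master_backend_id);

    #ifdef PARALLEL_WORKER_EXIT_TESTING
        /*
         * Exit before sending anything at all: the postmaster sees a clean
         * worker exit, but the leader has never heard from this worker.
         */
        proc_exit(1);
    #endif
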
On Wed, Dec 13, 2017 at 1:41 AM, Amit Kapila wrote:
> I have tried to reproduce this situation by adding error case in
> worker (just before worker returns success ('X' message)) and by
> having a breakpoint in WaitForParallelWorkersToFinish. What I noticed
> is that the worker is not allowed to exit ...
On Wed, Dec 13, 2017 at 12:31 AM, Robert Haas wrote:
> On Mon, Dec 11, 2017 at 10:41 PM, Amit Kapila wrote:
>> In particular, if the worker crashed (say somebody forcefully killed
>> the worker), then I think it will lead to a restart of the server. So
>> not sure, adding anything about that in the comment will make much
>> sense.
On Mon, Dec 11, 2017 at 10:41 PM, Amit Kapila wrote:
> In particular, if the worker crashed (say somebody forcefully killed
> the worker), then I think it will lead to a restart of the server. So
> not sure, adding anything about that in the comment will make much
> sense.
I think it's dangerous ...
On Tue, Dec 12, 2017 at 9:00 AM, Amit Kapila wrote:
> On Mon, Dec 11, 2017 at 11:27 PM, Robert Haas wrote:
>> On Sun, Dec 10, 2017 at 11:07 PM, Amit Kapila
>> wrote:
>>> Okay, see the attached and let me know if that suffices the need?
>>
>> + * Check for unexpected worker death. This will ensure that if ...
On Mon, Dec 11, 2017 at 11:27 PM, Robert Haas wrote:
> On Sun, Dec 10, 2017 at 11:07 PM, Amit Kapila wrote:
>> Okay, see the attached and let me know if that suffices the need?
>
> + * Check for unexpected worker death. This will ensure that if
> + * the postmaster failed to start the worker, then we don't wait ...
On Sun, Dec 10, 2017 at 11:07 PM, Amit Kapila wrote:
> Okay, see the attached and let me know if that suffices the need?
+ * Check for unexpected worker death. This will ensure that if
+ * the postmaster failed to start the worker, then we don't wait
+ * for it ...
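
A hedged reconstruction of what that comment guards (not the exact hunk from the
patch; known_started is a hypothetical per-worker flag the leader would set once
anything at all has been received from that worker):

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    static void
    check_for_unexpected_worker_death(BackgroundWorkerHandle *handle,
                                      bool known_started)
    {
        pid_t       pid;
        BgwHandleStatus status = GetBackgroundWorkerPid(handle, &pid);

        /*
         * A worker that is already stopped but from which we never heard
         * anything either failed to fork or died before attaching; waiting
         * for it would hang, so turn the situation into an error instead.
         */
        if (status == BGWH_STOPPED && !known_started)
            ereport(ERROR,
                    (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                     errmsg("parallel worker failed to initialize"),
                     errhint("More details may be available in the server log.")));
    }
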
On Fri, Dec 8, 2017 at 9:45 PM, Robert Haas wrote:
> On Fri, Dec 8, 2017 at 4:56 AM, Amit Kapila wrote:
>> I think the optimization you are suggesting has a merit over what I
>> have done in the patch. However, one point to note is that we are
>> anyway going to call that function down in the path ...
On Fri, Dec 8, 2017 at 4:56 AM, Amit Kapila wrote:
> I think the optimization you are suggesting has a merit over what I
> have done in the patch. However, one point to note is that we are
> anyway going to call that function down in the path
> (WaitForParallelWorkersToExit -> WaitForBackgroundWorkerShutdown) ...
On Wed, Dec 6, 2017 at 8:02 PM, Robert Haas wrote:
> On Wed, Dec 6, 2017 at 1:43 AM, Amit Kapila wrote:
>> Looks good to me.
>
> Committed and back-patched to 9.4 where dynamic background workers
> were introduced.
>
Thanks.
>>> Barring objections, I'll commit and
>>> back-patch this; then we can deal with the other part of this ...
On Wed, Dec 6, 2017 at 9:54 AM, Tom Lane wrote:
> Maybe track "worker is known to have launched" in the leader's state
> someplace?
That's probably one piece of it, yes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
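
One way to read Tom's suggestion and Robert's reply (the field and function
names here are invented, not the committed fix): keep one flag per launched
worker in the leader's parallel state and set it the first time anything at all
arrives from that worker, so the wait loops can tell "never started" apart from
"started and already exited cleanly":

    #include "postgres.h"

    /*
     * Hypothetical leader-side bookkeeping, one instance per parallel
     * operation (in core this would live in or beside ParallelContext).
     */
    typedef struct WorkerLaunchState
    {
        int     nworkers_launched;  /* how many workers we tried to launch */
        bool   *known_started;      /* palloc0'd; one flag per worker */
    } WorkerLaunchState;

    /*
     * Call whenever any message arrives from worker i (error queue, tuple
     * queue, ...); from then on a clean detach means a normal exit.
     */
    static inline void
    mark_worker_started(WorkerLaunchState *state, int i)
    {
        Assert(i >= 0 && i < state->nworkers_launched);
        state->known_started[i] = true;
    }
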
Robert Haas writes:
> I'm not in love with that part of the fix; the other parts of that if
> statement are just testing variables, and a function call that takes
> and releases an LWLock is a lot more expensive. Furthermore, that
> test will often be hit in practice, because we'll often arrive at ...
On Wed, Dec 6, 2017 at 1:43 AM, Amit Kapila wrote:
> Looks good to me.
Committed and back-patched to 9.4 where dynamic background workers
were introduced.
>> Barring objections, I'll commit and
>> back-patch this; then we can deal with the other part of this
>> afterward.
>
> Sure, I will rebase ...
On Tue, Dec 5, 2017 at 11:34 PM, Robert Haas wrote:
> On Wed, Nov 29, 2017 at 12:11 AM, Michael Paquier
> wrote:
>> On Fri, Oct 27, 2017 at 9:54 PM, Amit Kapila wrote:
>>> The patch still applies (with some hunks). I have added it in CF [1]
>>> to avoid losing track.
>>>
>>> [1] - https://commitfest.postgresql.org/15/1341/
On Wed, Nov 29, 2017 at 12:11 AM, Michael Paquier
wrote:
> On Fri, Oct 27, 2017 at 9:54 PM, Amit Kapila wrote:
>> The patch still applies (with some hunks). I have added it in CF [1]
>> to avoid losing track.
>>
>> [1] - https://commitfest.postgresql.org/15/1341/
>
> This did not get reviews and the patch still applies. I am moving it to
> next CF.
On Fri, Oct 27, 2017 at 9:54 PM, Amit Kapila wrote:
> The patch still applies (with some hunks). I have added it in CF [1]
> to avoid losing track.
>
> [1] - https://commitfest.postgresql.org/15/1341/
This did not get reviews and the patch still applies. I am moving it to next CF.
--
Michael