On Fri, Jan 2, 2015 at 9:04 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> While working on the parallel seq-scan patch to adapt this framework, I
> noticed a few things and have questions regarding the same.
>
> 1.
> Currently the parallel worker just attaches to the error queue. For the
> tuple queue, do you expect that to be done in the same place, or in the
> caller-supplied function? If the latter, then we need the segment
> address as input to that function in order to attach the queue to the
> segment (shm_mq_attach()).
> Another question I have in this regard: if we have redirected messages
> to the error queue by using pq_redirect_to_shm_mq, then how can we set
> up a tuple queue for the same purpose? Similarly, I think more handling
> is needed for the tuple queue in the master backend, and the answer to
> the above will dictate the best way to do it.
I've come to the conclusion that it's a bad idea for tuples to be sent
through the same queue as errors. We want errors (or notices, but
especially errors) to be processed promptly, but there may be a
considerable delay in processing tuples. For example, imagine a plan
that looks like this:

    Nested Loop
      -> Parallel Seq Scan on p
      -> Index Scan on q
           Index Cond: q.x = p.x

The parallel workers should fill up the tuple queues used by the
parallel seq scan so that the master doesn't have to do any of that work
itself. Therefore, the normal situation will be that those tuple queues
are all full. If an error occurs in a worker at that point, it can't add
it to the tuple queue, because the tuple queue is full. But even if it
could do that, the master won't notice the error until it has read all
of the queued-up tuple messages sitting in the queue ahead of the error,
and maybe some messages from the other queues as well, since it probably
round-robins between the queues or something like that. Basically, it
could do a lot of extra work before noticing that error in there.

Now we could avoid that by having the master read messages from the
queue immediately and just save them off to local storage if they aren't
error messages. But that's not very desirable either, because now we
have no flow control. The workers will just keep spamming tuples that
the master isn't ready for into the queues, and the master will keep
reading them and saving them to local storage, and eventually it will
run out of memory and die. We could engineer some solution to this
problem, of course, but it seems quite a bit simpler to just have two
queues. The error queues don't need to be very big (I made them 16kB,
which is trivial on any system on which you care about having working
parallelism), and the tuple queues can be sized as needed to avoid
pipeline stalls. (The first sketch at the end of this message shows such
a two-queue setup, and the second shows the kind of non-blocking read
loop it enables.)

> 2.
> Currently there is no interface for wait_for_workers_to_become_ready()
> in your patch. Don't you think it is important that, before we start
> fetching tuples, we make sure all workers have started? What if some
> worker fails to start?

I think that, in general, getting the most benefit out of parallelism
means *avoiding* situations where backends have to wait for each other.
If the relation being scanned is not too large, the user backend might
be able to finish the whole scan - or a significant fraction of it -
before the workers initialize. Of course, in that case it might have
been a bad idea to parallelize in the first place, but we should still
try to make the best of the situation. If some worker fails to start,
then instead of the full degree-N parallelism we were hoping for, we
have some degree K < N, so things will take a little longer, but
everything should still work. (The third sketch below shows launching
without waiting.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
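To make the two-queue design concrete, here is a minimal sketch of
per-worker queue setup using the shm_mq/shm_toc APIs the questions above
refer to. The toc keys, the tuple-queue size, and both function names
are hypothetical, and pq_redirect_to_shm_mq is shown with the
two-argument signature it had in the 9.5-era patches; treat this as an
illustration of the idea, not the patch's actual code:

    #include "postgres.h"
    #include "libpq/pqmq.h"         /* pq_redirect_to_shm_mq */
    #include "storage/dsm.h"
    #include "storage/proc.h"       /* MyProc */
    #include "storage/shm_mq.h"
    #include "storage/shm_toc.h"

    #define KEY_ERROR_QUEUE   1         /* hypothetical toc keys */
    #define KEY_TUPLE_QUEUE   2
    #define ERROR_QUEUE_SIZE  16384     /* 16kB: errors are small and rare */
    #define TUPLE_QUEUE_SIZE  65536     /* sized to avoid pipeline stalls */

    /* Master side: carve one small error queue and one larger tuple
     * queue per worker out of the dynamic shared memory segment. */
    static void
    setup_worker_queues(shm_toc *toc)
    {
        shm_mq *err_mq = shm_mq_create(shm_toc_allocate(toc, ERROR_QUEUE_SIZE),
                                       ERROR_QUEUE_SIZE);
        shm_mq *tup_mq = shm_mq_create(shm_toc_allocate(toc, TUPLE_QUEUE_SIZE),
                                       TUPLE_QUEUE_SIZE);

        shm_mq_set_receiver(err_mq, MyProc);    /* the master reads both */
        shm_mq_set_receiver(tup_mq, MyProc);
        shm_toc_insert(toc, KEY_ERROR_QUEUE, err_mq);
        shm_toc_insert(toc, KEY_TUPLE_QUEUE, tup_mq);
    }

    /* Worker side: attach to both queues, then point elog/ereport
     * output at the error queue only; tuples go on the other queue. */
    static shm_mq_handle *
    attach_worker_queues(dsm_segment *seg, shm_toc *toc)
    {
        shm_mq *err_mq = shm_toc_lookup(toc, KEY_ERROR_QUEUE);
        shm_mq *tup_mq = shm_toc_lookup(toc, KEY_TUPLE_QUEUE);
        shm_mq_handle *err_mqh;
        shm_mq_handle *tup_mqh;

        shm_mq_set_sender(err_mq, MyProc);
        shm_mq_set_sender(tup_mq, MyProc);
        err_mqh = shm_mq_attach(err_mq, seg, NULL);
        tup_mqh = shm_mq_attach(tup_mq, seg, NULL);

        /* Errors and notices now travel on their own small queue. */
        pq_redirect_to_shm_mq(err_mq, err_mqh);

        /* The caller sends tuples with shm_mq_send() on this handle. */
        return tup_mqh;
    }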
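And here is a sketch of the master-side read loop such a layout enables:
drain the tuple queues round-robin without ever blocking on a single
queue, so that a full tuple queue throttles its worker (shm_mq_send
blocks in the worker) while an error can still surface promptly via the
separate error queues. The WorkerStream struct and deserialize_tuple are
hypothetical; shm_mq_receive, WaitLatch, and ResetLatch are the stock
APIs of this era:

    #include "postgres.h"
    #include "access/htup.h"        /* HeapTuple */
    #include "miscadmin.h"          /* CHECK_FOR_INTERRUPTS */
    #include "storage/latch.h"
    #include "storage/proc.h"
    #include "storage/shm_mq.h"

    typedef struct WorkerStream
    {
        shm_mq_handle *tup_mqh;     /* one tuple queue per worker */
    } WorkerStream;

    /* Hypothetical: rebuild a tuple from a queue message. */
    static HeapTuple deserialize_tuple(void *data, Size nbytes);

    static HeapTuple
    next_tuple_from_workers(WorkerStream *streams, int nworkers)
    {
        for (;;)
        {
            bool    all_detached = true;
            int     i;

            /* However error-queue traffic ends up being noticed (signal,
             * latch, ...), surfacing it via CHECK_FOR_INTERRUPTS() means
             * we never grind through queued tuples before throwing it. */
            CHECK_FOR_INTERRUPTS();

            for (i = 0; i < nworkers; i++)
            {
                void         *data;
                Size          nbytes;
                shm_mq_result res;

                /* nowait = true: never block on any one queue */
                res = shm_mq_receive(streams[i].tup_mqh, &nbytes,
                                     &data, true);
                if (res == SHM_MQ_SUCCESS)
                    return deserialize_tuple(data, nbytes);
                if (res != SHM_MQ_DETACHED)
                    all_detached = false;   /* WOULD_BLOCK: still alive */
            }

            if (all_detached)
                return NULL;        /* every worker has finished */

            /* Senders set our latch when they enqueue, so sleep here. */
            WaitLatch(&MyProc->procLatch, WL_LATCH_SET, 0);
            ResetLatch(&MyProc->procLatch);
        }
    }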
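Finally, a sketch of what launching without a
wait_for_workers_to_become_ready() step could look like: ask for N
workers, but treat a failed registration as reduced parallelism rather
than an error. RegisterDynamicBackgroundWorker is the real bgworker API;
ParallelScanState and MAX_SCAN_WORKERS are hypothetical:

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    #define MAX_SCAN_WORKERS 64     /* hypothetical cap */

    typedef struct ParallelScanState
    {
        int                      nworkers_launched;
        BackgroundWorkerHandle  *handle[MAX_SCAN_WORKERS];
    } ParallelScanState;

    static void
    launch_scan_workers(ParallelScanState *pss, BackgroundWorker *worker,
                        int nworkers_requested)
    {
        int i;

        pss->nworkers_launched = 0;
        for (i = 0; i < nworkers_requested && i < MAX_SCAN_WORKERS; i++)
        {
            /* Registration fails if no bgworker slots are free; that is
             * not an error here -- we simply run at degree K < N. */
            if (RegisterDynamicBackgroundWorker(worker,
                                &pss->handle[pss->nworkers_launched]))
                pss->nworkers_launched++;
        }

        /* Deliberately no waiting: the master starts scanning at once,
         * and whichever workers come up join in. */
    }

If the master additionally attaches each queue with the worker's
BackgroundWorkerHandle (the third argument of shm_mq_attach), a worker
that registers but dies before attaching shows up as SHM_MQ_DETACHED,
which the read loop above already treats like normal completion.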