On 1 January 2015 at 10:34, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> Running it again, I get the same issue.  This is with
>> parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.
>>
>> This doesn't happen if I set the pgbench scale to 50.  I suspect this is
>> a OOM issue.  My laptop has 16GB RAM, the table is around 13GB at scale
>> 100, and I don't have swap enabled.  But I'm concerned it crashes the whole
>> instance.
>
> Isn't this a backend crash due to OOM?
> And after that server will restart automatically.

Yes, I'm fairly sure it is.  I guess what I'm confused about is that 8
parallel sequential scans in separate sessions (1 per session) don't cause
the server to crash, but in a single session (8 in 1 session), they do.

>> I also notice that requesting BUFFERS in a parallel EXPLAIN output
>> yields no such information.
>
> Yeah and the reason for same is that all the work done related
> to BUFFERS is done by backend workers, master backend
> doesn't read any pages, so it is not able to accumulate this
> information.
>
>> Is that not possible to report?
>
> It is not impossible to report such information, we can develop some
> way to share such information between master backend and workers.
> I think we can do this if required once the patch is more stablized.

Ah great, as I think losing such information to this feature would be
unfortunate.

Will there be a GUC to influence parallel scan cost?  Or does it take
effective_io_concurrency into account in the costs?

And will the planner be able to decide whether or not to use
background workers?
For example:

# explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts;
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=89584.00..89584.05 rows=5 width=4) (actual time=228.222..228.224 rows=5 loops=1)
   Output: bid
   Group Key: pgbench_accounts.bid
   Buffers: shared hit=83334
   ->  Seq Scan on public.pgbench_accounts  (cost=0.00..88334.00 rows=500000 width=4) (actual time=0.008..136.522 rows=500000 loops=1)
         Output: bid
         Buffers: shared hit=83334
 Planning time: 0.071 ms
 Execution time: 228.265 ms
(9 rows)

This is a quick plan, but if we tell it that it's allowed 8 background workers:

# set parallel_seqscan_degree = 8;
SET
Time: 0.187 ms

# explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=12291.75..12291.80 rows=5 width=4) (actual time=603.042..603.042 rows=1 loops=1)
   Output: bid
   Group Key: pgbench_accounts.bid
   ->  Parallel Seq Scan on public.pgbench_accounts  (cost=0.00..11041.75 rows=500000 width=4) (actual time=2.445..529.284 rows=500000 loops=1)
         Output: bid
         Number of Workers: 8
         Number of Blocks Per Workers: 10416
 Planning time: 0.049 ms
 Execution time: 663.103 ms
(9 rows)

Time: 663.437 ms

It's significantly slower.  I'd hope the planner would anticipate this and
decide, "I'm just gonna perform a single scan in this instance as it'll be
a lot quicker for this simple case."

So at the moment parallel_seqscan_degree seems to mean "You *must* use
this many threads if you can parallelise."  Ideally we'd be saying "can
use up to if necessary".

Thanks

Thom
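[Editorial note: as a sanity check on the plans above, the "Number of Blocks Per Workers: 10416" figure appears to be the non-parallel plan's block count (shared hit=83334) divided evenly among the 8 workers. The assumption that the patch uses simple floor division, with any remainder blocks handled separately, is a guess; this sketch only verifies the arithmetic, not the patch's actual implementation.]

```python
# Sketch (assumption): derive "Number of Blocks Per Workers" from the
# figures in the EXPLAIN output above. How the patch really divides the
# relation is not shown in the thread; floor division is a guess.

total_blocks = 83334   # "Buffers: shared hit=83334" in the non-parallel plan
num_workers = 8        # parallel_seqscan_degree = 8

blocks_per_worker = total_blocks // num_workers
leftover_blocks = total_blocks % num_workers

print(blocks_per_worker)  # 10416 -- matches the parallel plan's output
print(leftover_blocks)    # 6 blocks remain; presumably some worker (or
                          # the master backend) picks these up
```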