"actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

2021-11-16 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/using-explain.html
Description:

https://www.postgresql.org/docs/14/using-explain.html
The doc says
"the loops value reports the total number of executions of the node, and the
actual time and rows values shown are averages per-execution. This is done
to make the numbers comparable with the way that the cost estimates are
shown."

But I found for parallel operation, the above description maybe not
correct.

For example 
postgres=# create table c(id int);
CREATE TABLE
postgres=# insert into c select generate_series(1,100);
INSERT 0 100
postgres=# explain analyze select count(*) from c;
 QUERY PLAN
-
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8) (actual
time=290.460..290.508 rows=1 loops=1)
   ->  Gather  (cost=10633.33..10633.54 rows=2 width=8) (actual
time=289.605..290.484 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Partial Aggregate  (cost=9633.33..9633.34 rows=1 width=8)
(actual time=188.336..188.337 rows=1 loops=3)
   ->  Parallel Seq Scan on c  (cost=0.00..8591.67 rows=416667
width=0) (actual time=0.030..140.036 rows=33 loops=3)
 Planning Time: 0.331 ms
 Execution Time: 290.607 ms
(8 rows)

postgres=# 

According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but
the total cost for this SQL is only 290ms.
Is the output of this explain correct?


Re: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

2021-11-16 Thread nikolai.berkoff
> https://www.postgresql.org/docs/14/using-explain.html
> The doc says
> 

> "the loops value reports the total number of executions of the node, and the
> 

> actual time and rows values shown are averages per-execution. This is done
> 

> to make the numbers comparable with the way that the cost estimates are
> 

> shown."
> 

> But I found for parallel operation, the above description maybe not
> 

> correct.
> 

> For example
> 

> postgres=# create table c(id int);
> 

> CREATE TABLE
> 

> postgres=# insert into c select generate_series(1,100);
> 

> INSERT 0 100
> 

> postgres=# explain analyze select count(*) from c;
> 

> QUERY PLAN
> --
> 

> Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8) (actual
> 

> time=290.460..290.508 rows=1 loops=1)
> 

> -> Gather (cost=10633.33..10633.54 rows=2 width=8) (actual
> 

> time=289.605..290.484 rows=3 loops=1)
> 

> Workers Planned: 2
> 

> Workers Launched: 2
> 

> -> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8)
> 

> (actual time=188.336..188.337 rows=1 loops=3)
> 

> -> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667
> 

> width=0) (actual time=0.030..140.036 rows=33 loops=3)
> 

> Planning Time: 0.331 ms
> 

> Execution Time: 290.607 ms
> 

> (8 rows)
> 

> postgres=#
> 

> According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but
> 

> the total cost for this SQL is only 290ms.
> 

> Is the output of this explain correct?

Parallel query is explained in 
https://www.postgresql.org/docs/14/how-parallel-query-works.html and 
https://www.postgresql.org/docs/14/parallel-plans.html

The docs seem clear to me that as the nodes are executed in parallel then the 
time execution time is not 140.036*3. The 140.036 value is actual time the 
Parallel Seq Scan nodes ran for but there were up to 2 running in parallel.

publickey - nikolai.berkoff@pm.me - 0xD189792D.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


RE: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

2021-11-16 Thread tanghy.f...@fujitsu.com
Wednesday, November 17, 2021 6:29 AM, nikolai.berkoff  
wrote:
>
> Parallel query is explained in
> https://www.postgresql.org/docs/14/how-parallel-query-works.html and
> https://www.postgresql.org/docs/14/parallel-plans.html
> 
> The docs seem clear to me that as the nodes are executed in parallel then the
> time execution time is not 140.036*3. The 140.036 value is actual time the 
> Parallel
> Seq Scan nodes ran for but there were up to 2 running in parallel.

Thanks for your reply.
I read your references but still confused about the 'loops' in parallel query 
result.

> ->  Parallel Seq Scan on c  (cost=0.00..8591.67 rows=416667 width=0) (actual 
> time=0.030..140.036 rows=33 loops=3)
In my previous example, actual row number is 33*3=1e6(which is correct), so 
I think the actual time is 140.036*3ms.
Do your think the loops(3) has no meaning for parallel scan node when calculate 
actual time?

Regards,
Tang

> -Original Message-
> From: nikolai.berkoff 
> Sent: Wednesday, November 17, 2021 6:29 AM
> To: Tang, Haiying/唐 海英 ;
> pgsql-docs@lists.postgresql.org
> Subject: Re: "actual time" in QUERY PLAN for parallel operation when loops is
> bigger than 1
> 
> > https://www.postgresql.org/docs/14/using-explain.html
> > The doc says
> >
> 
> > "the loops value reports the total number of executions of the node, and the
> >
> 
> > actual time and rows values shown are averages per-execution. This is done
> >
> 
> > to make the numbers comparable with the way that the cost estimates are
> >
> 
> > shown."
> >
> 
> > But I found for parallel operation, the above description maybe not
> >
> 
> > correct.
> >
> 
> > For example
> >
> 
> > postgres=# create table c(id int);
> >
> 
> > CREATE TABLE
> >
> 
> > postgres=# insert into c select generate_series(1,100);
> >
> 
> > INSERT 0 100
> >
> 
> > postgres=# explain analyze select count(*) from c;
> >
> 
> > QUERY PLAN
> >
> ---
> ---
> ---
> ---
> ---
> ---
> ---
> ---
> ---
> ---
> >
> 
> > Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8) (actual
> >
> 
> > time=290.460..290.508 rows=1 loops=1)
> >
> 
> > -> Gather (cost=10633.33..10633.54 rows=2 width=8) (actual
> >
> 
> > time=289.605..290.484 rows=3 loops=1)
> >
> 
> > Workers Planned: 2
> >
> 
> > Workers Launched: 2
> >
> 
> > -> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8)
> >
> 
> > (actual time=188.336..188.337 rows=1 loops=3)
> >
> 
> > -> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667
> >
> 
> > width=0) (actual time=0.030..140.036 rows=33 loops=3)
> >
> 
> > Planning Time: 0.331 ms
> >
> 
> > Execution Time: 290.607 ms
> >
> 
> > (8 rows)
> >
> 
> > postgres=#
> >
> 
> > According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but
> >
> 
> > the total cost for this SQL is only 290ms.
> >
> 
> > Is the output of this explain correct?
> 
> Parallel query is explained in
> https://www.postgresql.org/docs/14/how-parallel-query-works.html and
> https://www.postgresql.org/docs/14/parallel-plans.html
> 
> The docs seem clear to me that as the nodes are executed in parallel then the
> time execution time is not 140.036*3. The 140.036 value is actual time the 
> Parallel
> Seq Scan nodes ran for but there were up to 2 running in parallel.


Re: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1

2021-11-16 Thread Guillaume Lelarge
Hi,

Le mer. 17 nov. 2021 à 06:32, tanghy.f...@fujitsu.com <
tanghy.f...@fujitsu.com> a écrit :

> Wednesday, November 17, 2021 6:29 AM, nikolai.berkoff <
> nikolai.berk...@pm.me> wrote:
> >
> > Parallel query is explained in
> > https://www.postgresql.org/docs/14/how-parallel-query-works.html and
> > https://www.postgresql.org/docs/14/parallel-plans.html
> >
> > The docs seem clear to me that as the nodes are executed in parallel
> then the
> > time execution time is not 140.036*3. The 140.036 value is actual time
> the Parallel
> > Seq Scan nodes ran for but there were up to 2 running in parallel.
>
> Thanks for your reply.
> I read your references but still confused about the 'loops' in parallel
> query result.
>
> > ->  Parallel Seq Scan on c  (cost=0.00..8591.67 rows=416667 width=0)
> (actual time=0.030..140.036 rows=33 loops=3)
> In my previous example, actual row number is 33*3=1e6(which is
> correct), so I think the actual time is 140.036*3ms.
> Do your think the loops(3) has no meaning for parallel scan node when
> calculate actual time?
>
>
As far as I understand it, you have to multiply the number of rows by the
number of loops, but this doesn't apply to duration at least for parallel
queries.


-- 
Guillaume.