On Sun, Jan 7, 2018 at 5:44 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> According to buildfarm member silverfish,
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=silverfish&dt=2018-01-06%2008%3A53%3A38
>
> it's possible to sometimes get this failure in the regression tests:
>
> *** 
> /mnt/buildfarm/buildroot/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out
>     Tue Dec 19 20:24:02 2017
> --- 
> /mnt/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/select_parallel.out
>       Sat Jan  6 09:21:39 2018
> ***************
> *** 75,84 ****
>            Workers Planned: 3
>            ->  Partial Aggregate
>                  ->  Parallel Append
>                        ->  Seq Scan on d_star
>                        ->  Seq Scan on f_star
>                        ->  Seq Scan on e_star
> -                      ->  Seq Scan on b_star
>                        ->  Seq Scan on c_star
>                        ->  Seq Scan on a_star
>   (11 rows)
> --- 75,84 ----
>            Workers Planned: 3
>            ->  Partial Aggregate
>                  ->  Parallel Append
> +                      ->  Seq Scan on b_star
>                        ->  Seq Scan on d_star
>                        ->  Seq Scan on f_star
>                        ->  Seq Scan on e_star
>                        ->  Seq Scan on c_star
>                        ->  Seq Scan on a_star
>   (11 rows)
>
> Irreproducible failures in the regression tests are not very acceptable.
> Furthermore, considering that the query being tested is
>
> explain (costs off)
>   select round(avg(aa)), sum(aa) from a_star;
>
> it seems to me that the "expected" order of the sub-scans is mighty
> random to begin with.
>

I think order of sub-scans can be random if the number of rows in
child relations can vary across runs.  For the above case, the
subpaths (non-partial-paths) are always in the descending order of
their cost and I can see that by running it locally.  On my local m/c,
output is as below:

regression=# explain select round(avg(aa)), sum(aa) from a_star;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2.30..2.31 rows=1 width=40)
   ->  Gather  (cost=2.28..2.29 rows=3 width=40)
         Workers Planned: 3
         ->  Partial Aggregate  (cost=2.28..2.29 rows=1 width=40)
               ->  Parallel Append  (cost=0.00..2.20 rows=15 width=4)
                     ->  Seq Scan on d_star  (cost=0.00..1.16 rows=16 width=4)
                     ->  Seq Scan on f_star  (cost=0.00..1.16 rows=16 width=4)
                     ->  Seq Scan on e_star  (cost=0.00..1.07 rows=7 width=4)
                     ->  Seq Scan on b_star  (cost=0.00..1.04 rows=4 width=4)
                     ->  Seq Scan on c_star  (cost=0.00..1.04 rows=4 width=4)
                     ->  Seq Scan on a_star  (cost=0.00..1.03 rows=3 width=4)
(11 rows)

The above indicates that paths are listed in the order as expected.
What makes you think that the order of sub-scans can be random?  Is it
possible that the number of rows in child relations can vary across
runs?

One theory that can explain above failure is that the costs of
scanning some of the sub-paths is very close due to which sometimes
the results can vary.  If that is the case, then probably using
fuzz_factor in costs comparison (as is done in attached patch) can
improve the situation, may be we have to consider some other factors
like number of rows in each subpath.  However, it might be better to
first somehow reproduce this case and see what is going wrong, any
ideas?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: fix_pa_cost_comp_v1.patch
Description: Binary data

Reply via email to