On 16-02-2018 19:31, Tom Lane wrote:
Marina Polyakova <m.polyak...@postgrespro.ru> writes:
Hello, hackers! I got a permanent failure of master (commit
2a41507dab0f293ff241fe8ae326065998668af8) check on Windows Server 2008.
Regression output and diffs as well as config.pl are attached.

Weird.  AFAICS the cost estimates for those two plans should be quite
different, so this isn't just a matter of the estimates maybe being
a bit platform-dependent.  (And that test has been there nearly a
year without causing reported problems.)

To dig into it a bit more, I tweaked the test case to show the costs
for both plans, and got an output diff as attached.  Could you try
the same experiment on your Windows box?  In order to force the choice
in the other direction, you'd need to temporarily disable enable_sort,
not enable_hashagg as I did here, but the principle is the same.

Thank you very much! Your test showed that on Windows the hash aggregation path was not even added to the set of candidate paths (see windows_regression.diffs attached). Digging into this, I found that disallowing float8 pass-by-value in config.pl is what changes the estimated size of the hash table used in this query, pushing it over work_mem (see diff.patch attached):

From postmaster.log on Windows:

2018-02-17 20:50:48.596 MSK [1592] pg_regress/stats_ext STATEMENT: EXPLAIN
         SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
2018-02-17 20:50:48.596 MSK [1592] pg_regress/stats_ext LOG: rewritten parse tree:
...
2018-02-17 20:50:48.596 MSK [1592] pg_regress/stats_ext STATEMENT: EXPLAIN
         SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
# 20 = INT8OID => pg_type.typbyval = FLOAT8PASSBYVAL:
get_agg_clause_costs_walker aggtranstype 20 get_typbyval(aggtranstype) 0
get_agg_clause_costs_walker avgwidth 8 sizeof(void *) 8 costs->transitionSpace 24
# add AGG_SORTED path:
add_paths_to_grouping_rel 1 create_agg_path (aggstrategy 1)
estimate_hashagg_tablesize 1 hashentrysize 32
# add transitionSpace = 24:
estimate_hashagg_tablesize 2 hashentrysize 56
estimate_hashagg_tablesize 3 hashentrysize 96
estimate_hashagg_tablesize dNumGroups 1632.000000
# 156672 = 96 * 1632 > 131072:
add_paths_to_grouping_rel hashaggtablesize 156672 work_mem 128 work_mem * 1024L 131072 grouped_rel->pathlist == NIL 0
2018-02-17 20:50:48.596 MSK [1592] pg_regress/stats_ext LOG:  plan:
...

From postmaster.log on my computer (allow float8 to pass by value):

2018-02-17 20:45:57.651 MSK [3012] pg_regress/stats_ext STATEMENT: EXPLAIN
         SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
2018-02-17 20:45:57.651 MSK [3012] pg_regress/stats_ext LOG: rewritten parse tree:
...
2018-02-17 20:45:57.651 MSK [3012] pg_regress/stats_ext STATEMENT: EXPLAIN
         SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
# 20 = INT8OID => pg_type.typbyval = FLOAT8PASSBYVAL:
get_agg_clause_costs_walker aggtranstype 20 get_typbyval(aggtranstype) 1
# add AGG_SORTED path:
add_paths_to_grouping_rel 1 create_agg_path (aggstrategy 1)
estimate_hashagg_tablesize 1 hashentrysize 32
# add transitionSpace = 0:
estimate_hashagg_tablesize 2 hashentrysize 32
estimate_hashagg_tablesize 3 hashentrysize 72
estimate_hashagg_tablesize dNumGroups 1632.000000
# 117504 = 72 * 1632 < 131072:
add_paths_to_grouping_rel hashaggtablesize 117504 work_mem 128 work_mem * 1024L 131072 grouped_rel->pathlist == NIL 0
# add AGG_HASHED path:
add_paths_to_grouping_rel 2 create_agg_path (aggstrategy 2)
2018-02-17 20:45:57.651 MSK [3012] pg_regress/stats_ext LOG:  plan:
...

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd14..d9788fd 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3583,11 +3583,24 @@ estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
 	hashentrysize = MAXALIGN(path->pathtarget->width) +
 		MAXALIGN(SizeofMinimalTupleHeader);
 
+	fprintf(stderr,
+			"estimate_hashagg_tablesize 1 hashentrysize %ld\n",
+			hashentrysize);
+
 	/* plus space for pass-by-ref transition values... */
 	hashentrysize += agg_costs->transitionSpace;
+
+	fprintf(stderr,
+			"estimate_hashagg_tablesize 2 hashentrysize %ld\n",
+			hashentrysize);
+
 	/* plus the per-hash-entry overhead */
 	hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
 
+	fprintf(stderr,
+			"estimate_hashagg_tablesize 3 hashentrysize %ld\nestimate_hashagg_tablesize dNumGroups %f\n",
+			hashentrysize, dNumGroups);
+
 	/*
 	 * Note that this disregards the effect of fill-factor and growth policy
 	 * of the hash-table. That's probably ok, given default the default
@@ -6043,6 +6056,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 					 * We have aggregation, possibly with plain GROUP BY. Make
 					 * an AggPath.
 					 */
+					fprintf(stderr,
+							"add_paths_to_grouping_rel 1 create_agg_path (aggstrategy %d)\n",
+							parse->groupClause ? AGG_SORTED : AGG_PLAIN);
 					add_path(grouped_rel, (Path *)
 							 create_agg_path(root,
 											 grouped_rel,
@@ -6213,6 +6229,11 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 														  agg_costs,
 														  dNumGroups);
 
+			fprintf(stderr,
+					"add_paths_to_grouping_rel hashaggtablesize %ld work_mem %d work_mem * 1024L %ld grouped_rel->pathlist == NIL %d\n",
+					hashaggtablesize, work_mem, work_mem * 1024L,
+					grouped_rel->pathlist == NIL);
+
 			/*
 			 * Provided that the estimated size of the hashtable does not
 			 * exceed work_mem, we'll generate a HashAgg Path, although if we
@@ -6226,6 +6247,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 				 * We just need an Agg over the cheapest-total input path,
 				 * since input order won't matter.
 				 */
+				fprintf(stderr,
+						"add_paths_to_grouping_rel 2 create_agg_path (aggstrategy %d)\n",
+						AGG_HASHED);
 				add_path(grouped_rel, (Path *)
 						 create_agg_path(root, grouped_rel,
 										 cheapest_path,
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce..e45927e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -637,6 +637,10 @@ get_agg_clause_costs_walker(Node *node, get_agg_clause_costs_context *context)
 			costs->finalCost += argcosts.per_tuple;
 		}
 
+		fprintf(stderr,
+				"get_agg_clause_costs_walker aggtranstype %d get_typbyval(aggtranstype) %d\n",
+				aggtranstype, get_typbyval(aggtranstype));
+
 		/*
 		 * If the transition type is pass-by-value then it doesn't add
 		 * anything to the required size of the hashtable.  If it is
@@ -683,6 +687,9 @@ get_agg_clause_costs_walker(Node *node, get_agg_clause_costs_context *context)
 
 			avgwidth = MAXALIGN(avgwidth);
 			costs->transitionSpace += avgwidth + 2 * sizeof(void *);
+			fprintf(stderr,
+					"get_agg_clause_costs_walker avgwidth %d sizeof(void *) %ld costs->transitionSpace %ld\n",
+					avgwidth, sizeof(void *), costs->transitionSpace);
 		}
 		else if (aggtranstype == INTERNALOID)
 		{
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index deef08d..498bf4d 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -124,7 +124,7 @@ tablespace-setup:
 ## Run tests
 ##
 
-REGRESS_OPTS = --dlpath=. --max-concurrent-tests=20 $(EXTRA_REGRESS_OPTS)
+REGRESS_OPTS = --dlpath=. --max-concurrent-tests=20 --debug $(EXTRA_REGRESS_OPTS)
 
 check: all tablespace-setup
 	$(pg_regress_check) $(REGRESS_OPTS) --schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS)
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index ad9434f..ad44bf5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -8,39 +8,39 @@
 # run tablespace by itself, and first, because it forces a checkpoint;
 # we'd prefer not to have checkpoints later in the tests because that
 # interferes with crash-recovery testing.
-test: tablespace
+# test: tablespace
 
 # ----------
 # The first group of parallel tests
 # ----------
-test: boolean char name varchar text int2 int4 int8 oid float4 float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
+# test: boolean char name varchar text int2 int4 int8 oid float4 float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
 
 # Depends on things setup during char, varchar and text
-test: strings
+# test: strings
 # Depends on int2, int4, int8, float4, float8
-test: numerology
+# test: numerology
 
 # ----------
 # The second group of parallel tests
 # ----------
-test: point lseg line box path polygon circle date time timetz timestamp timestamptz interval abstime reltime tinterval inet macaddr macaddr8 tstypes
+# test: point lseg line box path polygon circle date time timetz timestamp timestamptz interval abstime reltime tinterval inet macaddr macaddr8 tstypes
 
 # ----------
 # Another group of parallel tests
 # geometry depends on point, lseg, box, path, polygon and circle
 # horology depends on interval, timetz, timestamp, timestamptz, reltime and abstime
 # ----------
-test: geometry horology regex oidjoins type_sanity opr_sanity misc_sanity comments expressions
+# test: geometry horology regex oidjoins type_sanity opr_sanity misc_sanity comments expressions
 
 # ----------
 # These four each depend on the previous one
 # ----------
-test: insert
-test: insert_conflict
-test: create_function_1
-test: create_type
-test: create_table
-test: create_function_2
+# test: insert
+# test: insert_conflict
+# test: create_function_1
+# test: create_type
+# test: create_table
+# test: create_function_2
 
 # ----------
 # Load huge amounts of data
@@ -48,78 +48,79 @@ test: create_function_2
 # execute two copy tests parallel, to check that copy itself
 # is concurrent safe.
 # ----------
-test: copy copyselect copydml
+# test: copy copyselect copydml
 
 # ----------
 # More groups of parallel tests
 # ----------
-test: create_misc create_operator create_procedure
+# test: create_misc create_operator create_procedure
 # These depend on the above two
-test: create_index create_view
+# test: create_index create_view
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+# test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
 
 # ----------
 # sanity_check does a vacuum, affecting the sort order of SELECT *
 # results. So it should not run parallel to other tests.
 # ----------
-test: sanity_check
+# test: sanity_check
 
 # ----------
 # Believe it or not, select creates a table, subsequent
 # tests need.
 # ----------
-test: errors
-test: select
-ignore: random
+# test: errors
+# test: select
+# ignore: random
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+# test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: brin gin gist spgist privileges init_privs security_label collate matview lock replica_identity rowsecurity object_address tablesample groupingsets drop_operator password
+# test: brin gin gist spgist privileges init_privs security_label collate matview lock replica_identity rowsecurity object_address tablesample groupingsets drop_operator password
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan stats_ext
+# test: alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan stats_ext
+test: stats_ext
 
 # rules cannot run concurrently with any test that creates a view
-test: rules psql_crosstab amutils
+# test: rules psql_crosstab amutils
 
 # run by itself so it can run parallel workers
-test: select_parallel
-test: write_parallel
+# test: select_parallel
+# test: write_parallel
 
 # no relation related tests can be put in this group
-test: publication subscription
+# test: publication subscription
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass
+# test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass
 
 # ----------
 # Another group of parallel tests
 # NB: temp.sql does a reconnect which transiently uses 2 connections,
 # so keep this parallel group to at most 19 tests
 # ----------
-test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml
+# test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: identity partition_join partition_prune reloptions hash_part indexing
+# test: identity partition_join partition_prune reloptions hash_part indexing
 
 # event triggers cannot run concurrently with any test that runs DDL
-test: event_trigger
+# test: event_trigger
 
 # run stats by itself because its delay may be insufficient under heavy load
-test: stats
+# test: stats
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 46acaad..e9642da 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -177,8 +177,13 @@ EXPLAIN (COSTS off)
 EXPLAIN (COSTS off)
  SELECT COUNT(*) FROM ndistinct GROUP BY a, b, c, d;
 
-EXPLAIN (COSTS off)
+EXPLAIN --(COSTS off)
+ SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
+
+set enable_sort = 0;
+EXPLAIN --(COSTS off)
  SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
+reset enable_sort;
 
 EXPLAIN (COSTS off)
  SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 314f2c3..b8548bc 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -126,7 +126,8 @@ sub check
 		"--max-concurrent-tests=20",
 		"--encoding=SQL_ASCII",
 		"--no-locale",
-		"--temp-instance=./tmp_check");
+		"--temp-instance=./tmp_check",
+		"--debug");
 	push(@args, $maxconn)     if $maxconn;
 	push(@args, $temp_config) if $temp_config;
 	system(@args);
*** C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/expected/stats_ext.out	Fri Feb 16 12:56:00 2018
--- C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/results/stats_ext.out	Sat Feb 17 20:50:48 2018
***************
*** 309,323 ****
           ->  Seq Scan on ndistinct
  (5 rows)
  
! EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
           QUERY PLAN          
! -----------------------------
!  HashAggregate
     Group Key: b, c, d
!    ->  Seq Scan on ndistinct
! (3 rows)
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
           QUERY PLAN          
--- 309,338 ----
           ->  Seq Scan on ndistinct
  (5 rows)
  
! EXPLAIN --(COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
                                   QUERY PLAN                                 
! ----------------------------------------------------------------------------
!  GroupAggregate  (cost=1026.89..1168.21 rows=1632 width=20)
     Group Key: b, c, d
!    ->  Sort  (cost=1026.89..1051.89 rows=10000 width=12)
!          Sort Key: b, c, d
!          ->  Seq Scan on ndistinct  (cost=0.00..191.00 rows=10000 width=12)
! (5 rows)
! 
! set enable_sort = 0;
! EXPLAIN --(COSTS off)
!  SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
!                                  QUERY PLAN                                 
! ----------------------------------------------------------------------------
!  GroupAggregate  (cost=10000001026.89..10000001168.21 rows=1632 width=20)
!    Group Key: b, c, d
!    ->  Sort  (cost=10000001026.89..10000001051.89 rows=10000 width=12)
!          Sort Key: b, c, d
!          ->  Seq Scan on ndistinct  (cost=0.00..191.00 rows=10000 width=12)
! (5 rows)
  
+ reset enable_sort;
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
           QUERY PLAN          

======================================================================