More deficiencies in outfuncs/readfuncs processing

Tom Lane Sun, 16 Sep 2018 16:04:29 -0700

The report in

https://www.postgresql.org/message-id/flat/3690074f-abd2-56a9-144a-aa5545d7a291%40postgrespro.ru


set off substantial alarm bells for me about whether outfuncs/readfuncs
processing had any additional problems we'd failed to notice.  I thought
that to investigate that, it'd be a good idea to invent a debugging option
comparable to COPY_PARSE_PLAN_TREES, except it passes all parse and plan
trees through nodeToString + stringToNode rather than copyObject.  I did
so, and darn if a number of creepy-crawlies didn't get shaken out of the
woodwork immediately.  The first attached patch adds that debugging option
(plus some hackery I'll explain in a moment).  I propose that some form of
this should go into HEAD and we should configure at least one buildfarm
animal to enable it.  The second attached patch fixes the bugs I found;
parts of it need to get back-patched.

This debugging option also exposes the XMLNAMESPACES issue shown in the
aforementioned thread.  So the patch shown there needs to go in first,
or you get a core dump in xml-enabled builds.  But the sum total of all
three patches does pass make check-world.

The hackery in patch 0001 has to do with copying queryId, stmt_location,
and stmt_len fields forward across nodeToString + stringToNode.  The
core regression tests pass fine without that, but pg_stat_statements
falls over, because it needs that data to survive parse analysis as
well as planning.

As far as queryId goes, I'm satisfied with the hack shown here of just
having pg_rewrite_query propagate the field forward across the test.
There's a case to be made for fixing it by storing queryId in stored rules
instead, but that would require a far higher commitment to stability and
universality of the queryId computation than we've made so far.  Such a
change seems like something to consider separately if at all.  (Note that
no hack is needed for plan trees, because _outPlannedStmt/_readPlannedStmt
already transmit the queryId for a plan.)

The location fields are a much more ticklish matter, because the behavior
for those ties into a bunch of historical and possible future behaviors.
Currently, although outfuncs.c prints node location fields just fine,
readfuncs.c ignores the stored values and inserts -1.  The reason for
that is to be able to distinguish nodes that actually came from the
current query (for which the locations actually mean something with
respect to a query text we have at hand) from nodes that came from some
stored view or rule (for which we lack corresponding query text).  So
readfuncs.c marks nodes of the latter type as "unknown location" and all
is well.  As this patch stands, when the debug option is on, all nodes
coming out of the parser will have unknown locations, except for the
top-level Query or PlannedStmt node that we specially hacked.  That's not
great; a debugging option shouldn't risk behavioral changes like that.
Right now, we pass regression tests anyway because pretty much nothing
after parse analysis looks at the location fields, but there have been
discussions about changing that so as to get better execution-time
error reporting.

I thought for a bit about replacing those hacks with something like
this in readfuncs.c:

#ifdef WRITE_READ_PARSE_PLAN_TREES
#define READ_LOCATION_FIELD(fldname) READ_INT_FIELD(fldname)
#else
// existing definition of READ_LOCATION_FIELD
#endif

but that breaks the property of clearing the location info coming
in from stored rules, so it's unlikely to be satisfactory.  Another
idea is to have a runtime switch in readfuncs.c, allowing stringToNode
to treat location fields as it does now while there'd be an additional
entry point that would read location fields like plain ints.  Then
we'd use the latter for the test code.  This is a bit ugly but it might
be the best solution.

Patch 0002 addresses several more-or-less-independent issues that are
exposed by running the regression tests with patch 0001 activated:

* Query.withCheckOptions fails to propagate through write+read, because
it's intentionally ignored by outfuncs/readfuncs on the grounds that
it'd always be NIL anyway in stored rules.  This seems like premature
optimization of the worst sort, since it's not even saving very much:
if the assumption holds, what's suppressed from a stored Query is only
" :withCheckOptions <>", hardly a big savings.  And of course if the
assumption ever fails to hold, it's just broken, and broken in a non
obvious way too (you'd only notice if you expected a check option
failure and didn't get one).  So this is pretty dumb and I think we
ought to fix it by treating that field normally in outfuncs/readfuncs.
That'd require a catversion bump, but we only need to do it in HEAD.
The only plausible alternative is to change _equalQuery to ignore
withCheckOptions, which does not sound like a good idea at all.

* The system expects TABLEFUNC RTEs to have coltypes, coltypmods, and
colcollations lists, but outfuncs doesn't dump them and readfuncs
doesn't restore them.  This doesn't cause obvious failures, because
the only things that look at those fields are expandRTE() and
get_rte_attribute_type(), which are mostly used during parse analysis.
But expandRTE() is used in the rewriter and planner, so probably it's
possible to cause a crash or misbehavior with a stored view that returns
the result of an XMLTABLE FROM-item.  I've not tried to build a test case,
but I'm pretty sure we need to back-patch a fix for this as far back as
we have XMLTABLE.  Fortunately, because those fields are redundant with
the TableFunc node, we can just link to the lists we read in the TableFunc
node rather than force a catversion bump to store them separately.  (This
is pretty grotty, but since it's replicating the physical duplication
established by addRangeTableEntryForTableFunc, I think it's OK.)

* readfuncs.c omits read support for NamedTuplestoreScan plan nodes.
This accidentally fails to break parallel query because a query using
a named tuplestore would never be considered parallel-safe anyway,
so we don't have to ship it to workers.  Still, I'm pretty certain
this is somebody's sloppy oversight not an intentional omission,
and propose that we should back-patch that addition as far back as
NamedTuplestoreScan exists (ie v10).  There seem to be no other cases
where we've omitted plan node read support, so this one has little
excuse to live.

* Several places that convert a view RTE into a subquery RTE, or similar
manipulations, failed to clear out fields that were specific to the
original RTE type and should be zero in a subquery RTE.  This has no real
impact on the system, but since readfuncs.c will leave such fields as
zero, equalfuncs.c thinks the nodes are different leading to a reported
failure.  I'm satisfied with patching this in HEAD.  (You could
alternatively argue that we should change _equalRangeTblEntry to ignore
fields that are not supposed to be set based on the rtekind, but I don't
think that's a good idea.)

* BuildOnConflictExcludedTargetlist randomly set the resname of some
TargetEntries to "" not NULL.  outfuncs/readfuncs don't distinguish those
cases, and so the string will read back in as NULL ... but equalfuncs.c
does distinguish.  Perhaps we ought to try to make things more consistent
in this area --- but it's just useless extra code space for
BuildOnConflictExcludedTargetlist to not use NULL here, so I fixed it for
now by making it do so.  Again, changing this in HEAD seems sufficient.

Comments?

                        regards, tom lane

diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index b309395..b036525 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -287,6 +287,13 @@
 /* #define COPY_PARSE_PLAN_TREES */
 
 /*
+ * Define this to force all parse and plan trees to be passed through
+ * outfuncs.c/readfuncs.c, to facilitate catching errors and omissions in
+ * those modules.
+ */
+/* #define WRITE_READ_PARSE_PLAN_TREES */
+
+/*
  * Define this to force all raw parse trees for DML statements to be scanned
  * by raw_expression_tree_walker(), to facilitate catching errors and
  * omissions in that function.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7a9ada2..180634f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -633,6 +633,12 @@ pg_parse_query(const char *query_string)
 	}
 #endif
 
+	/*
+	 * Currently, outfuncs/readfuncs support is missing for many raw parse
+	 * tree nodes, so we don't try to implement WRITE_READ_PARSE_PLAN_TREES
+	 * here.
+	 */
+
 	TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
 
 	return raw_parsetree_list;
@@ -776,6 +782,48 @@ pg_rewrite_query(Query *query)
 	}
 #endif
 
+#ifdef WRITE_READ_PARSE_PLAN_TREES
+	/* Optional debugging check: pass querytree through outfuncs/readfuncs */
+	{
+		List	   *new_list = NIL;
+		ListCell   *lc;
+
+		/*
+		 * We currently lack outfuncs/readfuncs support for most utility
+		 * statement types, so only attempt to write/read non-utility queries.
+		 */
+		foreach(lc, querytree_list)
+		{
+			Query	   *query = castNode(Query, lfirst(lc));
+
+			if (query->commandType != CMD_UTILITY)
+			{
+				char	   *str = nodeToString(query);
+				Query	   *new_query = stringToNode(str);
+
+				/*
+				 * These fields are not saved in stored rules, but we must
+				 * preserve them here to avoid breaking pg_stat_statements.
+				 */
+				new_query->queryId = query->queryId;
+				new_query->stmt_location = query->stmt_location;
+				new_query->stmt_len = query->stmt_len;
+
+				new_list = lappend(new_list, new_query);
+				pfree(str);
+			}
+			else
+				new_list = lappend(new_list, query);
+		}
+
+		/* This checks both outfuncs/readfuncs and the equal() routines... */
+		if (!equal(new_list, querytree_list))
+			elog(WARNING, "outfuncs/readfuncs failed to produce equal parse tree");
+		else
+			querytree_list = new_list;
+	}
+#endif
+
 	if (Debug_print_rewritten)
 		elog_node_display(LOG, "rewritten parse tree", querytree_list,
 						  Debug_pretty_print);
@@ -830,6 +878,30 @@ pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams)
 	}
 #endif
 
+#ifdef WRITE_READ_PARSE_PLAN_TREES
+	/* Optional debugging check: pass plan tree through outfuncs/readfuncs */
+	{
+		char	   *str;
+		PlannedStmt *new_plan;
+
+		str = nodeToString(plan);
+		new_plan = stringToNode(str);
+		pfree(str);
+
+		/*
+		 * equal() currently does not have routines to compare Plan nodes, so
+		 * don't try to test equality here.  Perhaps fix someday?
+		 */
+#ifdef NOT_USED
+		/* This checks both outfuncs/readfuncs and the equal() routines... */
+		if (!equal(new_plan, plan))
+			elog(WARNING, "outfuncs/readfuncs failed to produce an equal plan tree");
+		else
+#endif
+			plan = new_plan;
+	}
+#endif
+
 	/*
 	 * Print plan if debugging.
 	 */
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 69fd5b2..93f1e2c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1492,8 +1499,9 @@ _readPlannedStmt(void)
 	READ_NODE_FIELD(invalItems);
 	READ_NODE_FIELD(paramExecTypes);
 	READ_NODE_FIELD(utilityStmt);
-	READ_LOCATION_FIELD(stmt_location);
-	READ_LOCATION_FIELD(stmt_len);
+	/* We preserve these fields for the convenience of pg_stat_statements */
+	READ_INT_FIELD(stmt_location);
+	READ_INT_FIELD(stmt_len);
 
 	READ_DONE();
 }

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 69fd5b2..93f1e2c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3003,7 +3003,7 @@ _outQuery(StringInfo str, const Query *node)
 	WRITE_NODE_FIELD(rowMarks);
 	WRITE_NODE_FIELD(setOperations);
 	WRITE_NODE_FIELD(constraintDeps);
-	/* withCheckOptions intentionally omitted, see comment in parsenodes.h */
+	WRITE_NODE_FIELD(withCheckOptions);
 	WRITE_LOCATION_FIELD(stmt_location);
 	WRITE_LOCATION_FIELD(stmt_len);
 }
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3254524..b2db5a7 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -269,7 +269,7 @@ _readQuery(void)
 	READ_NODE_FIELD(rowMarks);
 	READ_NODE_FIELD(setOperations);
 	READ_NODE_FIELD(constraintDeps);
-	/* withCheckOptions intentionally omitted, see comment in parsenodes.h */
+	READ_NODE_FIELD(withCheckOptions);
 	READ_LOCATION_FIELD(stmt_location);
 	READ_LOCATION_FIELD(stmt_len);
 
@@ -1366,6 +1366,13 @@ _readRangeTblEntry(void)
 			break;
 		case RTE_TABLEFUNC:
 			READ_NODE_FIELD(tablefunc);
+			/* The RTE must have a copy of the column type info, if any */
+			if (local_node->tablefunc)
+			{
+				local_node->coltypes = local_node->tablefunc->coltypes;
+				local_node->coltypmods = local_node->tablefunc->coltypmods;
+				local_node->colcollations = local_node->tablefunc->colcollations;
+			}
 			break;
 		case RTE_VALUES:
 			READ_NODE_FIELD(values_lists);
@@ -1910,6 +1918,21 @@ _readCteScan(void)
 }
 
 /*
+ * _readNamedTuplestoreScan
+ */
+static NamedTuplestoreScan *
+_readNamedTuplestoreScan(void)
+{
+	READ_LOCALS(NamedTuplestoreScan);
+
+	ReadCommonScan(&local_node->scan);
+
+	READ_STRING_FIELD(enrname);
+
+	READ_DONE();
+}
+
+/*
  * _readWorkTableScan
  */
 static WorkTableScan *
@@ -2693,6 +2716,8 @@ parseNodeString(void)
 		return_value = _readTableFuncScan();
 	else if (MATCH("CTESCAN", 7))
 		return_value = _readCteScan();
+	else if (MATCH("NAMEDTUPLESTORESCAN", 19))
+		return_value = _readNamedTuplestoreScan();
 	else if (MATCH("WORKTABLESCAN", 13))
 		return_value = _readWorkTableScan();
 	else if (MATCH("FOREIGNSCAN", 11))
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index c3f46a2..688b3a1 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -586,10 +586,13 @@ inline_set_returning_functions(PlannerInfo *root)
 			funcquery = inline_set_returning_function(root, rte);
 			if (funcquery)
 			{
-				/* Successful expansion, replace the rtable entry */
+				/* Successful expansion, convert the RTE to a subquery */
 				rte->rtekind = RTE_SUBQUERY;
 				rte->subquery = funcquery;
+				rte->security_barrier = false;
+				/* Clear fields that should not be set in a subquery RTE */
 				rte->functions = NIL;
+				rte->funcordinality = false;
 			}
 		}
 	}
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c601b6d..c020600 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1116,7 +1116,7 @@ BuildOnConflictExcludedTargetlist(Relation targetrel,
 			 * the Const claims to be.
 			 */
 			var = (Var *) makeNullConst(INT4OID, -1, InvalidOid);
-			name = "";
+			name = NULL;
 		}
 		else
 		{
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index d830569..327e5c3 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1582,15 +1582,18 @@ ApplyRetrieveRule(Query *parsetree,
 	rule_action = fireRIRrules(rule_action, activeRIRs);
 
 	/*
-	 * Now, plug the view query in as a subselect, replacing the relation's
-	 * original RTE.
+	 * Now, plug the view query in as a subselect, converting the relation's
+	 * original RTE to a subquery RTE.
 	 */
 	rte = rt_fetch(rt_index, parsetree->rtable);
 
 	rte->rtekind = RTE_SUBQUERY;
-	rte->relid = InvalidOid;
-	rte->security_barrier = RelationIsSecurityView(relation);
 	rte->subquery = rule_action;
+	rte->security_barrier = RelationIsSecurityView(relation);
+	/* Clear fields that should not be set in a subquery RTE */
+	rte->relid = InvalidOid;
+	rte->relkind = 0;
+	rte->tablesample = NULL;
 	rte->inh = false;			/* must not be set for a subquery */
 
 	/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 07ab1a3..62209a8 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -168,9 +168,8 @@ typedef struct Query
 	List	   *constraintDeps; /* a list of pg_constraint OIDs that the query
 								 * depends on to be semantically valid */
 
-	List	   *withCheckOptions;	/* a list of WithCheckOption's, which are
-									 * only added during rewrite and therefore
-									 * are not written out as part of Query. */
+	List	   *withCheckOptions;	/* a list of WithCheckOption's (added
+									 * during rewrite) */
 
 	/*
 	 * The following two fields identify the portion of the source text string
@@ -1034,7 +1033,7 @@ typedef struct RangeTblEntry
 	bool		self_reference; /* is this a recursive self-reference? */
 
 	/*
-	 * Fields valid for table functions, values, CTE and ENR RTEs (else NIL):
+	 * Fields valid for CTE, VALUES, ENR, and TableFunc RTEs (else NIL):
 	 *
 	 * We need these for CTE RTEs so that the types of self-referential
 	 * columns are well-defined.  For VALUES RTEs, storing these explicitly
@@ -1042,7 +1041,9 @@ typedef struct RangeTblEntry
 	 * ENRs, we store the types explicitly here (we could get the information
 	 * from the catalogs if 'relid' was supplied, but we'd still need these
 	 * for TupleDesc-based ENRs, so we might as well always store the type
-	 * info here).
+	 * info here).  For TableFuncs, these fields are redundant with data in
+	 * the TableFunc node, but keeping them here allows some code sharing with
+	 * the other cases.
 	 *
 	 * For ENRs only, we have to consider the possibility of dropped columns.
 	 * A dropped column is included in these lists, but it will have zeroes in

More deficiencies in outfuncs/readfuncs processing

Reply via email to