Hello,
While examining the reasons for excessive memory usage in prepared
statements I noticed that RTE_JOIN-kind RTEs contain a bunch of
columnNames and joinaliasvars, that are irrelevant after the Query after
has been rewritten. I have some queries that join about 20 tables and
select only a few values, mainly names of objects from those tables.
The attached patch adds a small cleanup function that iterates thought
the query and cleans stuff up. I may have missed some places that could
also be cleaned up but for now the memory requirements for my largest
statements have dropped from 31.2MB to 10.4MB with this patch.
After the statement has be executed seven times a generic plan is stored
in the statement, resulting in an extra 8,8MB memory usage, but still
this makes a difference of more than 50% total.
But the most interesting thing was that this patch reduced query
execution time by 50% (~110ms vs. 55ms) when no generic was created yet,
and by 35% (7.5ms vs. 5.1ms) when the global query plan had been created.
All tests still pass with my cleanup command, but I am afraid the tests
might not contain queries that still need that info after statement
preparation.
If anyone might have a look at it and hint me to a situation where this
might crash later on? Also, would it be possible for someone to run a
benchmark after applying this test to ensure my findings are not totally
off? I tested on a Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz with SSDs,
but everything should have been in memory when I ran the test.
Regards,
Daniel Migowski
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index b945b15..af8dbc9 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -51,6 +51,36 @@ static ParamListInfo EvaluateParams(PreparedStatement
*pstmt, List *params,
static Datum build_regtype_array(Oid *param_types, int num_params);
/*
+ * cleanUpQuery - Cleanup queries after they have been rewritten.
+ *
+ * This nulls all the joinaliasvars and colnames in RTEs of kind RTE_JOIN
+ * in CMD_SELECT queries after the Query has been analyzed and rewritten.
+ * For some join-heavy queries this results in a 70% memory usage reduction.
+ */
+static void cleanQuery(Query* q) {
+ ListCell* lc2;
+ if( q->commandType == CMD_SELECT ) {
+ foreach(lc2, q->rtable) {
+ RangeTblEntry* rte = (RangeTblEntry*)lfirst(lc2);
+ if( rte->rtekind == RTE_JOIN ) {
+ ListCell* lc3;
+ foreach(lc3, rte->joinaliasvars) {
+ lc3->data.ptr_value = NULL;
+ }
+ if( rte->eref ) {
+ foreach(lc3, rte->eref->colnames) {
+ lc3->data.ptr_value = NULL;
+ }
+ }
+ }
+ if( rte->rtekind == RTE_SUBQUERY ) {
+ cleanQuery(rte->subquery);
+ }
+ }
+ }
+}
+
+/*
* Implements the 'PREPARE' utility statement.
*/
void
@@ -64,6 +94,7 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
Query *query;
List *query_list;
int i;
+ ListCell* lc;
/*
* Disallow empty-string statement name (conflicts with protocol-level
@@ -99,7 +130,6 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
if (nargs)
{
ParseState *pstate;
- ListCell *l;
/*
* typenameTypeId wants a ParseState to carry the source query
string.
@@ -111,9 +141,9 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
argtypes = (Oid *) palloc(nargs * sizeof(Oid));
i = 0;
- foreach(l, stmt->argtypes)
+ foreach(lc, stmt->argtypes)
{
- TypeName *tn = lfirst(l);
+ TypeName *tn = lfirst(lc);
Oid toid = typenameTypeId(pstate,
tn);
argtypes[i++] = toid;
@@ -163,6 +193,11 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
/* Rewrite the query. The result could be 0, 1, or many queries. */
query_list = QueryRewrite(query);
+ /* Clearing joinaliasvars from join rte's now */
+ foreach(lc, query_list) {
+ cleanQuery((Query*)lfirst(lc));
+ }
+
/* Finish filling in the CachedPlanSource */
CompleteCachedPlan(plansource,
query_list,