Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> On this patch, beyond the fact that it's causing a crash in the
> regression tests as evidenced by the CFbot, we seem to be waiting on the
> input of the larger community on whether it's a desired feature or not.
> We have Kyotaro's vote for it, but it would be good to get more.
I'm pretty dubious about this (just as I am with the somewhat-related
topic of syscache/relcache size limits).  It's hard to tell where
performance is going to fall off a cliff if you limit the cache size.

Another point is that, because this only gets rid of the regeneratable
parts of a plancache entry, it isn't really going to move the needle
all that far in terms of total space consumption.  As an experiment,
I instrumented GetCachedPlan right after where it fills in the gplan
field to see the relative sizes of the cache entry's contexts.
Running that over the core + PL regression tests, I get

     14 GetCachedPlan: 1024 context, 1024 query_context, 1024 gplan
      4 GetCachedPlan: 1024 context, 2048 query_context, 1024 gplan
      1 GetCachedPlan: 1024 context, 2048 query_context, 2048 gplan
   2109 GetCachedPlan: 2048 context, 2048 query_context, 2048 gplan
     29 GetCachedPlan: 2048 context, 2048 query_context, 4096 gplan
      6 GetCachedPlan: 2048 context, 4096 query_context, 2048 gplan
     33 GetCachedPlan: 2048 context, 4096 query_context, 4096 gplan
      2 GetCachedPlan: 2048 context, 4096 query_context, 8192 gplan
      1 GetCachedPlan: 2048 context, 8192 query_context, 16384 gplan
      4 GetCachedPlan: 2048 context, 8192 query_context, 4096 gplan
      2 GetCachedPlan: 2048 context, 8192 query_context, 8192 gplan
      8 GetCachedPlan: 3480 context, 8192 query_context, 8192 gplan
    250 GetCachedPlan: 4096 context, 2048 query_context, 2048 gplan
    107 GetCachedPlan: 4096 context, 2048 query_context, 4096 gplan
      3 GetCachedPlan: 4096 context, 4096 query_context, 16384 gplan
      1 GetCachedPlan: 4096 context, 4096 query_context, 2048 gplan
      7 GetCachedPlan: 4096 context, 4096 query_context, 32768 gplan
    190 GetCachedPlan: 4096 context, 4096 query_context, 4096 gplan
     61 GetCachedPlan: 4096 context, 4096 query_context, 8192 gplan
     11 GetCachedPlan: 4096 context, 8192 query_context, 4096 gplan
    587 GetCachedPlan: 4096 context, 8192 query_context, 8192 gplan
      1 GetCachedPlan: 4096 context, 16384 query_context, 8192 gplan
      5 GetCachedPlan: 4096 context, 32768 query_context, 32768 gplan
      1 GetCachedPlan: 4096 context, 65536 query_context, 65536 gplan
     12 GetCachedPlan: 8192 context, 4096 query_context, 4096 gplan
      2 GetCachedPlan: 8192 context, 4096 query_context, 8192 gplan
     49 GetCachedPlan: 8192 context, 8192 query_context, 16384 gplan
     46 GetCachedPlan: 8192 context, 8192 query_context, 8192 gplan
     10 GetCachedPlan: 8192 context, 16384 query_context, 16384 gplan
      1 GetCachedPlan: 8192 context, 16384 query_context, 32768 gplan
      1 GetCachedPlan: 8192 context, 16384 query_context, 8192 gplan
      1 GetCachedPlan: 8192 context, 32768 query_context, 32768 gplan
      2 GetCachedPlan: 8192 context, 131072 query_context, 131072 gplan
      3 GetCachedPlan: 16384 context, 8192 query_context, 16384 gplan
      1 GetCachedPlan: 16384 context, 16384 query_context, 16384 gplan
      2 GetCachedPlan: 16384 context, 16384 query_context, 17408 gplan
      1 GetCachedPlan: 16384 context, 16384 query_context, 32768 gplan
      1 GetCachedPlan: 16384 context, 16384 query_context, 65536 gplan

(The first column is the number of occurrences of the log entry; I got
this list from "grep|sort|uniq -c" on the postmaster log.)  Patch for
this is attached, just in the interests of full disclosure.

So yeah, there are cases where you can save a whole lot by deleting the
query_context and/or gplan, but they're pretty few and far between;
more commonly, the nonreclaimable data in "context" accounts for a
third of the usage.

(BTW, it looks to me like the code tries to keep the *total* usage
under the GUC limit, not the freeable usage.
Which means it won't be hard at all to drive it to the worst case where
it tries to free everything all the time, if the nonreclaimable data is
already over the limit.)

Admittedly, the regression tests might well not be representative of
common usage, but if you don't want to take them as a benchmark then
we need some other benchmark we can look at.

I also notice that this doesn't seem to be doing anything with
CachedExpressions, which are likely to be a pretty big factor in the
usage of e.g. plpgsql.

Now, I'd be the first to say that my thoughts about this are probably
biased by my time at Salesforce, where their cache-consumption problems
were driven by lots and lots and lots and lots (and lots) of plpgsql
code.  Maybe with another usage pattern you'd come to a different
conclusion, but if I were trying to deal with that situation, what I'd
look at doing is reclaiming stuff starting at the plpgsql function
cache level, and then cascading down to the plancache entries
referenced by a plpgsql function body you've chosen to free.  One major
advantage of doing that is that plpgsql has a pretty clear idea of when
a given function cache instance has gone to zero refcount, whereas
plancache.c simply doesn't know that.

As far as the crash issue is concerned, I notice that right now the
cfbot is showing green for this patch, but that seems to just be
because the behavior is unstable.  I see crashes in "make
installcheck-parallel" about 50% of the time with this patch applied.
Since, in fact, the patch is not supposed to be doing anything at all
with prepared_statement_limit set to zero, that suggests sloppiness in
the refactoring that was done to separate out the resource-freeing
code.

On the other hand, if I do ALTER SYSTEM SET prepared_statement_limit = 1
and then run "make installcheck-parallel", I see a different set of
failures.  It rather looks like the patch is deleting plans out from
under plpgsql, which connects back to my point about plancache.c not
really knowing whether a plan is in use or not.  Certainly,
EnforcePreparedStatementLimit as coded here has no idea about that.

(Speaking of ALTER SYSTEM, why the devil is the variable
PGC_POSTMASTER?  That seems entirely silly.)

Aside from Alvaro's style points, I'm fairly concerned about the
overhead the patch will add when active.  Running through the entire
plancache and collecting memory stats is likely to be quite an
expensive proposition when there are lots of entries, yet this is
willing to do that over again at the drop of a hat.  It won't be hard
to get this to expend O(N^2) time with N entries.  I think that for
acceptable performance, it'll be necessary to track total usage
incrementally instead of doing it this way.

			regards, tom lane
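[Rough sketches of the two ideas above -- incremental accounting in
plancache.c, and reclaiming at the plpgsql level -- follow.  Neither is
part of the attached instrumentation patch; the names
plancache_entry_space, total_plancache_space and
plpgsql_reclaim_idle_functions are made up for illustration, and the
exact hook points would need real thought.]

The first sketch keeps a running total instead of rescanning every
cache entry, reusing the same stats-method call as the instrumentation
patch; it is written as if it lived in plancache.c, where the needed
headers are already included:

/*
 * Hypothetical sketch: maintain the total plancache footprint
 * incrementally, so EnforcePreparedStatementLimit can compare one
 * counter against the GUC instead of walking every entry each time.
 */
static Size total_plancache_space = 0;

static Size
plancache_entry_space(CachedPlanSource *plansource)
{
    MemoryContextCounters counters;

    memset(&counters, 0, sizeof(counters));
    /* the stats method adds into "counters" on each call */
    plansource->context->methods->stats(plansource->context,
                                        NULL, NULL, &counters);
    if (plansource->query_context)
        plansource->query_context->methods->stats(plansource->query_context,
                                                  NULL, NULL, &counters);
    if (plansource->gplan)
        plansource->gplan->context->methods->stats(plansource->gplan->context,
                                                   NULL, NULL, &counters);
    return counters.totalspace;
}

/*
 * The places that change an entry's footprint (SaveCachedPlan,
 * DropCachedPlan, building or releasing a generic plan) would then do
 * something like
 *
 *     total_plancache_space -= old_entry_size;
 *     total_plancache_space += plancache_entry_space(plansource);
 *
 * where old_entry_size is whatever was previously recorded for the entry.
 */

The second sketch shows the plpgsql-level cascade.  It is written as if
it were added to pl_comp.c (where plpgsql_HashTable, plpgsql_HashEnt
and delete_function live); a real version would need some policy for
deciding which functions to evict rather than dropping everything idle:

/*
 * Hypothetical sketch: discard compiled plpgsql functions that are not
 * currently executing.  Freeing a function's storage releases its SPI
 * plans, which in turn drops the underlying plancache entries.
 */
static void
plpgsql_reclaim_idle_functions(void)
{
    HASH_SEQ_STATUS seq;
    plpgsql_HashEnt *hentry;

    if (plpgsql_HashTable == NULL)
        return;

    hash_seq_init(&seq, plpgsql_HashTable);
    while ((hentry = (plpgsql_HashEnt *) hash_seq_search(&seq)) != NULL)
    {
        PLpgSQL_function *func = hentry->function;

        /* use_count > 0 means some call is in progress; leave it alone */
        if (func->use_count == 0)
            delete_function(func);
    }
}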
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index abc3062..210c049 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -431,6 +431,37 @@ CompleteCachedPlan(CachedPlanSource *plansource,
 
     plansource->is_complete = true;
     plansource->is_valid = true;
+
+    {
+        MemoryContextCounters counters;
+        MemoryContext context;
+        Size        csize,
+                    qcsize = 0,
+                    gpsize = 0;
+
+        memset(&counters, 0, sizeof(counters));
+        context = plansource->context;
+        context->methods->stats(context, NULL, NULL, &counters);
+        csize = counters.totalspace;
+
+        if (plansource->query_context)
+        {
+            memset(&counters, 0, sizeof(counters));
+            context = plansource->query_context;
+            context->methods->stats(context, NULL, NULL, &counters);
+            qcsize = counters.totalspace;
+        }
+
+        if (plansource->gplan)
+        {
+            memset(&counters, 0, sizeof(counters));
+            context = plansource->gplan->context;
+            context->methods->stats(context, NULL, NULL, &counters);
+            gpsize = counters.totalspace;
+        }
+        elog(LOG, "CompleteCachedPlan: %zu context, %zu query_context, %zu gplan",
+             csize, qcsize, gpsize);
+    }
 }
 
 /*
@@ -1188,6 +1219,38 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
         /* Update generic_cost whenever we make a new generic plan */
         plansource->generic_cost = cached_plan_cost(plan, false);
+
+        {
+            MemoryContextCounters counters;
+            MemoryContext context;
+            Size        csize,
+                        qcsize = 0,
+                        gpsize = 0;
+
+            memset(&counters, 0, sizeof(counters));
+            context = plansource->context;
+            context->methods->stats(context, NULL, NULL, &counters);
+            csize = counters.totalspace;
+
+            if (plansource->query_context)
+            {
+                memset(&counters, 0, sizeof(counters));
+                context = plansource->query_context;
+                context->methods->stats(context, NULL, NULL, &counters);
+                qcsize = counters.totalspace;
+            }
+
+            if (plansource->gplan)
+            {
+                memset(&counters, 0, sizeof(counters));
+                context = plansource->gplan->context;
+                context->methods->stats(context, NULL, NULL, &counters);
+                gpsize = counters.totalspace;
+            }
+            elog(LOG, "GetCachedPlan: %zu context, %zu query_context, %zu gplan",
+                 csize, qcsize, gpsize);
+        }
+
 
         /*
          * If, based on the now-known value of generic_cost, we'd not have
          * chosen to use a generic plan, then forget it and make a custom