On Wed, May 1, 2019 at 11:59 AM Andres Freund <and...@anarazel.de> wrote: > The message I'm replying to is marked as an open item. Robert, what do > you think needs to be done here before release? Could you summarize, > so we then can see what others think?
The original issue on this thread was that hyrax started running out of memory when it hadn't been doing so previously. That happened because, for complicated reasons, commit 898e5e3290a72d288923260143930fb32036c00c resulted in the leak being hit lots of times in CLOBBER_CACHE_ALWAYS builds instead of just once. Commits 2455ab48844c90419714e27eafd235a85de23232 and d3f48dfae42f9655425d1f58f396e495c7fb7812 fixed that problem. In the email at issue, Tom complains about two things. First, he complains about the fact that I try to arrange things so that relcache data remains valid for as long as required instead of just copying it. Second, he complains about the fact repeated ATTACH and DETACH PARTITION operations can produce a slow session-lifespan memory leak. I think it's reasonable to fix the second issue for v12. I am not sure how important it is, because (1) the leak is small, (2) it seems unlikely that anyone would execute enough ATTACH/DETACH PARTITION operations in one backend for the leak to amount to anything significant, and (3) if a relcache flush ever happens when you don't have the relation open, all of the "leaked" memory will be un-leaked. However, I believe that we could fix it as follows. First, invent rd_pdoldcxt and put the first old context there; if that pointer is already in use, then parent the new context under the old one. Second, in RelationDecrementReferenceCount, if the refcount hits 0, nuke rd_pdoldcxt and set the pointer back to NULL. With that change, you would only keep the old PartitionDesc around until the ref count hits 0, whereas at present, you keep the old PartitionDesc around until an invalidation happens while the ref count is 0. I think the first issue is not v12 material. Tom proposed to fix it by copying all the relevant data out of the relcache, but his own performance results show a pretty significant hit, and AFAICS he hasn't pointed to anything that's actually broken by the current approach. What I think should be done is actually generalize the approach I took in this patch, so that instead of the partition directory holding a reference count, the planner or executor holds one. Then not only could people who want information about the PartitionDesc avoid copying stuff from the relcache, but maybe other things as well. I think that would be significantly better than continuing to double-down on the current copy-everything approach, which really only works well in a world where a table can't have all that much metadata, which is clearly not true for PostgreSQL any more. I'm not sure that Tom is actually opposed to this approach -- although I may have misunderstood his position -- but where we disagree, I think, is that I see what I did in this commit as a stepping-stone toward a better world, and he sees it as something that should be killed with fire until that better world has fully arrived. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company