On Tue, Apr 27, 2021 at 4:34 PM Amit Langote <amitlangot...@gmail.com> wrote: > Thanks for the updated patch. I've been reading it, but I noticed a > bug in 8aba9322511f, which I thought you'd want to know to make a note > of when committing this one. > > So we decided in 8aba9322511f that it is okay to make the memory > context in which a transient partdesc is allocated a child of > PortalContext so that it disappears when the portal does. But > PortalContext is apparently NULL when the planner runs, at least in > the "simple" query protocol, so any transient partdescs built by the > planner would effectively leak. Not good. > > With this patch, partdesc_nodetached is no longer transient, so the > problem doesn't exist. > > I will write more about the updated patch in a bit...
The first thing that struck me about partdesc_nodetached is that it's not handled in RelationClearRelation(), where we (re)set a regular partdesc to NULL so that the next RelationGetPartitionDesc() has to build it from scratch. I think partdesc_nodetached and the xmin should likewise be reset. Also in load_relcache_init_file(), although nothing serious there. That said, I think I may have found a couple of practical problems with partdesc_nodetached, or more precisely with having it side-by-side with regular partdesc. Maybe they can be fixed, so the problems are not as such deal breakers for the patch's main idea. The problems can be seen when different queries in a serializable transaction have to use both the regular partdesc and partdesc_detached in a given relcache. For example, try the following after first creating a situation where the table p has a detach-pending partition p2 (for values in (2) and a live partition p1 (for values in (1)). begin isolation level serializable; insert into p values (1); select * from p where a = 1; insert into p values (1); The 1st insert succeeds but the 2nd fails with: ERROR: no partition of relation "p" found for row DETAIL: Partition key of the failing row contains (a) = (1). I haven't analyzed this error very closely but there is another situation which causes a crash due to what appears to be a conflict with rd_pdcxt's design: -- run in a new session begin isolation level serializable; select * from p where a = 1; insert into p values (1); select * from p where a = 1; The 2nd select crashes: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. The crash occurs because the planner gets handed a stale copy of partdesc_nodetached for the 2nd select. It gets stale, because the context it's allocated in gets made a child of rd_pdcxt, which is in turn assigned the context of the regular partdesc when it is built for the insert query. Any child contexts of rd_pdcxt are deleted as soon as the Relation's refcount goes to zero, taking it with partdesc_nodetached. Note it is this code in RelationBuildPartitionDesc(): /* * But first, a kluge: if there's an old rd_pdcxt, it contains an old * partition descriptor that may still be referenced somewhere. * Preserve it, while not leaking it, by reattaching it as a child * context of the new rd_pdcxt. Eventually it will get dropped by * either RelationClose or RelationClearRelation. */ if (rel->rd_pdcxt != NULL) MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt); rel->rd_pdcxt = new_pdcxt; I think we may need a separate context for partdesc_nodetached, likely with the same kludges as rd_pdcxt. Maybe the first problem will go away with that as well. Few other minor things I noticed: + * Store it into relcache. For snapshots built excluding detached + * partitions, which we save separately, we also record the + * pg_inherits.xmin of the detached partition that was omitted; this + * informs a future potential user of such a cached snapshot. The "snapshot" in the 1st and the last sentence should be "partdesc"? + * We keep two partdescs in relcache: rd_partdesc_nodetached excludes + * partitions marked concurrently being detached, while rd_partdesc includes + * them. IMHO, describing rd_partdesc first in the sentence would be better. Like: rd_partdesc includes all partitions including any that are being concurrently detached, while rd_partdesc_nodetached excludes them. -- Amit Langote EDB: http://www.enterprisedb.com