On 2019/04/15 2:38, Tom Lane wrote: > Amit Langote <langote_amit...@lab.ntt.co.jp> writes: >> On 2019/04/10 15:42, Michael Paquier wrote: >>> It seems to me that Tom's argument to push in the way relcache >>> information is handled by copying its contents sounds sensible to me. >>> That's not perfect, but it is consistent with what exists (note >>> PublicationActions for a rather-still-not-much-complex example of >>> structure which gets copied from the relcache). > >> I gave my vote for direct access of unchangeable relcache substructures >> (TupleDesc, PartitionDesc, etc.), because they're accessed during the >> planning of every user query and fairly expensive to copy compared to list >> of indexes or PublicationActions that you're citing. To further my point >> a bit, I wonder why PublicationActions needs to be copied out of relcache >> at all? Looking at its usage in execReplication.c, it seems we can do >> fine without copying, because we are holding both a lock and a relcache >> reference on the replication target relation during the entirety of >> apply_handle_insert(), which means that information can't change under us, >> neither logically nor physically. > > So the point here is that that reasoning is faulty. You *cannot* assume, > no matter how strong a lock or how many pins you hold, that a relcache > entry will not get rebuilt underneath you. Cache flushes happen > regardless. And unless relcache.c takes special measures to prevent it, > a rebuild will result in moving subsidiary data structures and thereby > breaking any pointers you may have pointing into those data structures. > > For certain subsidiary structures such as the relation tupdesc, > we do take such special measures: that's what the "keep_xxx" dance in > RelationClearRelation is. However, that's expensive, both in cycles > and maintenance effort: it requires having code that can decide equality > of the subsidiary data structures, which we might well have no other use > for, and which we certainly don't have strong tests for correctness of. > It's also very error-prone for callers, because there isn't any good way > to cross-check that code using a long-lived pointer to a subsidiary > structure is holding a lock that's strong enough to guarantee non-mutation > of that structure, or even that relcache.c provides any such guarantee > at all. (If our periodic attempts to reduce lock strength for assorted > DDL operations don't scare the pants off you in this connection, you have > not thought hard enough about it.) So I think that even though we've > largely gotten away with this approach so far, it's also a half-baked > kluge that we should be looking to get rid of, not extend to yet more > cases.
Thanks for the explanation. I understand that simply having a lock and a nonzero refcount on a relation doesn't prevent someone else from changing it concurrently. I get that we want to get rid of the keep_* kludge in the long term, but is it wrong to think, for example, that having keep_partdesc today allows us today to keep the pointer to rd_partdesc as long as we're holding the relation open or refcnt on the whole relation such as with PartitionDirectory mechanism? > To my mind there are only two trustworthy solutions to the problem of > wanting time-extended usage of a relcache subsidiary data structure: one > is to copy it, and the other is to reference-count it. I think that going > over to a reference-count-based approach for many of these structures > might well be something we should do in future, maybe even the very near > future. In the mean time, though, I'm not really satisfied with inserting > half-baked kluges, especially not ones that are different from our other > half-baked kluges for similar purposes. I think that's a path to creating > hard-to-reproduce bugs. +1 to reference-count-based approach. I've occasionally wondered why there is both keep_tupdesc kludge and refcounting for table TupleDescs. I guess it's because *only* the TupleTableSlot mechanism in the executor uses TupleDesc pinning (that is, refcounting) and the rest of the sites depend on the shaky guarantee provided by keep_tupdesc. Thanks, Amit