On Sat, Aug 31, 2024 at 10:33 PM Andrei Lepikhov <lepi...@gmail.com> wrote: > On 29/8/2024 11:01, Alexander Korotkov wrote: > > On Mon, Aug 26, 2024 at 11:26 AM Alexander Korotkov > >> Secondly, I'm not terribly happy with current state of type cache. > >> The caller of lookup_type_cache() might get already invalidated data. > >> This probably OK, because caller probably hold locks on dependent > >> objects to guarantee that relevant properties of type actually > >> persists. At very least this should be documented, but it doesn't > >> seem so. Setting of tupdesc is sensitive to its order of execution. > >> That feels quite fragile to me, and not documented either. I think > >> this area needs improvements before we push additional functionality > >> there. > > > > I see fdd965d074 added a proper handling for concurrent invalidation > > for relation cache. If a concurrent invalidation occurs, we retry > > building a relation descriptor. Thus, we end up with returning of a > > valid relation descriptor to caller. I wonder if we can take the same > > approach to type cache. That would make the whole type cache more > > consistent and less fragile. Also, this patch will be simpler. > I think I understand the solution from the commit fdd965d074. > Just for the record, you mentioned invalidation inside the > lookup_type_cache above. Passing through the code, I found the only > place for such a case - the call of the GetDefaultOpClass, which > triggers the opening of the relation pg_opclass, which can cause an > AcceptInvalidationMessages call. Did you mean this case, or does a wider > field of cases exist here?
I've tried to implement handling of concurrent invalidation similar to commit fdd965d074. However, that appears to be more difficult than I thought, because for some datatypes like arrays, ranges, etc. we might need to fill the element type and reference it. So, I decided to continue with the current approach, but borrowing some ideas from fdd965d074. The revised patchset is attached.

0001 - adds a comment about concurrent invalidation handling
0002 - revised c14d4acb8. Now we track the type oids whose TypeCacheEntry filling is in progress. We add the entry to RelIdToTypeIdCacheHash at the end of lookup_type_cache() or on transaction abort. During invalidation, we don't assert that the RelIdToTypeIdCacheHash entry is present if the TypeCacheEntry is in progress.

------
Regards,
Alexander Korotkov
Supabase
From 0ed4be04b42acab74efc59ea35772fd51f1b954c Mon Sep 17 00:00:00 2001 From: Alexander Korotkov <akorotkov@postgresql.org> Date: Fri, 13 Sep 2024 02:10:04 +0300 Subject: [PATCH v11 1/2] Update header comment for lookup_type_cache() Describe the way we handle concurrent invalidation messages. --- src/backend/utils/cache/typcache.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index 2ec136b7d30..11382547ec0 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -351,6 +351,15 @@ type_cache_syshash(const void *key, Size keysize) * invalid. Note however that we may fail to find one or more of the * values requested by 'flags'; the caller needs to check whether the fields * are InvalidOid or not. + * + * Note that while filling the TypeCacheEntry we might process concurrent + * invalidation messages, causing our not-yet-filled TypeCacheEntry to be + * invalidated. In this case, we typically only clear flags while values are + * still available for the caller. It's expected that the caller holds + * enough locks on type-dependent objects that the values are still relevant. + * It's also important that the tupdesc is filled last for + * TYPTYPE_COMPOSITE, so it can't get invalidated during the + * lookup_type_cache() call. */ TypeCacheEntry * lookup_type_cache(Oid type_id, int flags) -- 2.39.3 (Apple Git-146)
From 9735f1266f3dad2955758a28814c4b8a593d834b Mon Sep 17 00:00:00 2001 From: Alexander Korotkov <akorotkov@postgresql.org> Date: Tue, 10 Sep 2024 23:25:04 +0300 Subject: [PATCH v11 2/2] Avoid looping over all type cache entries in TypeCacheRelCallback() Currently, when a single relcache entry gets invalidated, TypeCacheRelCallback() has to loop over all type cache entries to find the appropriate typentry to invalidate. Unfortunately, using the syscache here is impossible, because this callback could be called outside a transaction, and that makes catalog lookups impossible. This is why this commit introduces RelIdToTypeIdCacheHash to map a relation OID to its composite type OID. We keep the RelIdToTypeIdCacheHash entry while the corresponding type cache entry has something to clean. Therefore, RelIdToTypeIdCacheHash shouldn't get bloated in the case of a flood of temporary tables. Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru Author: Teodor Sigaev Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov Reviewed-by: Andrei Lepikhov, Pavel Borisov --- src/backend/access/transam/xact.c | 10 + src/backend/utils/cache/typcache.c | 350 +++++++++++++++++++++++++---- src/include/utils/typcache.h | 4 + src/tools/pgindent/typedefs.list | 1 + 4 files changed, 321 insertions(+), 44 deletions(-) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 87700c7c5c7..b0b05e28790 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -70,6 +70,7 @@ #include "utils/snapmgr.h" #include "utils/timeout.h" #include "utils/timestamp.h" +#include "utils/typcache.h" /* * User-tweakable parameters */ @@ -2407,6 +2408,9 @@ CommitTransaction(void) /* Clean up the relation cache */ AtEOXact_RelationCache(true); + /* Clean up the type cache */ + AtEOXact_TypeCache(); + /* * Make catalog changes visible to all backends. 
This has to happen after * relcache references are dropped (see comments for @@ -2709,6 +2713,9 @@ PrepareTransaction(void) /* Clean up the relation cache */ AtEOXact_RelationCache(true); + /* Clean up the type cache */ + AtEOXact_TypeCache(); + /* notify doesn't need a postprepare call */ PostPrepare_PgStat(); @@ -2951,6 +2958,7 @@ AbortTransaction(void) false, true); AtEOXact_Buffers(false); AtEOXact_RelationCache(false); + AtEOXact_TypeCache(); AtEOXact_Inval(false); AtEOXact_MultiXact(); ResourceOwnerRelease(TopTransactionResourceOwner, @@ -5153,6 +5161,7 @@ CommitSubTransaction(void) true, false); AtEOSubXact_RelationCache(true, s->subTransactionId, s->parent->subTransactionId); + AtEOSubXact_TypeCache(); AtEOSubXact_Inval(true); AtSubCommit_smgr(); @@ -5328,6 +5337,7 @@ AbortSubTransaction(void) AtEOSubXact_RelationCache(false, s->subTransactionId, s->parent->subTransactionId); + AtEOSubXact_TypeCache(); AtEOSubXact_Inval(false); ResourceOwnerRelease(s->curTransactionOwner, RESOURCE_RELEASE_LOCKS, diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index 11382547ec0..dcbf50e865b 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -77,6 +77,20 @@ /* The main type cache hashtable searched by lookup_type_cache */ static HTAB *TypeCacheHash = NULL; +/* + * The mapping of a relation's OID to the corresponding composite type OID. + * We keep the map entry while the corresponding typentry has something + * to clear, i.e. it has either TCFLAGS_HAVE_PG_TYPE_DATA, or + * TCFLAGS_OPERATOR_FLAGS, or tupdesc. 
+ */ +static HTAB *RelIdToTypeIdCacheHash = NULL; + +typedef struct RelIdToTypeIdCacheEntry +{ + Oid relid; /* OID of the relation */ + Oid composite_typid; /* OID of the relation's composite type */ +} RelIdToTypeIdCacheEntry; + /* List of type cache entries for domain types */ static TypeCacheEntry *firstDomainTypeEntry = NULL; @@ -208,6 +222,10 @@ typedef struct SharedTypmodTableEntry dsa_pointer shared_tupdesc; } SharedTypmodTableEntry; +static Oid *in_progress_list; +static int in_progress_list_len; +static int in_progress_list_maxlen; + /* * A comparator function for SharedRecordTableKey. */ @@ -329,6 +347,8 @@ static void shared_record_typmod_registry_detach(dsm_segment *segment, static TupleDesc find_or_make_matching_shared_tupledesc(TupleDesc tupdesc); static dsa_pointer share_tupledesc(dsa_area *area, TupleDesc tupdesc, uint32 typmod); +static void insert_rel_type_cache_if_needed(TypeCacheEntry *typentry); +static void delete_rel_type_cache_if_needed(TypeCacheEntry *typentry); /* @@ -366,11 +386,22 @@ lookup_type_cache(Oid type_id, int flags) { TypeCacheEntry *typentry; bool found; + int in_progress_offset; if (TypeCacheHash == NULL) { /* First time through: initialize the hash table */ HASHCTL ctl; + int allocsize; + + /* + * reserve enough in_progress_list slots for many cases + */ + allocsize = 4; + in_progress_list = + MemoryContextAlloc(CacheMemoryContext, + allocsize * sizeof(*in_progress_list)); + in_progress_list_maxlen = allocsize; ctl.keysize = sizeof(Oid); ctl.entrysize = sizeof(TypeCacheEntry); @@ -385,6 +416,13 @@ lookup_type_cache(Oid type_id, int flags) TypeCacheHash = hash_create("Type information cache", 64, &ctl, HASH_ELEM | HASH_FUNCTION); + Assert(RelIdToTypeIdCacheHash == NULL); + + ctl.keysize = sizeof(Oid); + ctl.entrysize = sizeof(RelIdToTypeIdCacheEntry); + RelIdToTypeIdCacheHash = hash_create("Map from relid to OID of cached composite type", 64, + &ctl, HASH_ELEM | HASH_BLOBS); + /* Also set up callbacks for SI invalidations */ 
CacheRegisterRelcacheCallback(TypeCacheRelCallback, (Datum) 0); CacheRegisterSyscacheCallback(TYPEOID, TypeCacheTypCallback, (Datum) 0); @@ -396,6 +434,21 @@ lookup_type_cache(Oid type_id, int flags) CreateCacheMemoryContext(); } + Assert(TypeCacheHash != NULL && RelIdToTypeIdCacheHash != NULL); + + /* Register to catch invalidation messages */ + if (in_progress_list_len >= in_progress_list_maxlen) + { + int allocsize; + + allocsize = in_progress_list_maxlen * 2; + in_progress_list = repalloc(in_progress_list, + allocsize * sizeof(*in_progress_list)); + in_progress_list_maxlen = allocsize; + } + in_progress_offset = in_progress_list_len++; + in_progress_list[in_progress_offset] = type_id; + /* Try to look up an existing entry */ typentry = (TypeCacheEntry *) hash_search(TypeCacheHash, &type_id, @@ -896,6 +949,11 @@ lookup_type_cache(Oid type_id, int flags) load_domaintype_info(typentry); } + Assert(in_progress_offset + 1 == in_progress_list_len); + in_progress_list_len--; + + insert_rel_type_cache_if_needed(typentry); + return typentry; } @@ -2290,6 +2348,53 @@ SharedRecordTypmodRegistryAttach(SharedRecordTypmodRegistry *registry) CurrentSession->shared_typmod_table = typmod_table; } +/* + * InvalidateCompositeTypeCacheEntry + * Invalidate particular TypeCacheEntry on Relcache inval callback + * + * Delete the cached tuple descriptor (if any) for the given composite + * type, and reset whatever info we have cached about the composite type's + * comparability. + */ +static void +InvalidateCompositeTypeCacheEntry(TypeCacheEntry *typentry) +{ + bool hadTupDescOrOpclass; + + Assert(typentry->typtype == TYPTYPE_COMPOSITE && + OidIsValid(typentry->typrelid)); + + hadTupDescOrOpclass = (typentry->tupDesc != NULL) || + (typentry->flags & TCFLAGS_OPERATOR_FLAGS); + + /* Delete tupdesc if we have it */ + if (typentry->tupDesc != NULL) + { + /* + * Release our refcount and free the tupdesc if none remain. 
We can't + * use DecrTupleDescRefCount here because this reference is not logged + * by the current resource owner. + */ + Assert(typentry->tupDesc->tdrefcount > 0); + if (--typentry->tupDesc->tdrefcount == 0) + FreeTupleDesc(typentry->tupDesc); + typentry->tupDesc = NULL; + + /* + * Also clear tupDesc_identifier, so that anyone watching it will + * realize that the tupdesc has changed. + */ + typentry->tupDesc_identifier = 0; + } + + /* Reset equality/comparison/hashing validity information */ + typentry->flags &= ~TCFLAGS_OPERATOR_FLAGS; + + /* Call delete_rel_type_cache_if_needed() if we actually cleared something */ + if (hadTupDescOrOpclass) + delete_rel_type_cache_if_needed(typentry); +} + /* * TypeCacheRelCallback * Relcache inval callback function @@ -2299,63 +2404,55 @@ SharedRecordTypmodRegistryAttach(SharedRecordTypmodRegistry *registry) * whatever info we have cached about the composite type's comparability. * * This is called when a relcache invalidation event occurs for the given - * relid. We must scan the whole typcache hash since we don't know the - * type OID corresponding to the relid. We could do a direct search if this - * were a syscache-flush callback on pg_type, but then we would need all - * ALTER-TABLE-like commands that could modify a rowtype to issue syscache - * invals against the rel's pg_type OID. The extra SI signaling could very - * well cost more than we'd save, since in most usages there are not very - * many entries in a backend's typcache. The risk of bugs-of-omission seems - * high, too. - * - * Another possibility, with only localized impact, is to maintain a second - * hashtable that indexes composite-type typcache entries by their typrelid. - * But it's still not clear it's worth the trouble. + * relid. We can't use the syscache to find a type corresponding to the given + * relation because the code can be called outside of a transaction. Thus we use + * the RelIdToTypeIdCacheHash map to locate the appropriate typcache entry. 
*/ static void TypeCacheRelCallback(Datum arg, Oid relid) { - HASH_SEQ_STATUS status; TypeCacheEntry *typentry; - /* TypeCacheHash must exist, else this callback wouldn't be registered */ - hash_seq_init(&status, TypeCacheHash); - while ((typentry = (TypeCacheEntry *) hash_seq_search(&status)) != NULL) + /* + * RelIdToTypeIdCacheHash and TypeCacheHash should exist, otherwise this + * callback wouldn't be registered + */ + if (OidIsValid(relid)) { - if (typentry->typtype == TYPTYPE_COMPOSITE) + RelIdToTypeIdCacheEntry *relentry; + + /* + * Find the RelIdToTypeIdCacheHash entry, which should exist as long as + * the corresponding typcache entry has something to clean. + */ + relentry = (RelIdToTypeIdCacheEntry *) hash_search(RelIdToTypeIdCacheHash, + &relid, + HASH_FIND, NULL); + + if (relentry != NULL) { - /* Skip if no match, unless we're zapping all composite types */ - if (relid != typentry->typrelid && relid != InvalidOid) - continue; + typentry = (TypeCacheEntry *) hash_search(TypeCacheHash, + &relentry->composite_typid, + HASH_FIND, NULL); - /* Delete tupdesc if we have it */ - if (typentry->tupDesc != NULL) + if (typentry != NULL) { - /* - * Release our refcount, and free the tupdesc if none remain. - * (Can't use DecrTupleDescRefCount because this reference is - * not logged in current resource owner.) - */ - Assert(typentry->tupDesc->tdrefcount > 0); - if (--typentry->tupDesc->tdrefcount == 0) - FreeTupleDesc(typentry->tupDesc); - typentry->tupDesc = NULL; + Assert(typentry->typtype == TYPTYPE_COMPOSITE); + Assert(relid == typentry->typrelid); - /* - * Also clear tupDesc_identifier, so that anything watching - * that will realize that the tupdesc has possibly changed. - * (Alternatively, we could specify that to detect possible - * tupdesc change, one must check for tupDesc != NULL as well - * as tupDesc_identifier being the same as what was previously - * seen. That seems error-prone. 
- */ - typentry->tupDesc_identifier = 0; + InvalidateCompositeTypeCacheEntry(typentry); } - - /* Reset equality/comparison/hashing validity information */ - typentry->flags &= ~TCFLAGS_OPERATOR_FLAGS; } - else if (typentry->typtype == TYPTYPE_DOMAIN) + + /* + * Visit all the domain types sequentially. Typically this shouldn't + * affect performance since domain types are less prone to bloat. + * Domain types are created manually, unlike composite types which are + * automatically created for every temporary table. + */ + for (typentry = firstDomainTypeEntry; + typentry != NULL; + typentry = typentry->nextDomain) { /* * If it's domain over composite, reset flags. (We don't bother @@ -2367,6 +2464,36 @@ TypeCacheRelCallback(Datum arg, Oid relid) typentry->flags &= ~TCFLAGS_OPERATOR_FLAGS; } } + else + { + HASH_SEQ_STATUS status; + + /* + * Relid is invalid. By convention we need to reset all composite + * types in cache. Also, we should reset flags for domain types, and + * since we loop over all entries in the hash anyway, do it in a + * single scan. + */ + hash_seq_init(&status, TypeCacheHash); + while ((typentry = (TypeCacheEntry *) hash_seq_search(&status)) != NULL) + { + if (typentry->typtype == TYPTYPE_COMPOSITE) + { + InvalidateCompositeTypeCacheEntry(typentry); + } + else if (typentry->typtype == TYPTYPE_DOMAIN) + { + /* + * If it's domain over composite, reset flags. (We don't + * bother trying to determine whether the specific base type + * needs a reset.) Note that if we haven't determined whether + * the base type is composite, we don't need to reset + * anything. 
+ */ + if (typentry->flags & TCFLAGS_DOMAIN_BASE_IS_COMPOSITE) + typentry->flags &= ~TCFLAGS_OPERATOR_FLAGS; + } + } + } } /* @@ -2397,6 +2524,8 @@ TypeCacheTypCallback(Datum arg, int cacheid, uint32 hashvalue) while ((typentry = (TypeCacheEntry *) hash_seq_search(&status)) != NULL) { + bool hadPgTypeData = (typentry->flags & TCFLAGS_HAVE_PG_TYPE_DATA); + Assert(hashvalue == 0 || typentry->type_id_hash == hashvalue); /* @@ -2406,6 +2535,13 @@ TypeCacheTypCallback(Datum arg, int cacheid, uint32 hashvalue) */ typentry->flags &= ~(TCFLAGS_HAVE_PG_TYPE_DATA | TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS); + + /* + * Call delete_rel_type_cache_if_needed() if we cleared the + * TCFLAGS_HAVE_PG_TYPE_DATA flag above. + */ + if (hadPgTypeData) + delete_rel_type_cache_if_needed(typentry); } } @@ -2914,3 +3050,129 @@ shared_record_typmod_registry_detach(dsm_segment *segment, Datum datum) } CurrentSession->shared_typmod_registry = NULL; } + +/* + * Insert a RelIdToTypeIdCacheHash entry if needed. + */ +static void +insert_rel_type_cache_if_needed(TypeCacheEntry *typentry) +{ + /* Immediately quit for non-composite types */ + if (typentry->typtype != TYPTYPE_COMPOSITE) + return; + + /* typrelid should be given for composite types */ + Assert(OidIsValid(typentry->typrelid)); + + /* + * Insert a RelIdToTypeIdCacheHash entry if the typentry has any + * information indicating it should be here. + */ + if ((typentry->flags & TCFLAGS_HAVE_PG_TYPE_DATA) || + (typentry->flags & TCFLAGS_OPERATOR_FLAGS) || + typentry->tupDesc != NULL) + { + RelIdToTypeIdCacheEntry *relentry; + bool found; + + relentry = (RelIdToTypeIdCacheEntry *) hash_search(RelIdToTypeIdCacheHash, + &typentry->typrelid, + HASH_ENTER, &found); + relentry->relid = typentry->typrelid; + relentry->composite_typid = typentry->type_id; + } +} + +/* + * Delete the RelIdToTypeIdCacheHash entry if needed after resetting the + * TCFLAGS_HAVE_PG_TYPE_DATA flag, any of the TCFLAGS_OPERATOR_FLAGS flags, + * or the tupDesc. 
+ */ +static void +delete_rel_type_cache_if_needed(TypeCacheEntry *typentry) +{ +#ifdef USE_ASSERT_CHECKING + int i; + bool is_in_progress = false; + + for (i = 0; i < in_progress_list_len; i++) + { + if (in_progress_list[i] == typentry->type_id) + { + is_in_progress = true; + break; + } + } +#endif + + /* Immediately quit for non-composite types */ + if (typentry->typtype != TYPTYPE_COMPOSITE) + return; + + /* typrelid should be given for composite types */ + Assert(OidIsValid(typentry->typrelid)); + + /* + * Delete the RelIdToTypeIdCacheHash entry if the typentry doesn't have any + * information indicating the entry should still be there. + */ + if (!(typentry->flags & TCFLAGS_HAVE_PG_TYPE_DATA) && + !(typentry->flags & TCFLAGS_OPERATOR_FLAGS) && + typentry->tupDesc == NULL) + { + bool found; + + (void) hash_search(RelIdToTypeIdCacheHash, + &typentry->typrelid, + HASH_REMOVE, &found); + Assert(found || is_in_progress); + } + else + { +#ifdef USE_ASSERT_CHECKING + /* + * Otherwise, in assert-enabled builds, check that the + * RelIdToTypeIdCacheHash entry exists when it should. 
+ */ + bool found; + + if (!is_in_progress) + { + (void) hash_search(RelIdToTypeIdCacheHash, + &typentry->typrelid, + HASH_FIND, &found); + Assert(found); + } +#endif + } +} + +static void +cleanup_in_progress_typentries(void) +{ + int i; + + for (i = 0; i < in_progress_list_len; i++) + { + TypeCacheEntry *typentry; + + typentry = (TypeCacheEntry *) hash_search(TypeCacheHash, + &in_progress_list[i], + HASH_FIND, NULL); + + /* The entry may be missing if we errored out before creating it */ + if (typentry != NULL) + insert_rel_type_cache_if_needed(typentry); + } + + in_progress_list_len = 0; +} + +void +AtEOXact_TypeCache(void) +{ + cleanup_in_progress_typentries(); +} + +void +AtEOSubXact_TypeCache(void) +{ + cleanup_in_progress_typentries(); +} diff --git a/src/include/utils/typcache.h b/src/include/utils/typcache.h index f506cc4aa35..f3d73ecee3a 100644 --- a/src/include/utils/typcache.h +++ b/src/include/utils/typcache.h @@ -207,4 +207,8 @@ extern void SharedRecordTypmodRegistryInit(SharedRecordTypmodRegistry *, extern void SharedRecordTypmodRegistryAttach(SharedRecordTypmodRegistry *); +extern void AtEOXact_TypeCache(void); + +extern void AtEOSubXact_TypeCache(void); + #endif /* TYPCACHE_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index e9ebddde24d..38b60a5944b 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2378,6 +2378,7 @@ RelFileLocator RelFileLocatorBackend RelFileNumber RelIdCacheEnt +RelIdToTypeIdCacheEntry RelInfo RelInfoArr RelMapFile -- 2.39.3 (Apple Git-146)