Thanks for the review and commits so far. Here's a rebased, debugged and pgindented version of the remaining patches. I ran pgindent with --list-of-typedefs="SharedRecordTableKey,SharedRecordTableEntry,SharedTypmodTableEntry,SharedRecordTypmodRegistry,Session" to fix some weirdness around these new typenames.
While rebasing the 0002 patch (removal of tqueue.c's remapping logic), I modified the interface of the newly added ExecParallelCreateReaders() function from commit 51daa7bd because it no longer has any reason to take a TupleDesc. On Fri, Aug 25, 2017 at 1:46 PM, Andres Freund <and...@anarazel.de> wrote: > On 2017-08-21 11:02:52 +1200, Thomas Munro wrote: >> 2. Andres didn't like what I did to DecrTupleDescRefCount, namely >> allowing to run when there is no ResourceOwner. I now see that this >> is probably an indication of a different problem; even if there were a >> worker ResourceOwner as he suggested (or perhaps a session-scoped one, >> which a worker would reset before being reused), it wouldn't be the >> one that was active when the TupleDesc was created. I think I have >> failed to understand the contracts here and will think/read about it >> some more. > > Maybe I'm missing something, but isn't the issue here that using > DecrTupleDescRefCount() simply is wrong, because we're not actually > necessarily tracking the TupleDesc via the resowner mechanism? Yeah. Thanks. > If you look at the code, in the case it's a previously unknown tupledesc > it's registered with: > > entDesc = CreateTupleDescCopy(tupDesc); > ... > /* mark it as a reference-counted tupdesc */ > entDesc->tdrefcount = 1; > ... > RecordCacheArray[newtypmod] = entDesc; > ... > > Note that there's no PinTupleDesc(), IncrTupleDescRefCount() or > ResourceOwnerRememberTupleDesc() managing the reference from the > array. Nor was there one before. > > We have other code managing TupleDesc lifetimes similarly, and look at > how they're freeing it: > /* Delete tupdesc if we have it */ > if (typentry->tupDesc != NULL) > { > /* > * Release our refcount, and free the tupdesc if none > remain. > * (Can't use DecrTupleDescRefCount because this > reference is not > * logged in current resource owner.) > */ > Assert(typentry->tupDesc->tdrefcount > 0); > if (--typentry->tupDesc->tdrefcount == 0) > FreeTupleDesc(typentry->tupDesc); > typentry->tupDesc = NULL; > } Right. I have changed shared_record_typmod_registry_worker_detach() to be more like that, with an explanation. > This also made me think about how we're managing the lookup from the > shared array: > > /* > * Our local array can now point > directly to the TupleDesc > * in shared memory. > */ > RecordCacheArray[typmod] = tupdesc; > > Uhm. Isn't that highly highly problematic? E.g. tdrefcount manipulations > which are done by all lookups (cf. lookup_rowtype_tupdesc()) would in > that case manipulate shared memory in a concurrency unsafe manner. No. See this change, in that and similar code paths: - IncrTupleDescRefCount(tupDesc); + PinTupleDesc(tupDesc); The difference between IncrTupleDescRefCount() and PinTupleDesc() is that the latter recognises non-refcounted tuple descriptors (tdrefcount == -1) and does nothing. Shared tuple descriptors are not reference counted (see TupleDescCopy() which initialises dst->tdrefcount to -1). It was for foolish symmetry that I was trying to use ReleaseTupleDesc() in shared_record_typmod_registry_detach() before, since it also knows about non-refcounted tuple descriptors, but that's not appropriate: it calls DecrTupleDescRefCount() which assumes that we're using resource owners. We're not. To summarise the object lifetime management situation created by this patch: shared TupleDesc objects accumulate in per-session DSM memory until eventually the session ends and the DSM memory goes away. A bit like CacheMemoryContext: there is no retail cleanup of shared TupleDesc objects. BUT: the DSM detach callback is used to clear out backend-local pointers to that stuff (and any non-shared reference counted TupleDesc objects that might be found), in anticipation of being able to reuse a worker process one day (which will involve attaching to a new session, so we mustn't retain any traces of the previous session in our local state). Maybe I'm trying to be a little too clairvoyant there... I improved the cleanup code: now it frees RecordCacheArray and RecordCacheHash and reinstalls NULL pointers. Also it deals with errors in GetSessionDsmHandle() better. I renamed the members of Session to include a "shared_" prefix, which seems a bit clearer. I refactored it so that it never makes needless local copies of TupleDesc objects (previously assign_record_type_typmod() would create an extra local copy and cache that, which was wasteful). That actually makes much of the discussion above moot: on detach, a worker should now ONLY find shared non-refcounted TupleDesc objects in the local caches, so the FreeTupleDesc() case is unreachable... The leader on the other hand can finish up with a mixture of local and shared TupleDesc objects in its cache, if it had some before it ran a parallel query. Its detach hook doesn't try to free those so it doesn't matter. -- Thomas Munro http://www.enterprisedb.com
shared-record-typmods-v10.patchset.tgz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers