Hi, Here is a new version addressing feedback from Peter and Andres. Please see below.
On Wed, Mar 22, 2017 at 3:18 PM, Peter Geoghegan <p...@bowt.ie> wrote: > On Tue, Mar 21, 2017 at 5:07 AM, Thomas Munro > <thomas.mu...@enterprisedb.com> wrote: >>> buffile.c should stop pretending to care about anything other than >>> temp files, IMV. 100% of all clients that want temporary files go >>> through buffile.c. 100% of all clients that want non-temp files (files >>> which are not marked FD_TEMPORARY) access fd.c directly, rather than >>> going through buffile.c. >> >> I still need BufFile because I want buffering. >> >> There are 3 separate characteristics enabled by flags with 'temporary' >> in their name. I think we should consider separating the concerns by >> splitting and renaming them: >> >> 1. Segmented BufFile behaviour. I propose renaming BufFile's isTemp >> member to isSegmented, because that is what it really does. I want >> that feature independently without getting confused about lifetimes. >> Tested with small MAX_PHYSICAL_FILESIZE as you suggested. > > I would have proposed to get rid of the isTemp field entirely. It is > always true with current usage, any only #ifdef NOT_USED code presumes > that it could be any other way. BufFile is all about temp files, which > ISTM should be formalized. The whole point of BufFile is to segment > fd.c temp file segments. Who would ever want to use BufFile without > that capability anyway? Yeah, it looks like you're probably right, but I guess others could have uses for BufFile that we don't know about. It doesn't seem like it hurts to leave the variable in existence. >> 2. The temp_file_limit system. Currently this applies to fd.c files >> opened with FD_TEMPORARY. You're right that we shouldn't be able to >> escape that sanity check on disk space just because we want to manage >> disk file ownership differently. I propose that we create a new flag >> FD_TEMP_FILE_LIMIT that can be set independentlyisTemp of the flags >> controlling disk file lifetime. When working with SharedBufFileSet, >> the limit applies to each backend in respect of files it created, >> while it has them open. This seems a lot simpler than any >> shared-temp-file-limit type scheme and is vaguely similar to the way >> work_mem applies in each backend for parallel query. > > I agree that that makes sense as a user-visible behavior of > temp_file_limit. This user-visible behavior is what I actually > implemented for parallel CREATE INDEX. Ok, good. >> 3. Delete-on-close/delete-at-end-of-xact. I don't want to use that >> facility so I propose disconnecting it from the above. We c{ould >> rename those fd.c-internal flags FD_TEMPORARY and FD_XACT_TEMPORARY to >> FD_DELETE_AT_CLOSE and FD_DELETE_AT_EOXACT. > > This reliably unlink()s all files, albeit while relying on unlink() > ENOENT as a condition that terminates deletion of one particular > worker's BufFile's segments. However, because you effectively no > longer use resowner.c, ISTM that there is still a resource leak in > error paths. ResourceOwnerReleaseInternal() won't call FileClose() for > temp-ish files (that are not quite temp files in the current sense) in > the absence of no other place managing to do so, such as > BufFileClose(). How can you be sure that you'll actually close() the > FD itself (not vFD) within fd.c in the event of an error? Or Delete(), > which does some LRU maintenance for backend's local VfdCache? Yeah, I definitely need to use resowner.c. The only thing I want to opt out of is automatic file deletion in that code path. > If I follow the new code correctly, then it doesn't matter that you've > unlink()'d to take care of the more obvious resource management chore. > You can still have a reference leak like this, if I'm not mistaken, > because you still have backend local state (local VfdCache) that is > left totally decoupled with the new "shadow resource manager" for > shared BufFiles. You're right. The attached version fixes these problems. The BufFiles created or opened in this new way now participate in both of our leak-detection and clean-up schemes: the one in resowner.c (because I'm now explicitly registering with it as I had failed to do before) and the one in CleanupTempFiles (because FD_CLOSE_AT_EOXACT is set, which I already had in the previous version for the creator, but not the opener of such a file). I tested by commenting out my explicit BufFileClose calls to check that resowner.c starts complaining, and then by commenting out the resowner registration too to check that CleanupTempFiles starts complaining. >> As shown in 0008-hj-shared-buf-file-v8.patch. Thoughts? > > A less serious issue I've also noticed is that you add palloc() calls, > implicitly using the current memory context, within buffile.c. > BufFileOpenTagged() has some, for example. However, there is a note > that we don't need to save the memory context when we open a BufFile > because we always repalloc(). That is no longer the case here. I don't see a problem here. BufFileOpenTagged() is similar to BufFileCreateTemp() which calls makeBufFile() and thereore returns a result that is allocated in the current memory context. This seems like the usual deal. Thanks for the review! On Wed, Mar 22, 2017 at 1:07 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > On Thu, Feb 16, 2017 at 3:36 PM, Andres Freund <and...@anarazel.de> wrote: >> I think the synchronization protocol with the various phases needs to be >> documented somewhere. Probably in nodeHashjoin.c's header. > > I will supply that shortly. Added in the attached version. -- Thomas Munro http://www.enterprisedb.com
parallel-shared-hash-v9.tgz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers