On Thu, Mar 9, 2017 at 11:19 AM, Peter Geoghegan <p...@bowt.ie> wrote: > That patch seems to be solving the problem by completely taking over > management of temp files from fd.c. That is, these temp files are not > marked as temp files in the way ordinary temp BufFiles are, with > explicit buy-in from resowner.c about their temp-ness. It adds > "#include "catalog/pg_tablespace.h" to buffile.c, even though that > kind of thing generally lives within fd.c. It does use > PG_TEMP_FILE_PREFIX, but never manages to use temp_tablespaces if > that's set by user. It also doesn't do anything about temp_file_limit > enforcement.
Actually, it's a bigger departure than I suggest here, even. Currently, literally no BufFile caller ever gets anything other than fd.c-wise temp files (those marked FD_TEMPORARY). This is the closest that buffile.c comes to allowing this: #ifdef NOT_USED /* * Create a BufFile and attach it to an already-opened virtual File. * * This is comparable to fdopen() in stdio. This is the only way at present * to attach a BufFile to a non-temporary file. Note that BufFiles created * in this way CANNOT be expanded into multiple files. */ BufFile * BufFileCreate(File file) { return makeBufFile(file); } #endif IOW, I cannot see why 0007-hj-shared-buf-file-v6.patch leaves parallel hash join with BufFiles that are even capable of being expanded past 1GiB (1 segment), which is surely not acceptable -- it's a bug. Have I just missed something? (This is not to be confused with BufFiles that are interXact -- those are allowed, but won't work here either. This is why Thomas changed PHJ to not do it that way some months back.) At first I thought that it was okay because BufFiles are immutable/"readonly" once handed over from workers, but of course the PHJ SharedBufFile mechanism is up-front, and does all resource management in shared memory. Thus, it must even create BufFiles in workers that have this only-one-segment restriction as a consequence of not being temp files (in the conventional sense: FD_TEMPORARY fd.c segments). I suggest that the MAX_PHYSICAL_FILESIZE stress-test I sometimes perform [1] be incorporated in Thomas' testing. (Looks more carefully...) I now see that PathNameCreateFile() is being called, a new function which is largely just a new interface to the existing OpenTemporaryFileInTablespace() temp files. So, if you look carefully, you notice that you do in fact end up with FD_TEMPORARY fd.c segments here...so maybe temp_file_limit isn't broken, because, it turns out, 0007-hj-shared-buf-file-v6.patch is still *halfway* buying into the conventional idea of file temp-ness. But buffile.c if left behind ("file->isTemp == false"), so AFAICT it must still be broken simply because new segments cannot be written, per BufFileCreate() comments quoted above. This has become more of a PHJ review, which is off-topic for this thread. But my observations illustrate the difficulty with tying resource manager resources in local memory (temp segments, and associated BufFile(s)) with clean-up at shared memory segment detachment. It's possible, but ticklish enough that you'll end up with some wart or other when you're done. The question, then, is what wart is the most acceptable for parallel tuplesort? I see two plausible paths forward right now: 1. Selectively suppress spurious ENOENT-on-unlink() LOG message in the brief window where it might be spurious. This is the easiest option. (There is less code this way.) 2. Do more or less what I've already drafted as my V9, which is described in opening mail to this thread, while accepting the ugliness that that approach brings with it. [1] postgr.es/m/cam3swzrwdntkhig0gyix_1muaypik3dv6-6542pye2iel-f...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers