On Sat, Dec 2, 2017 at 1:55 PM, Andres Freund <and...@anarazel.de> wrote: > Pushed 0003-Add-infrastructure-for-sharing-temporary-files-betwe.patch > after minor adjustments (some including conversing with you).
Thank you! > Questions: > - Right now RemovePgTempFilesInDir() will recurse into appropriately > named directories, and when it recurses it doesn't require the same > name pattern checks. I think that's good, but I think it'd be prudent > to be a bit more paranoid and prevent recursing into symlinked > subdirectories. That's why it uses lstat(), so that it sees symlinks rather than what they point to. It only recurses if S_ISDIR(), and it unlinks anything else. That means that it unlinks any symlinks rather than (say) following them and deleting stuff outside the temporary directory tree. Example: $ mkdir /tmp/foo $ touch /tmp/foo/canary $ mkdir -p $PGDATA/base/pgsql_tmp/pgsql_tmpXXX/bar $ ln -s /tmp/foo $PGDATA/base/pgsql_tmp/pgsql_tmpXXX/foo $ ls $PGDATA/base/pgsql_tmp/pgsql_tmpXXX bar foo $ postgres/bin/postgres -D $PGDATA ... ^C ... $ ls $PGDATA/base/pgsql_tmp $ ls /tmp/foo canary Make sense? > - As we don't clean temp files after crash-restarts it isn't impossible > to have a bunch of crash-restarts and end up with pids *and* per-pid > shared file set counters reused. Which'd lead to conflicts. Do we care? We don't care. PathNameCreateTemporaryDir() on a path that already exists will return silently. PathNameCreateTemporaryFile() on a path that already exists, as mentioned in a comment and following an existing convention, will open and truncate it. So either it was really an orphan and that is a continuation of our traditional recycling behaviour, or the calling code has a bug and used the same path twice and it's not going to end well. Another type of collision we could have in theory is like this: One backend creates a SharedFileSet, and various other backends attach to it. The creator detaches and exits. Later another process is created with the same PID, and creates a new SharedFileSet, and happens to have the same counter number, but the original SharedFileSet is still in existence because there are still running backends attached to it. Then things get ugly. But this seems improbable, at least as long as the usage pattern is that leader processes are the only ones creating SharedFileSet objects and outlive their parallel workers in non-crash outcomes, but if you think that's too flimsy I suppose it could be fixed by replacing the per-backend counter variable with a cluster-wide atomic counter. Then the PID component would be redundant but I'd suggest keeping it anyway for its diagnostic value. Thoughts? Just a reminder: a couple of problems have come up recently in the Parallel Hash Join patch itself, so please don't consider that one ready for commit quite yet. They are: (1) Handling the case where there is no DSA area because we're running a parallel-aware plan in non-parallel mode due to lack of resources; (2) Investigating a rare assertion failure. For (1), that may depend on another patch that I'll post shortly to kill "es_query_dsa" and, come to think of it, for (2) it's possible that the problem is in either one of the remaining patches -- SharedTuplestore or Parallel Hash Join -- so please hold off on committing either of those until I've got to the bottom of that. -- Thomas Munro http://www.enterprisedb.com