On Wed, Nov 1, 2017 at 11:29 AM, Peter Geoghegan <p...@bowt.ie> wrote: > On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia > <rushabh.lat...@gmail.com> wrote: >> Attaching the re based patch according to the v22 parallel-hash patch sets > > I took a quick look at this today, and noticed a few issues: > > * make_name() is used to name files in sharedtuplestore.c, which is > what is passed to BufFileOpenShared() for parallel hash join. Your > using your own logic for that within the equivalent logtape.c call to > BufFileOpenShared(), presumably because make_name() wants to identify > participants by PID rather than by an ordinal identifier number.
So that's this bit: + pg_itoa(worker, filename); + lts->pfile = BufFileCreateShared(fileset, filename); ... and: + pg_itoa(i, filename); + file = BufFileOpenShared(fileset, filename); What's wrong with using a worker number like this? > I think that we need some kind of central registry for things that use > shared buffiles. It could be that sharedtuplestore.c is further > generalized to support this, or it could be that they both call > something else that takes care of naming. It's not okay to have this > left to random chance. It's not random choice: buffile.c creates a uniquely named directory (or directories, if you have more than one location configured in the temp_tablespaces GUC) to hold all the backing files involved in each BufFileSet. Naming of BufFiles within the BufFileSet is the caller's problem, and a worker number seems like a reasonable choice to me. It won't collide with a concurrent parallel CREATE INDEX because that'll be using its own BufFileSet. > You're going to have to ask Thomas about this. You should also use > MAXPGPATH for the char buffer on the stack. Here's a summary of namespace management scheme I currently have at the three layers fd.c, buffile.c, sharedtuplestore.c: 1. fd.c has new lower-level functions provides PathNameCreateTemporaryFile(const char *path) and PathNameOpenTemporaryFile(const char *path). It also provides PathNameCreateTemporaryDir(). Clearly callers of these interfaces will need to be very careful about managing the names they use. Callers also own the problem of cleaning up files, since there is no automatic cleanup of files created this way. My intention was that these facilities would *only* be used by BufFileSet, since it has machinery to manage those things. 2. buffile.c introduces BufFileSet, which is conceptually a set of BufFiles that can be shared by multiple backends with DSM segment-scoped cleanup. It is implemented as a set of directories: one for each tablespace in temp_tablespaces. It controls the naming of those directories. The BufFileSet directories are named similarly to fd.c's traditional temporary file names using the usual recipe "pgsql_tmp" + PID + per-process counter but have an additional ".set" suffix. RemovePgTempFilesInDir() recognises directories with that prefix and suffix as junk left over from a crash when cleaning up. I suppose it's that knowledge about reserved name patterns and cleanup that you are thinking of as a central registry? As for the BufFiles that are in a BufFileSet, buffile.c has no opinion on that: the calling code (parallel CREATE INDEX, sharedtuplestore.c, ...) is responsible for coming up with its own scheme. If parallel CREATE INDEX wants to name shared BufFiles "walrus" and "banana", that's OK by me, and those files won't collide with anything in another BufFileSet because each BufFileSet has its own directory (-ies). One complaint about the current coding that someone might object to: MakeSharedSegmentPath() just dumps the caller's BufFile name into a path without sanitisation: I should fix that so that we only accept fairly limited strings here. Another complaint is that perhaps fd.c knows too much about buffile.c's business. For example, RemovePgTempFilesInDir() knows about the ".set" directories created by buffile.c, which might be called a layering violation. Perhaps the set/directory logic should move entirely into fd.c, so you'd call FileSetInit(FileSet *), not BufFileSetInit(BufFileSet *), and then BufFileOpenShared() would take a FileSet *, not a BufFileSet *. Thoughts? 3. sharedtuplestore.c takes a caller-supplied BufFileSet and creates its shared BufFiles in there. Earlier versions created and owned a BufFileSet, but in the current Parallel Hash patch I create loads of separate SharedTuplestore objects but I didn't want to create load of directories to back them, so you can give them all the same BufFileSet. That works because SharedTuplestores are also given a name, and it's the caller's job (in my case nodeHash.c) to make sure the SharedTuplestores are given unique names within the same BufFileSet. For Parallel Hash you'll see names like 'i3of8' (inner batch 3 of 8). There is no need for there to be in any sort of central registry for that though, because it rides on top of the guarantees from 2 above: buffile.c will put those files into a uniquely named directory, and that works as long as no one else is allowed to create files or directories in the temp directory that collide with its reserved pattern /^pgsql_tmp.+\.set$/. For the same reason, parallel CREATE INDEX is free to use worker numbers as BufFile names, since it has its own BufFileSet to work within. > * What's this all about?: > > + /* Accessor for the SharedBufFileSet that is at the end of Sharedsort. */ > + #define GetSharedBufFileSet(shared) \ > + ((BufFileSet *) (&(shared)->tapes[(shared)->nTapes])) In an earlier version, BufFileSet was one of those annoying data structures with a FLEXIBLE_ARRAY_MEMBER that you'd use as an incomplete type (declared but not defined in the includable header), and here it was being used "inside" (or rather after) SharedSort, which *itself* had a FLEXIBLE_ARRAY_MEMBER. The reason for the variable sized object was that I needed all backends to agree on the set of temporary tablespace OIDs, of which there could be any number, but I also needed a 'flat' (pointer-free) object I could stick in relocatable shared memory. In the newest version I changed that flexible array to tablespaces[8], because 8 should be enough tablespaces for anyone (TM). I don't really believe anyone uses temp_tablespaces for IO load balancing anymore and I hate code like the above. So I think Rushabh should now remove the above-quoted code and just use a BufFileSet directly as a member of SharedSort. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers