Hi Hackers, We found a race condition in pg_mkdir_p(), here is a simple reproducer:
#!/bin/sh basedir=pgdata datadir=$basedir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z logdir=$basedir/logs n=2 rm -rf $basedir mkdir -p $logdir # init databases concurrently, they will all try to create the parent dirs for i in `seq $n`; do initdb -D $datadir/$i >$logdir/$i.log 2>&1 & done wait # there is a chance one of the initdb commands failed to create the datadir grep 'could not create directory' $logdir/* The logic in pg_mkdir_p() is as below: /* check for pre-existing directory */ if (stat(path, &sb) == 0) { if (!S_ISDIR(sb.st_mode)) { if (last) errno = EEXIST; else errno = ENOTDIR; retval = -1; break; } } else if (mkdir(path, last ? omode : S_IRWXU | S_IRWXG | S_IRWXO) < 0) { retval = -1; break; } This seems buggy as it first checks the existence of the dir and makes the dir if it does not exist yet, however when executing concurrently a possible race condition can be as below: A: does a/ exists? no B: does a/ exists? no A: try to create a/, succeed B: try to create a/, failed as it already exists To prevent the race condition we could mkdir() directly, if it returns -1 and errno is EEXIST then check whether it's really a dir with stat(). In fact this is what is done in the `mkdir -p` command: https://github.com/coreutils/gnulib/blob/b5a9fa677847081c9b4f26908272f122b15df8b9/lib/mkdir-p.c#L130-L164 By the way, some callers of pg_mkdir_p() checks for EEXIST explicitly, such as in pg_basebackup.c: if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno != EEXIST) { pg_log_error("could not create directory \"%s\": %m", statusdir); exit(1); } This is still wrong with current code logic, because when the statusdir is a file the errno is also EEXIST, but it can pass the check here. Even if we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is still wrong. Best Regards Ning