The following bug has been logged on the website: Bug reference: 7624 Logged by: Tianyin Xu Email address: t...@cs.ucsd.edu PostgreSQL version: 9.2.1 Operating system: Linux (actually doesn't matter) Description:
Hi, Postgresql, I'm here to report a very misleading log message and inconsistent configuration design. It confuses users, unnecessarily complicating configuration error diagnosis and causing frustration. As a computer system grad student familiar with C code and GDB, it took me ~1.8 hours to figure out the problem. -------------------------- So, in the postgresql.conf, I had the following entries: data_directory = 'local/pgsql/data/' hba_file = 'local/pgsql/data/pg_hba.conf' ident_file = 'local/pgsql/data/pg_ident.conf' unix_socket_directory = 'local/pgsql/data/' Without the “unix_socket_directory” setting, when I started the server, everything was fine. However, with the “unix_socket_directory” setting, the server refused to start with the following message: FATAL: could not create lock file "local/pgsql/data/.s.PGSQL.5432.lock": No such file or directory Since the sever definitely could find “data_directory”, “hba_file”, and “ident_file”, the “local/pgsql/data” was definitely considered as ***valid*** path. The weird thing to me is that typing “#touch local/pgsql/data/.s.PGSQL.5432.lock” could generate the file, and even I generated the file, the server still complained with the same log message. Then, I used GDB to trace the error point and found the message was printed because of the failure of “fopen”, /* src/backend/utils/init/miscinit.c */ static void CreateLockFile(const char *filename, bool amPostmaster, bool isDDLock, const char *refName) { …. fd = open(filename, O_RDWR | O_CREAT | O_EXCL, 0600); if ((errno != EEXIST && errno != EACCES) || ntries > 100) ereport(FATAL, (errcode_for_file_access(), errmsg("could not create lock file \"%s\": %m", filename))); …. } I copied this piece of code out into a test.c file, compiled, and ran. It did generate the “local/pgsql/data/.s.PGSQL.5432.lock”!!!, which means the function “open” definitely works. Then, I print the errno and found it's 2, i.e., ENOENT: O_CREAT is not set and the named file does not exist; or O_CREAT is set and either the path prefix does not exist or the path argument points to an empty string. But I thought (I was stupid) this's bullshit. Because path was not an empty string, and it definitely existed! Otherwise how could “data_directory”, “hba_file”, “ident_file” and the other similar file/directory paths work? No idea... I installed the newest postgres-9.2.1 again, but nothing changed. I suspected it's because of some weird settings on my machine, so I installed postgres again on a new server machine, but still nothing changed! No idea... Found a senior student and explained this problem. He tried the same thing as I did and also had no idea. What the hell... Go back to the code, trace more carefully (patient, patient, patient), and finally found the problem!!! On the execution path, the server first called: /*src/backend/postmaster/postmaster.c*/ 757 ChangeToDataDir(); And then, called /*src/backend/utils/init/miscinit.c*/ 965 CreateLockFile(lockfile, amPostmaster, false, socketfile); So, when it called open(1), the working directory is changed to the data directory! Actually the errno makes sense. I was stupid!! ----------------------------- But how can users know this without reading the code... The only interface exposed to users are the configuration file which is not consistent at all. And the log message does not help but mislead users (the directory **does** exist...). I really suggest to make the configuration file consistent in this case. But I understand it might not be easy. But at least I think we should print a better log message which pinpoints to the absolute path like FATAL: 58P01: could not create lock file "/home/tianyin/postgresql-9.2.1/local/pgsql/data/local/pgsql/data/.s.PGSQL.5432.lock": No such file or directory In this case, the users can definitely figure out the problem by themselves. Here is the patch: diff --git a/miscinit_old.c b/miscinit.c index fb376a0..8df83a8 100644 --- a/miscinit_old.c +++ b/miscinit.c @@ -736,11 +736,13 @@ CreateLockFile(const char *filename, bool amPostmaster, /* * Couldn't create the pid file. Probably it already exists. */ - if ((errno != EEXIST && errno != EACCES) || ntries > 100) + if ((errno != EEXIST && errno != EACCES) || ntries > 100) { + char* abs_filename = make_absolute_path(filename); ereport(FATAL, (errcode_for_file_access(), errmsg("could not create lock file \"%s\": %m", - filename))); + abs_filename))); + } /* * Read the file to get the old owner's PID. Note race condition Thanks a lot! Tianyin -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs