Hackers,

During stress testing on macOS of a table access method loaded from
a shared library, after rapidly killing and restarting the postgres server
multiple times, postgres can fail to load the shared library for the
table access method.  The same problem can likely be hit for other
libraries, though I did not try that in the course of debugging this
problem.

The problem seems to stem from several factors.  First, when the
library is loaded in response to the postgresql.conf entry

  shared_preload_libraries = 'mytam'

the library name gets expanded to the full path /path/to/mytam.dylib
and that gets stored.  Later, when the access method handler is
called, it uses $libdir/mytam.  The $libdir/ prefix gets stripped in
load_external_function, and it searches for just "mytam".  The call
to expand_dynamic_library_name("mytam") tries to find it, but
pg_file_exists() can get a spurious failure from macOS despite the
file existing.  In that case, the name is not expanded and
internal_load_library("mytam") gets called without the full path.
The strcmp against the already-loaded library then fails, because
"mytam" doesn't match "/path/to/mytam.dylib".  Then stat("mytam")
gets called, which fails with ENOENT.
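
To make the mismatch concrete, here is a small standalone sketch of an
exact-string lookup like the one described above.  The array and the
function name are hypothetical stand-ins for illustration only, not the
actual dfmgr.c data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the list of libraries loaded at startup.
 * The preload step stored the expanded, full path.
 */
static const char *loaded_files[] = {"/path/to/mytam.dylib"};

/* Exact-match lookup: true only if libname equals a stored path. */
static bool
already_loaded(const char *libname)
{
    size_t      n = sizeof(loaded_files) / sizeof(loaded_files[0]);

    assert(libname != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(libname, loaded_files[i]) == 0)
            return true;
    }
    return false;
}
```

With this lookup, the full path is found but the unexpanded name
"mytam" is not, so the caller proceeds as if the library had never
been loaded.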

The real bug here appears to be in the macOS file system incorrectly
returning ENOENT, but this is highly reproducible for me, and we
should be able to harden postgres to handle it.

I propose that we modify internal_load_library() in dfmgr.c so that,
when the matching fails, it falls back to checking whether the library
was already loaded during server startup.
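
One possible shape for such a fallback is sketched below.  This is
only an illustration under assumptions: the function name and the rule
of comparing a bare name against the final path component (with an
optional suffix such as ".dylib" or ".so") are hypothetical, and are
not necessarily what the attached patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: does libname refer to the library that was
 * loaded from loaded_path?  An exact match is tried first; a bare
 * name with no directory part then falls back to comparing against
 * the last path component of loaded_path.
 */
static bool
matches_loaded_library(const char *loaded_path, const char *libname)
{
    const char *base;
    size_t      len;

    assert(loaded_path != NULL && libname != NULL);

    /* Normal case: the name was expanded to the same full path. */
    if (strcmp(loaded_path, libname) == 0)
        return true;

    /* Only bare names are eligible for the fallback. */
    if (strchr(libname, '/') != NULL)
        return false;

    /* Find the last path component of the stored path. */
    base = strrchr(loaded_path, '/');
    base = (base != NULL) ? base + 1 : loaded_path;

    len = strlen(libname);
    if (len == 0 || strncmp(base, libname, len) != 0)
        return false;

    /* Accept "mytam" or "mytam.<suffix>", but not "mytam2.dylib". */
    return base[len] == '\0' || base[len] == '.';
}
```

A fallback like this would only be consulted after the exact match
has failed, so the normal path is unchanged.  Whether matching on the
file name alone is safe enough (two libraries with the same name in
different directories would be conflated) is a design question the
sketch does not settle.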

Thoughts?



-- 

*Mark Dilger*

Attachment: v1-0001-Harden-internal_load_library-against-transient-er.patch
Description: Binary data
