Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> Today I saw a one-off case of $SUBJECT, on macOS.  I can't reproduce
> it, but I noticed exactly the same thing on longfin the other day:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-11-25%2005%3A39%3A04

I trawled the buildfarm logs and discovered a second instance of
exactly the same thing:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-11-19%2018%3A37%3A00

There have not been any other occurrences in the past 3 months,
which is as far back as I went.  (lorikeet has half a dozen
occurrences of "could not stop postmaster", which is what I was
grepping for, but they all are associated with that machine's
intermittent postmaster crashes.)

So that lets out the flaky-hardware theory: that occurrence is
before longfin's hardware transplant.

Also, I don't think I believe the OS-bug idea either, given that
you saw it on 10.14.0.  longfin's been running 10.14.something
since 2018-09-26, and has accumulated circa 200 runs since then
just on HEAD, never mind the back branches.  It'd be pretty unlikely
to see it only in the past week, and only on HEAD, if it were an
OS bug introduced two months ago.

So my theory is we broke something in HEAD a couple weeks ago.
But what?  The fsync changes you made are suspiciously close to
this issue (ie one could explain it as written data not getting
out), and were committed in the right time frame, but that change
didn't affect writes to postmaster.pid did it?

regards, tom lane