Simon Riggs wrote:
> On Thu, 2009-06-25 at 12:55 +0000, Fujii Masao wrote:
>> The following bug has been logged online:
>>
>> Bug reference: 4879
>> Logged by: Fujii Masao
>> Email address: masao.fu...@gmail.com
>> PostgreSQL version: 8.4dev
>> Operating system: RHEL5.1 x86_64
>> Description: bgwriter fails to fsync the file in recovery mode
>> Details:
>
> Looking at it now.
Thanks.

>> I suspect that the cause of this error is the race condition between
>> file deletion by startup process and fsync by bgwriter: TRUNCATE xlog
>> record immediately deletes the corresponding file, while it might be
>> scheduled to be fsynced by bgwriter. We should leave the actual file
>> deletion to bgwriter instead of startup process, like normal mode?

I think the real problem is this in mdunlink():

>     /* Register request to unlink first segment later */
>     if (!isRedo && forkNum == MAIN_FORKNUM)
>         register_unlink(rnode);

When we replay the unlink of the relation, we don't tell bgwriter about
it. Normally we do, so bgwriter knows that if the fsync() fails with
ENOENT, it's OK because the file was deleted.

It's tempting to just remove the "!isRedo" condition, but then we have
another problem: if bgwriter hasn't been started yet and the shmem queue
is full, we get stuck in register_unlink(), trying to send the message
and failing.

In archive recovery, we always start bgwriter at the beginning of WAL
replay. In crash recovery, we don't start bgwriter until the end of WAL
replay. So we could change the "!isRedo" condition to
"!InArchiveRecovery". It's not a very clean solution, but it's simple.

Hmm, what happens when the startup process performs a write and bgwriter
is not running? Do the fsync requests queue up in the shmem queue until
the end of recovery, when bgwriter is launched? I guess I'll have to try
it out...

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)