Hi Tom,

On 12/4/17 3:15 PM, Tom Lane wrote:
> While working through Michael Paquier's patch to clean up inconsistent
> usage of AllocateDir(), I noticed that ResetUnloggedRelations and its
> subroutines are not consistent about whether a directory open failure
> results in erroring out or just emitting a LOG message and continuing.
> ResetUnloggedRelations itself throws a hard error if it fails to open
> pg_tblspc, but all the rest of reinit.c thinks a LOG message is
> sufficient.

By a strange coincidence I spent a while today reading through this
code...

> My first thought was to change ResetUnloggedRelations to match the
> rest, but on reflection I'm less sure about that.  What we've got
> at the moment is that a possibly-transient directory open failure
> can result in failure to reset an unlogged relation to empty,
> which to me amounts to data corruption.

I'm wondering how this transient directory open failure is going to
happen without a bunch of other things going wrong, but I agree that if
it happens then corruption would be the likely result.

> If the contents of the
> unlogged relation are inconsistent, which is plenty likely after
> a crash, we could end up crashing later because of that; and in
> any case the user would not see what they expect in the tables.

Agreed.

> So now I'm thinking we should do the reverse and change these functions
> to give a hard error on AllocateDir failure.  That would result in
> startup-process failure if we are unable to scan the database, which is
> not great, but there's certainly something badly wrong if we can't.

+1.  If a tablespace or database directory cannot be opened then I
don't think it makes any sense to continue.

Regards,
-- 
-David
da...@pgmasters.net