On Tue, Aug 9, 2022 at 9:20 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > > Thomas Munro <thomas.mu...@gmail.com> writes: > > On Tue, Aug 9, 2022 at 3:27 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > >> I have not tried to analyze the error-handling properties of 0001, > >> but if it's being equally cavalier then it shouldn't be committed > >> either. > > > 0001 does introduce new errors, as mentioned in the commit message, in > > the form of elevel ERROR passed into get_dirent_type(), which might be > > thrown if your OS has no d_type and lstat() fails (also if you asked > > to follow symlinks, but in those cases errors were already thrown). > > But in those cases, it seems at least a little fishy that we ignored > > errors from the existing lstat(). > > Hmmm ... I'll grant that ignoring lstat errors altogether isn't great. > But should the replacement behavior be elog-LOG-and-press-on, > or elog-ERROR-and-fail-the-surrounding-operation? I'm not in any > hurry to believe that the latter is more appropriate without some > analysis of what the callers are doing. > > The bottom line here is that I'm distrustful of behavioral changes > introduced to simplify refactoring rather than to solve a live > problem.
+1. I agree with Tom not to change elog-LOG to elog-ERROR and fail the checkpoint operation. Because the checkpoint is more important than why a single snapshot file (out thousands or even million files) isn't removed at that moment. Also, I originally proposed to change elog-ERROR to elog-LOG in CheckPointLogicalRewriteHeap for unlink() failures for the same reason. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/