Hi, On 2023-01-11 22:39:49 -0600, Justin Pryzby wrote: > Thomas raised a good question, which was how the tests were passing when > SnapBuildSerialize() was raising an error, which is what it would've > been doing when I used data_sync_retry=no.
Presumably some test not checking for failures in a part of the test. > So .. why is wal_sync_method being used to control fsync for things > other than WAL? Historical raisins, I think. The problem is that macOS lies about fsync, and one needs special magic to make it behave like a real fsync. Somebody thought that instead of inventing a separate GUC to control whether the "real" fsync is used for other subsystems, it'd be better to reuse wal_sync_method. Note that this isn't the function that is actually used for WAL (that's issue_xlog_fsync()), and that pg_fsync() only uses the GUC to know whether to use pg_fsync_writethrough() or pg_fsync_no_writethrough(fd). > See 6dc7760ac (c. 2005) which added wal_fsync_writethrough, at which > point (since 9b178555f, c. 2004) wal_sync_method was already being used > for SLOG. > > Now, it's also being used for logical decoding (since b89e1510 and > 858ec1185, c. 2014) in rewriteheap.c/snapbuild.c. And pidfiles (since > ee0e525bf, 2010). And the control file (8b938d36f7, 2019). Note that > data_sync_retry wasn't added until 9ccdd7f66 (c. 2018) > > It looks like logical decoding may be the "most wrong" place that > wal_sync_method is being used, so maybe my change is reasonable to > consider, and not just a workaround. I don't follow. What does using fsync_fname() vs fsync_fname_ext() have to do with pg_fsync() using wal_sync_method? fsync_fname() is just a wrapper around fsync_fname_ext(). Both end up in pg_fsync(). Are you actually proposing that we don't PANIC after an fsync for the category of files that you list here, even with data_sync_retry set? Greetings, Andres Freund