On Wed, Nov 20, 2019 at 03:05:46PM +0900, Kyotaro Horiguchi wrote:
> By the way, before finalize this, I'd like to share the result of a
> brief benchmarking.
What non-default settings did you use? Please give the output of this or a
similar command:

  select name, setting from pg_settings where setting <> boot_val;

If you run more benchmarks and weren't already using wal_buffers=16MB, I
recommend using it.

> With 10 pgbench sessions.
>   pages    SYNC       WAL
>      1:    915 ms    301 ms
>      3:   1634 ms    508 ms
>      5:   1634 ms    293 ms
>     10:   1671 ms   1043 ms
>     17:   1600 ms    333 ms
>     31:   1864 ms    314 ms
>     56:   1562 ms    448 ms
>    100:   1538 ms    394 ms
>    177:   1697 ms   1047 ms
>    316:   3074 ms   1788 ms
>    562:   3306 ms   1245 ms
>   1000:   3440 ms   2182 ms
>   1778:   5064 ms   6464 ms  # WAL's slope becomes steep
>   3162:   8675 ms   8165 ms

For picking a default wal_skip_threshold, it would have been more informative
to see how this changes pgbench latency statistics. Some people want DDL to be
fast, but more people want DDL not to reduce the performance of concurrent
non-DDL. This benchmark procedure may help:

1. Determine $DDL_COUNT, a number of DDL transactions that take about one
   minute when done via syncs.
2. Start "pgbench -rP1 --progress-timestamp -T180 -c10 -j10".
3. Wait 10s.
4. Start one DDL backend that runs $DDL_COUNT transactions.
5. Save DDL start timestamp, DDL end timestamp, and pgbench output.

I would compare pgbench tps and latency between the seconds when DDL is and is
not running. As you did in earlier tests, I would repeat it using various page
counts, with and without sync.

On Wed, Nov 20, 2019 at 05:31:43PM +0900, Kyotaro Horiguchi wrote:
> +Prefer to do the same in future access methods.  However, two other approaches
> +can work.  First, an access method can irreversibly transition a given fork
> +from WAL-skipping to WAL-writing by calling FlushRelationBuffers() and
> +smgrimmedsync().  Second, an access method can opt to write WAL
> +unconditionally for permanent relations.  When using the second method, do not
> +call RelationCopyStorage(), which skips WAL.
>
> Even using these methods, TransactionCommit flushes out buffers then
> sync files again. Isn't a description something like the following
> needed?
>
> ===
> Even an access method switched a in-transaction created relfilenode to
> WAL-writing, Commit(Prepare)Transaction flushed all buffers for the
> file then smgrimmedsync() the file.
> ===

It is enough that the text says to prefer the approach that core access
methods use. The extra flush and sync when using a non-preferred approach
wastes some performance, but it is otherwise harmless.
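For concreteness, the irreversible transition that the quoted hunk describes
amounts to roughly the following, given a Relation rel and taking the main
fork as an example (an untested sketch, not code from the patch):

    /* Write out this relation's dirty shared buffers ... */
    FlushRelationBuffers(rel);
    /*
     * ... and fsync the fork, so everything written while WAL was being
     * skipped is durable before the first WAL-logged change to the fork.
     */
    RelationOpenSmgr(rel);
    smgrimmedsync(rel->rd_smgr, MAIN_FORKNUM);

From that point on, the access method WAL-logs changes to the fork as usual.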
> +	rel1 = relation_open(r1, AccessExclusiveLock);
> +	RelationAssumeNewRelfilenode(rel1);
>
> It cannot be accessed from other sessions. Theoretically it doesn't
> need a lock but NoLock cannot be used there since there's a path that
> doesn't take lock on the relation. But AEL seems too strong and it
> causes unecessary side effect. Couldn't we use weaker locks?

We could use NoLock. I assumed we already hold AccessExclusiveLock, in which
case this has no side effects.

On Thu, Nov 21, 2019 at 04:01:07PM +0900, Kyotaro Horiguchi wrote:
> At Sun, 17 Nov 2019 20:54:34 -0800, Noah Misch <n...@leadboat.com> wrote in
> > === Defect 1: gistGetFakeLSN()
> >
> > When I modified pg_regress.c to use wal_level=minimal for all suites,
> > src/test/isolation/specs/predicate-gist.spec failed the assertion in
> > gistGetFakeLSN().  One could reproduce the problem just by running this
> > sequence in psql:
> >
> > begin;
> > create table gist_point_tbl(id int4, p point);
> > create index gist_pointidx on gist_point_tbl using gist(p);
> > insert into gist_point_tbl (id, p)
> >   select g, point(g*10, g*10) from generate_series(1, 1000) g;
> >
> > I've included a wrong-in-general hack to make the test pass.  I see two main
> > options for fixing this:
> >
> > (a) Introduce an empty WAL record that reserves an LSN and has no other
> > effect.  Make GiST use that for permanent relations that are skipping WAL.
> > Further optimizations are possible.  For example, we could use a backend-local
> > counter (like the one gistGetFakeLSN() uses for temp relations) until the
> > counter is greater a recent real LSN.  That optimization is probably too
> > clever, though it would make the new WAL record almost never appear.
> >
> > (b) Exempt GiST from most WAL skipping.  GiST index build could still skip
> > WAL, but it would do its own smgrimmedsync() in addition to the one done at
> > commit.  Regular GiST mutations would test RELPERSISTENCE_PERMANENT instead of
> > RelationNeedsWal(), and we'd need some hack for index_copy_data() and possibly
> > other AM-independent code that skips WAL.
> >
> > Overall, I like the cleanliness of (a).  The main argument for (b) is that it
> > ensures we have all the features to opt-out of WAL skipping, which could be
> > useful for out-of-tree index access methods.  (I think we currently have the
> > features for a tableam to do so, but not for an indexam to do so.)  Overall, I
> > lean toward (a).  Any other ideas or preferences?
>
> I don't like (b), too.
>
> What we need there is any sequential numbers for page LSN but really
> compatible with real LSN. Couldn't we use GetXLogInsertRecPtr() in the
> case?

No. If nothing is inserting WAL, GetXLogInsertRecPtr() does not increase.
GiST pages need an increasing LSN value. (A rough sketch of what (a) could
look like appears at the end of this mail.)

I noticed an additional defect:

BEGIN;
CREATE TABLE t (c) AS SELECT 1;
CHECKPOINT;  -- write and fsync the table's one page
TRUNCATE t;  -- no WAL
COMMIT;      -- no FPI, just the commit record

If we crash after the COMMIT and before the next fsync or OS-elected sync of
the table's file, the table will stay on disk with its pre-TRUNCATE content.
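To make (a) a bit more concrete, here is the rough, untested sketch mentioned
above. The helper name and the XLOG_GIST_ASSIGN_LSN record type are invented
for the example; the record type would need to be added to gistxlog.h, and
gist_redo() would treat it as a no-op:

    static XLogRecPtr
    gistXLogAssignLSN(void)
    {
        int         dummy = 0;

        XLogBeginInsert();

        /*
         * Mark the record as unimportant, so these records by themselves do
         * not force checkpoints that would otherwise be skipped.
         */
        XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);

        /* A WAL record must carry some payload; register a dummy integer. */
        XLogRegisterData((char *) &dummy, sizeof(dummy));

        return XLogInsert(RM_GIST_ID, XLOG_GIST_ASSIGN_LSN);
    }

gistGetFakeLSN() could then call this for a permanent relation that is
skipping WAL, obtaining a real, increasing LSN at the cost of a small WAL
record each time.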