On Mon, Mar 23, 2020 at 3:24 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote: > > On 2020/03/23 21:01, Fujii Masao wrote: > > > > > > On 2020/03/23 7:32, Kirill Bychik wrote: > >>>>> I'm attaching a v5 with fp records only for temp tables, so there's no > >>>>> risk of > >>>>> instability. As I previously said I'm fine with your two patches, so > >>>>> unless > >>>>> you have objections on the fpi test for temp tables or the documentation > >>>>> changes, I believe those should be ready for committer. > >>>> > >>>> You added the columns into pg_stat_database, but seem to forget to > >>>> update the document for pg_stat_database. > >>> > >>> Ah right, I totally missed that when I tried to clean up the original POC. > >>> > >>>> Is it really reasonable to add the columns for vacuum's WAL usage into > >>>> pg_stat_database? I'm not sure how much the information about > >>>> the amount of WAL generated by vacuum per database is useful. > >>> > >>> The amount per database isn't really useful, but I didn't had a better > >>> idea on > >>> how to expose (auto)vacuum WAL usage until this: > >>> > >>>> Isn't it better to make VACUUM VERBOSE and autovacuum log include > >>>> that information, instead, to see how much each vacuum activity > >>>> generates the WAL? Sorry if this discussion has already been done > >>>> upthread. > >>> > >>> That's a way better idea! I'm attaching the full patchset with the 3rd > >>> patch > >>> to use this approach instead. There's a bit a duplicate code for > >>> computing the > >>> WalUsage, as I didn't find a better way to avoid that without exposing > >>> WalUsageAccumDiff(). > >>> > >>> Autovacuum log sample: > >>> > >>> 2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table > >>> "rjuju.public.t1": index scans: 0 > >>> pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped > >>> frozen > >>> tuples: 250000 removed, 250000 remain, 0 are dead but not yet > >>> removable, oldest xmin: 502 > >>> buffer usage: 4448 hits, 4 misses, 4 dirtied > >>> avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s > >>> system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s > >>> WAL usage: 6643 records, 4 full page records, 1402679 bytes > >>> > >>> VACUUM log sample: > >>> > >>> # vacuum VERBOSE t1; > >>> INFO: vacuuming "public.t1" > >>> INFO: "t1": removed 50000 row versions in 443 pages > >>> INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 > >>> out of 443 pages > >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512 > >>> There were 50000 unused item identifiers. > >>> Skipped 0 pages due to buffer pins, 0 frozen pages. > >>> 0 pages are entirely empty. > >>> 1332 WAL records, 4 WAL full page records, 306901 WAL bytes > >>> CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s. > >>> INFO: "t1": truncated 443 to 0 pages > >>> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > >>> INFO: vacuuming "pg_toast.pg_toast_16385" > >>> INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages > >>> DETAIL: 0 index row versions were removed. > >>> 0 index pages have been deleted, 0 are currently reusable. > >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > >>> INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions > >>> in 0 out of 0 pages > >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513 > >>> There were 0 unused item identifiers. > >>> Skipped 0 pages due to buffer pins, 0 frozen pages. > >>> 0 pages are entirely empty. > >>> 0 WAL records, 0 WAL full page records, 0 WAL bytes > >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > >>> VACUUM > >>> > >>> Note that the 3rd patch is an addition on top of Kirill's original patch, > >>> as > >>> this is information that would have been greatly helpful to investigate > >>> in some > >>> performance issues I had to investigate recently. I'd be happy to have > >>> it land > >>> into v13, but if that's controversial or too late I'm happy to postpone > >>> it to > >>> v14 if the infrastructure added in Kirill's patches can make it to v13. > >> > >> Dear all, can we please focus on getting the core patch committed? > >> Given the uncertainity regarding autovacuum stats, can we please get > >> parts 1 and 2 into the codebase, and think about exposing autovacuum > >> stats later? > > > > Here are the comments for 0001 patch. > > > > + /* > > + * Report a full page image constructed for the WAL record > > + */ > > + pgWalUsage.wal_fp_records++; > > > > Isn't it better to use "fpw" or "fpi" for the variable name rather than > > "fp" here? In other places, "fpw" and "fpi" are used for full page > > writes/image. > > > > ISTM that this counter could be incorrect if XLogInsertRecord() determines > > to > > calculate again whether FPI is necessary or not. No? IOW, this issue could > > happen if XLogInsert() calls XLogRecordAssemble() multiple times in > > its do-while loop. Isn't this problematic? > > > > + long wal_bytes; /* size of wal records produced */ > > > > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable > > rather than long? > > > > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); > > > > bufusage_space should be walusage_space here? > > > > /* > > * Finish parallel execution. We wait for parallel workers to finish, and > > * accumulate their buffer usage. > > */ > > > > There are some comments mentioning buffer usage, in execParallel.c. > > For example, the top comment for ExecParallelFinish(), as the above. > > These should be updated. > > Here are the comments for 0002 patch. > > + OUT wal_write_bytes int8, > + OUT wal_write_records int8, > + OUT wal_write_fp_records int8 > > Isn't "write" part in the column names confusing because it's WAL > *generated* (not written) by the statement? > > +RETURNS SETOF record > +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4' > +LANGUAGE C STRICT VOLATILE; > > PARALLEL SAFE should be specified? > > +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ > > ISTM it's good timing to have also pg_stat_statements--1.8.sql since > the definition of pg_stat_statements() is changed. Thought? > > +-- CHECKPOINT before WAL tests to ensure test stability > +CHECKPOINT; > > Is this true? I thought you added this because the number of FPI > should be larger than zero in the subsequent test. No? But there > seems no such test. I'm not excited about adding the test checking > the number of FPI because it looks fragile, though... > > +UPDATE pgss_test SET b = '333' WHERE a = 3 \; > +UPDATE pgss_test SET b = '444' WHERE a = 4 ; > > Could you tell me why several queries need to be run to test > the WAL usage? Isn't running a few query enough for the test purpase?
FTR I marked the commitfest entry as waiting on author. Kirill do you think you'll have time to address Fuji-san's review shortly? The end of the commitfest is approaching quite fast :(