Hi, On 2022-11-22 23:43:29 -0600, Justin Pryzby wrote: > I think there may be a problem/deficiency with hint bits: > > |postgres=# DROP TABLE u2; CREATE TABLE u2 AS SELECT > generate_series(1,999999)a; SELECT pg_stat_reset_shared('io'); explain > (analyze,buffers) SELECT * FROM u2; > |... > | Seq Scan on u2 (cost=0.00..15708.75 rows=1128375 width=4) (actual > time=0.111..458.239 rows=999999 loops=1) > | Buffers: shared hit=2048 read=2377 dirtied=2377 written=2345 > > |postgres=# SELECT COUNT(1), relname, COUNT(1) FILTER(WHERE isdirty) FROM > pg_buffercache b LEFT JOIN pg_class c ON > pg_relation_filenode(c.oid)=b.relfilenode GROUP BY 2 ORDER BY 1 DESC LIMIT 11; > | count | relname | count > |-------+---------------------------------+------- > | 13619 | | 0 > | 2080 | u2 | 2080 > | 104 | pg_attribute | 4 > | 71 | pg_statistic | 1 > | 51 | pg_class | 1 > > It says that SELECT caused 2377 buffers to be dirtied, of which 2080 are > associated with the new table in pg_buffercache.
Note that there's 2048 dirty buffers for u2 in shared_buffers before the SELECT, despite the relation being 4425 blocks long, due to the CTAS using BAS_BULKWRITE. > |postgres=# SELECT * FROM pg_stat_io WHERE > backend_type!~'autovac|archiver|logger|standalone|startup|^wal|background > worker' or true ORDER BY 2; > | backend_type | io_context | io_object | read | written | > extended | op_bytes | evicted | reused | files_synced | stats_reset > |... > | client backend | bulkread | relation | 2377 | 2345 | > | 8192 | 0 | 2345 | | 2022-11-22 22:32:33.044552-06 > > I think it's a known behavior that hint bits do not use the strategy > ring buffer. For BAS_BULKREAD, ring_size = 256kB (32, 8kB pages), but > there's 2080 dirty pages in the buffercache (~16MB). I don't think there's any "circumvention" of the ringbuffer here. There's 2048 buffers for u2 in s_b before, all dirty, there's 2080 after, also all dirty. So the ringbuffer restricted the increase in shared buffers used for u2 to 2080-2048=32 additional buffers. The reason hint bits don't prevent pages from being written out here is that a BAS_BULKREAD strategy doesn't cause all buffer writes to be rejected, it just causes buffer writes to be rejected when the page LSN would require a WAL flush. And that's not typically the case when you just set a hint bit, unless you use wal_log_hint_bits = true. If I turn on wal_log_hints=true and add a CHECKPOINT after the CTAS I see 0 reuses (and 4425 dirty buffers), which is what I'd expect. > But the IO view says that 2345 of the pages were "reused", which seems > misleading to me. Maybe that just follows from the behavior and the view is > fine. If the view is fine, maybe this case should still be specifically > mentioned in the docs. I think that's just confusing due to the reset. 2048 + 2345 = 4393, but we only have 2080 buffers for u2 in s_b. Greetings, Andres Freund