Hi, While discussing freezing tuples during CTAS[1], we found that heap_insert() with HEAP_INSERT_FROZEN brings performance degradation. For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS, it took 12 sec whereas the code without the patch took 10 sec with the following query:
create table t1 (a, b, c, d) as select i,i,i,i from generate_series(1,20000000) i; I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the following queries: create table source as select generate_series(1, 50000000); create materialized view mv as select * from source; refresh materialized view mv; The execution time of REFRESH MATERIALIZED VIEW are: w/ HEAP_INSERT_FROZEN flag : 42 sec w/o HEAP_INSERT_FROZEN flag : 33 sec After investigation, I found that such performance degradation happens on only HEAD code. It seems to me that commit 39b66a91b (and 7db0cd2145) is relevant that has heap_insert() set VM bits and PD_ALL_VISIBLE if HEAP_INSERT_FROZEN is specified (so CCing Tomas Vondra and authors). Since heap_insert() sets PD_ALL_VISIBLE to the page when inserting a tuple for the first time on the page (around L2133 in heapam.c), every subsequent heap_insert() on the page reads and pins a VM buffer (see RelationGetBufferForTuple()). Reading and pinning a VM buffer for every insertion is a very high cost. This doesn't happen in heap_multi_insert() since it sets VM buffer after filling the heap page with tuples. Therefore, there is no such performance degradation between COPY and COPY FREEZE if they use heap_multi_insert() (i.g., CIM_MULTI). Paul also reported it in that thread. As far as I read the thread and commit messages related to those commits, they are intended to COPY FREEZE and I could not find any discussion and mention about REFRESH MATERIALIZED VIEW. So I'm concerned we didn't expect such performance degradation. Setting VM bits and PD_ALL_VISIBLE at REFRESH MATERIALIZED VIEW would be a good choice in some cases. Since materialized views are read-only VM bits never be cleared after creation. So it might make sense for users to pay a cost to set them at refresh (note that CREATE MATERIALIZED VIEW doesn’t set VM bits since it’s internally treated as CTAS). On the other hand, given this big performance degradation (about 20%) users might want to rely on autovacuum so that VM bits are set in the background. However, unlike COPY, there is no way to disable freezing tuples for REFRESH MATERIALIZED VIEW. So every user would be imposed on those costs and affected by that performance degradation. I’m concerned that it could be a problem. What do you think? Regards, [1] https://www.postgresql.org/message-id/flat/FB1F5E2D-CBD1-4645-B74C-E0A1BFAE4AC8%40vmware.com -- Masahiko Sawada EDB: https://www.enterprisedb.com/