Hi Thomas I found some other problems which I want to share my change with you to make you confirm. <1> I changed the code in smgr_alloc_sr to avoid dead lock. LWLockAcquire(mapping_lock, LW_EXCLUSIVE); flags = smgr_lock_sr(sr); Assert(flags & SR_SYNCING(forknum)); + flags &= ~SR_SYNCING(forknum); if (flags & SR_JUST_DIRTIED(forknum)) { /* * Someone else dirtied it while we were syncing, so we can't mark * it clean. Let's give up on this SR and go around again. */ smgr_unlock_sr(sr, flags); LWLockRelease(mapping_lock); goto retry; } /* This fork is clean! */ - flags &= ~SR_SYNCING(forknum); flags &= ~SR_DIRTY(forknum); }
In smgr_drop_sr, if the sr is SR_SYNCING, it will retry until the sr is not SR_SYNCING. But in smgr_alloc_sr, if the sr is SR_JUST_DIRTIED, it will retry to get another sr, although it has been synced by smgrimmedsync, the flag SR_SYNCING doesn't changed. This might cause dead lock. So I changed your patch as above. <2> I changed the code in smgr_drop_sr to avoid some corner cases /* Mark it invalid and drop the mapping. */ smgr_unlock_sr(sr, ~SR_VALID); + for (forknum = 0; forknum <= MAX_FORKNUM; forknum++) + sr->nblocks[forknum] = InvalidBlockNumber; hash_search_with_hash_value(sr_mapping_table, &reln->smgr_rnode, hash, HASH_REMOVE, NULL); smgr_drop_sr just removes the hash entry from sr_mapping_table, but doesn't remove the sr_pool. But in smgrnblocks_fast, it just get sr from sr_pool, so I add some codes as above to avoid some corner cases get an unexpected result from smgrnblocks_fast. Is it necessary, I also want some advice from you. Thanks a lot. Buzhen ------------------原始邮件 ------------------ 发件人:Thomas Munro <thomas.mu...@gmail.com> 发送时间:Tue Dec 29 22:52:59 2020 收件人:陈佳昕(步真) <buzhen....@alibaba-inc.com> 抄送:Amit Kapila <amit.kapil...@gmail.com>, Konstantin Knizhnik <k.knizh...@postgrespro.ru>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org> 主题:Re: Re: Cache relation sizes? On Wed, Dec 23, 2020 at 1:31 AM 陈佳昕(步真) <buzhen....@alibaba-inc.com> wrote: > I studied your patch these days and found there might be a problem. > When execute 'drop database', the smgr shared pool will not be removed > because of no call 'smgr_drop_sr'. Function 'dropdb' in dbcommands.c remove > the buffer from bufferpool and unlink the real files by 'rmtree', It doesn't > call smgrdounlinkall, so the smgr shared cache will not be dropped although > the table has been removed. This will cause some errors when smgr_alloc_str > -> smgropen、smgrimmedsync. Table file has been removed, so smgropen and > smgrimmedsync will get a unexpected result. Hi Buzhen, Thanks, you're right -- it needs to scan the pool of SRs and forget everything from the database you're dropping.