On Thu, 6 Oct 2022 at 12:44, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Sat, Oct 1, 2022 at 12:35 AM Andres Freund <and...@anarazel.de> wrote:
> >
> > This issue does occasionally happen in CI, as e.g. noted in this thread:
> > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com
> >
> > On 2022-08-18 15:17:47 +0530, Amit Kapila wrote:
> > > I agree with you that getting rid of the clean-up lock on the new
> > > bucket is a more invasive patch and should be done separately if
> > > required. Yesterday, I have done a brief analysis and I think that is
> > > possible but it doesn't seem to be a good idea to backpatch it.
> >
> > My problem with this approach is that the whole cleanup lock is hugely
> > misleading as-is. As I noted in
> > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40awork3.anarazel.de
> > we take the cleanup lock *after* re-initializing the page. Thereby
> > completely breaking the properties that a cleanup lock normally tries to
> > guarantee.
> >
> > Even if that were to achieve something useful (doubtful in this case),
> > it'd need a huge comment explaining what's going on.
> >
>
> Attached are two patches. The first patch is what Robert has proposed
> with some changes in comments to emphasize the fact that cleanup lock
> on the new bucket is just to be consistent with the old bucket page
> locking as we are initializing it just before checking for cleanup
> lock. In the second patch, I removed the acquisition of cleanup lock
> on the new bucket page and changed the comments/README accordingly.
>
> I think we can backpatch the first patch and the second patch can be
> just a HEAD-only patch. Does that sound reasonable to you?
Thanks for the patches. I have verified with a manual test that the issue is
fixed on the branches back to REL_10_STABLE and found it to be working fine.

I added some debug logging to print the old buffer and new buffer numbers at
the point where both of them get dirtied[1]. Then I executed the following
test and noted down the old and new buffer values from the log file:

CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);
create table t1(c1 int);
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,1000) i;

In my environment, the issue occurs when oldbuf is 38 and newbuf is 60. Once
the old and new buffer values are known, we have to debug the checkpointer and
recovery processes to simulate the scenario. I used the following steps to
reproduce the issue in my environment:

1) Create a streaming replication setup with the following configuration:
wal_consistency_checking = all
shared_buffers = 128MB                  # min 128kB
bgwriter_lru_maxpages = 0               # max buffers written/round, 0 disables
checkpoint_timeout = 30s                # range 30s-1d

2) Execute the following in the master node:
CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
CREATE INDEX hash_pvactst ON pvactst USING hash (i);

3) Hold the checkpointer process of the standby instance at BufferSync while
debugging.

4) Execute the following in the master node:
create table t1(c1 int);
This is required so that the old buffer is not dirty in the checkpointer
process. (If the old buffer is dirty, we will not be able to sync the new
buffer, as the checkpointer will wait while trying to acquire the lock on the
old buffer.)

5) Let the checkpointer process scan buffers up to old buffer + 1; in our case
it should cross 38.

6) Execute the following in the master node, and hold the standby's recovery
process at hash_xlog_split_allocate_page -> IsBufferCleanupOK (approximately
hash_xlog.c:357) while it replays the last insert:
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,1000) i;
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,1000) i;

7) Continue the checkpointer process, make it proceed to SyncOneBuffer with
buf_id = 60 (the newbuf value noted from the earlier run), and let it proceed
up to PinBuffer_Locked(bufHdr).

8) Continuing the recovery process will then reproduce the PANIC scenario.

Regards,
Vignesh
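
[1] The extra logging was roughly along the following lines (a sketch only;
the exact placement and wording in my local build may have differed). It goes
into hash_xlog_split_allocate_page() in src/backend/access/hash/hash_xlog.c,
just before the cleanup-lock check on the new bucket, so that both buffer
numbers show up in the standby's log:

    /* replay the record for new bucket */
    newbuf = XLogInitBufferForRedo(record, 1);
    _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
                  xlrec->new_bucket_flag, true);

    /* debug-only: log both buffers so they can be tracked later in gdb */
    elog(LOG, "hash_xlog_split_allocate_page: oldbuf %d newbuf %d",
         oldbuf, newbuf);

    if (!IsBufferCleanupOK(newbuf))
        elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
    MarkBufferDirty(newbuf);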