On Sat, Jul 27, 2019 at 6:22 AM Chengchao Yu <chen...@microsoft.com> wrote:
>
> Thus, I have updated the patch v3 according to your suggestions. Could you
> help to review again?
> Please let me know should you have more suggestions or feedbacks.
>

I have looked into this patch and I don't think it fixes the problem.
I tried the commands you suggested in single-user mode: create table,
insert, and then checkpoint. What I see is almost the same behavior you
described in one of the above emails, with a slight difference that makes
me think the fix you are proposing is not correct. Below is what you wrote:

"The second type is in Step #4. At the time when “checkpoint” SQL command
is being executed, PG has already set up the before_shmem_exit callback
ShutdownPostgres(), which releases all lw-locks given transaction or
sub-transaction is on-going. So after the first IO error, the buffer page’s
lw-lock gets released successfully. However, later ShutdownXLOG() is
invoked, and PG tries to flush buffer pages again, which results in the
second IO error. Different from the first time, this time, all the previous
executed before/on_shmem_exit callbacks are not invoked again due to the
decrease of the indexes. So lw-locks for buffer pages are not released when
PG tries to get the same buffer lock in AbortBufferIO(), and then PG
process gets stuck."

The only difference is in the last step: for me, instead of the process
getting stuck, it gives an assertion failure when trying to do
ReleaseAuxProcessResources.

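To make the quoted mechanism concrete, here is a minimal standalone sketch of
why a re-entered exit sequence skips cleanup that has already run. It only
mimics the decrementing-index loop described above; it is not PostgreSQL code.
All names in it (toy_exit, register_callback, release_locks, flush_buffers,
late_cleanup) are hypothetical, and the "lock" is just a counter. The
re-entry is modeled as a plain recursive call, whereas in the real server the
error path never returns to the failing callback.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CALLBACKS 8

typedef void (*exit_callback)(int code);

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
static exit_callback callbacks[MAX_CALLBACKS];
static int callback_index = 0;          /* callbacks still waiting to run */
static int locks_held = 0;              /* toy counter standing in for LWLocks */
static int release_calls = 0;
static int flush_calls = 0;
static int locks_seen_by_cleanup = -1;  /* what the last callback observed */

static void toy_exit(int code);

static void register_callback(exit_callback fn)
{
    assert(callback_index < MAX_CALLBACKS);
    callbacks[callback_index++] = fn;
}

/* Plays the role of the callback that releases all locks. */
static void release_locks(int code)
{
    (void) code;
    release_calls++;
    locks_held = 0;
}

/* Plays the role of the shutdown checkpoint: takes a lock, then fails. */
static void flush_buffers(int code)
{
    flush_calls++;
    locks_held++;
    /* Simulated write error: the error path re-enters exit processing
     * while the lock is still held. */
    toy_exit(code);
}

/* Plays the role of a later cleanup step that expects no locks held. */
static void late_cleanup(int code)
{
    (void) code;
    locks_seen_by_cleanup = locks_held;
}

/* Run callbacks in reverse registration order. The index is decremented
 * before each call, so a re-entrant call resumes below the failing one. */
static void toy_exit(int code)
{
    while (--callback_index >= 0)
        callbacks[callback_index](code);
}

static void run_exit_demo(void)
{
    register_callback(late_cleanup);    /* runs last */
    register_callback(flush_buffers);   /* runs second */
    register_callback(release_locks);   /* runs first */
    toy_exit(1);
    printf("release_calls=%d flush_calls=%d locks_seen_by_cleanup=%d\n",
           release_calls, flush_calls, locks_seen_by_cleanup);
}
```

Calling run_exit_demo() from a main() shows release_locks running exactly
once, before the failing flush, so late_cleanup still sees the lock held.
That is the same shape as the failure described here, under the stated
simplifications.
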
Below is the callstack:

postgres.exe!ExceptionalCondition(const char * conditionName=0x00db0c78, const char * errorType=0x00db0c68, const char * fileName=0x00db0c18, int lineNumber=1722) Line 55 C
postgres.exe!UnpinBuffer(BufferDesc * buf=0x052a104c, bool fixOwner=true) Line 1722 + 0x2f bytes C
postgres.exe!ReleaseBuffer(int buffer=96) Line 3367 + 0x17 bytes C
postgres.exe!ResourceOwnerReleaseInternal(ResourceOwnerData * owner=0x0141f6e8, <unnamed-enum-RESOURCE_RELEASE_BEFORE_LOCKS> phase=RESOURCE_RELEASE_BEFORE_LOCKS, bool isCommit=false, bool isTopLevel=true) Line 526 + 0x9 bytes C
postgres.exe!ResourceOwnerRelease(ResourceOwnerData * owner=0x0141f6e8, <unnamed-enum-RESOURCE_RELEASE_BEFORE_LOCKS> phase=RESOURCE_RELEASE_BEFORE_LOCKS, bool isCommit=false, bool isTopLevel=true) Line 484 + 0x17 bytes C
postgres.exe!ReleaseAuxProcessResources(bool isCommit=false) Line 861 + 0x15 bytes C
> postgres.exe!ReleaseAuxProcessResourcesCallback(int code=1, unsigned int arg=0) Line 881 + 0xa bytes C
postgres.exe!shmem_exit(int code=1) Line 272 + 0x1f bytes C
postgres.exe!proc_exit_prepare(int code=1) Line 194 + 0x9 bytes C
postgres.exe!proc_exit(int code=1) Line 107 + 0x9 bytes C
postgres.exe!errfinish(int dummy=0, ...) Line 538 + 0x7 bytes C
postgres.exe!mdwrite(SMgrRelationData * reln=0x0147e140, ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char * buffer=0x0542dd00, bool skipFsync=false) Line 713 + 0x4c bytes C
postgres.exe!smgrwrite(SMgrRelationData * reln=0x0147e140, ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char * buffer=0x0542dd00, bool skipFsync=false) Line 587 + 0x24 bytes C
postgres.exe!FlushBuffer(BufferDesc * buf=0x052a104c, SMgrRelationData * reln=0x0147e140) Line 2759 + 0x1d bytes C
postgres.exe!SyncOneBuffer(int buf_id=95, bool skip_recently_used=false, WritebackContext * wb_context=0x012ccea0) Line 2402 + 0xb bytes C
postgres.exe!BufferSync(int flags=5) Line 1992 + 0x15 bytes C
postgres.exe!CheckPointBuffers(int flags=5) Line 2586 + 0x9 bytes C
postgres.exe!CheckPointGuts(unsigned __int64 checkPointRedo=22933176, int flags=5) Line 8991 + 0x9 bytes C
postgres.exe!CreateCheckPoint(int flags=5) Line 8780 + 0x11 bytes C
postgres.exe!ShutdownXLOG(int code=1, unsigned int arg=0) Line 8333 + 0x7 bytes C
postgres.exe!shmem_exit(int code=1) Line 272 + 0x1f bytes C
postgres.exe!proc_exit_prepare(int code=1) Line 194 + 0x9 bytes C
postgres.exe!proc_exit(int code=1) Line 107 + 0x9 bytes C
postgres.exe!errfinish(int dummy=0, ...) Line 538 + 0x7 bytes C
postgres.exe!mdwrite(SMgrRelationData * reln=0x0147e140, ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char * buffer=0x0542dd00, bool skipFsync=false) Line 713 + 0x4c bytes C
postgres.exe!smgrwrite(SMgrRelationData * reln=0x0147e140, ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char * buffer=0x0542dd00, bool skipFsync=false) Line 587 + 0x24 bytes C
postgres.exe!FlushBuffer(BufferDesc * buf=0x052a104c, SMgrRelationData * reln=0x0147e140) Line 2759 + 0x1d bytes C
postgres.exe!SyncOneBuffer(int buf_id=95, bool skip_recently_used=false, WritebackContext * wb_context=0x012ce580) Line 2402 + 0xb bytes C
postgres.exe!BufferSync(int flags=44) Line 1992 + 0x15 bytes C
postgres.exe!CheckPointBuffers(int flags=44) Line 2586 + 0x9 bytes C
postgres.exe!CheckPointGuts(unsigned __int64 checkPointRedo=22933176, int flags=44) Line 8991 + 0x9 bytes C
postgres.exe!CreateCheckPoint(int flags=44) Line 8780 + 0x11 bytes C
postgres.exe!RequestCheckpoint(int flags=44) Line 967 + 0xc bytes C
postgres.exe!standard_ProcessUtility(PlannedStmt * pstmt=0x0146b738, const char * queryString=0x0146ad98, <unnamed-enum-PROCESS_UTILITY_TOPLEVEL> context=PROCESS_UTILITY_TOPLEVEL, ParamListInfoData * params=0x00000000, QueryEnvironment * queryEnv=0x00000000, _DestReceiver * dest=0x00adc1d8, char * completionTag=0x012cfdbc) Line 769 + 0x28 bytes C

It seems to me there are other things, like ReleaseAuxProcessResources(),
that run before AbortBufferIO() and expect LWLocks to be released. I didn't
get much time to debug this further, but I think some more analysis is
required for this issue. I guess you didn't encounter this problem because
you are not using an assert-enabled build, but there could be some other
reason as well.

I have marked this CF entry as "Waiting on Author".

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com