Thank you so much Amit! I have created the patch below: https://commitfest.postgresql.org/22/2003/
Please let me know should you have more suggestions. Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/

-----Original Message-----
From: Amit Kapila <amit.kapil...@gmail.com>
Sent: Friday, February 1, 2019 6:58 PM
To: Chengchao Yu <chen...@microsoft.com>
Cc: Thomas Munro <thomas.mu...@enterprisedb.com>; Pg Hackers <pgsql-hack...@postgresql.org>; Prabhat Tripathi <pt...@microsoft.com>; Sunil Kamath <sunil.kam...@microsoft.com>; Michal Primke <mpri...@microsoft.com>; TEJA Mupparti <tejeswar.muppa...@microsoft.com>
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu <chen...@microsoft.com> wrote:
>
> Hi Amit, Thomas,
>
> Thank you very much for your feedback! Apologies, but I just saw both
> messages.
>
> > We generally reserve the space in a relation before attempting to write, so
> > not sure how you are able to hit the disk full situation via mdwrite. If
> > you see the description of the function, that also indicates the same.
>
> Absolutely agree, this isn’t a PG issue. The issue manifests for us at
> Microsoft due to our own storage layer, which treats mdextend() actions as
> setting the length of the file only. We have a workaround, and no change is
> needed for Postgres.
>
> > I am not saying that mdwrite can never lead to an error, but just trying to
> > understand the issue you actually faced. I haven't read your proposed
> > solution yet, let's first try to establish the problem you are facing.
>
> We see transient IO errors reading a block, where the PG instance deadlocks
> in single user mode until we kill the instance. The stack trace below shows
> the behavior, which occurs when mdread() fails while the buffer is holding
> its lw-lock.
> This happens because in single user mode there is no callback to release the
> lock, and when AbortBufferIO() tries to acquire the same lock again, it will
> wait for the lock indefinitely.
>

I think you can register your patch for next CF [1] so that we don't forget about it.

[1] - https://commitfest.postgresql.org/22/

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
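
For readers following the thread, the self-deadlock described in the quoted message can be illustrated with a minimal C sketch. This is not PostgreSQL code and not the contents of the patch: ToyLock, toy_try_acquire, toy_release, toy_read_block and toy_abort_io are hypothetical names made up for this illustration. The sketch assumes a simple non-reentrant lock, and it uses a non-blocking try-acquire that returns false at the point where a real blocking acquire would wait forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock (not a PostgreSQL LWLock): it is non-reentrant,
 * so a blocking acquire by the current holder would never return. */
typedef struct ToyLock
{
    bool held;
} ToyLock;

/* Non-blocking acquire. Returns false exactly where a blocking acquire
 * would wait indefinitely. */
static bool toy_try_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

static void toy_release(ToyLock *lock)
{
    assert(lock->held);
    lock->held = false;
}

/* Stand-in for the read path: take the I/O lock, then attempt the read.
 * On a simulated I/O failure the function leaves through the error path
 * with the lock still held. */
static bool toy_read_block(ToyLock *io_lock, bool io_fails)
{
    bool got_lock = toy_try_acquire(io_lock);

    assert(got_lock);
    (void) got_lock;            /* silence unused warning under NDEBUG */

    if (io_fails)
        return false;           /* error exit: io_lock is NOT released */

    toy_release(io_lock);
    return true;
}

/* Stand-in for abort-time cleanup, which needs the same lock again.
 * release_first models an error handler that drops all held locks before
 * running cleanup. Returns false if cleanup would block forever. */
static bool toy_abort_io(ToyLock *io_lock, bool release_first)
{
    if (release_first && io_lock->held)
        toy_release(io_lock);

    if (!toy_try_acquire(io_lock))
        return false;           /* a blocking acquire would hang here */

    toy_release(io_lock);
    return true;
}
```

After a failed toy_read_block(&lock, true), calling toy_abort_io(&lock, false) reports that cleanup would hang, while toy_abort_io(&lock, true) succeeds because the stale lock is released first. Whether the actual patch takes this approach is not stated in the thread.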
fix-deadlock.patch