On Tue, Nov 6, 2018 at 10:05 AM chenhj <chjis...@163.com> wrote:
> I analyzed the btree block where lwlock deadlock occurred, as follows:

Thank you for doing this important work.

You're using Postgres 10.2. While that version isn't current with all
GIN bug fixes, it does have this important one:

"Ensure that vacuum will always clean up the pending-insertions list of
a GIN index (Masahiko Sawada)"

Later GIN fixes seem unlikely to be relevant to your issue. I think
that this is probably a genuine, new bug.

> The ginInsertValue() function above gets the lwlock in the order described in
> the README.
> However, ginScanToDelete() depth-first scans the btree and gets the EXCLUSIVE
> lock, which creates a deadlock.
> Is the above statement correct? If so, deadlocks should easily happen.

I have been suspicious of deadlock hazards in the code for some time --
particularly around pending list cleanup. I go into a lot of detail on
my suspicions here:

https://www.postgresql.org/message-id/flat/CAH2-WzmfUpRjWcUq3%2B9ijyum4AJ%2Bk-meGT8_HnxBMpKz1aNS-g%40mail.gmail.com#ea5af1088adfacb3d0ba88313be1a5e3

I note that your first deadlock involves the following kinds of backends:

* ginInsertCleanup() calls from a regular backend, which will have a
backend do things that VACUUM generally only gets to do, like call
RecordFreeIndexPage().

* (auto)VACUUM processes.

Your second/recovery deadlock involves:

* Regular read-only queries.

* Recovery code.

Quite a lot of stuff is involved here!

The code in this area is way too complicated, and I haven't thought
about it in about a year, so it's hard for me to be sure of anything
at the moment. My guess is that there is confusion about the type of
page expected within one or more blocks (e.g. entry tree vs. pending
list), due to a race condition in block deletion and/or recycling --
again, I've suspected something like this could happen for some time.
The fact that you get a distinct deadlock during recovery is
consistent with that theory.

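To make the page-type confusion concrete: a check that exists only as an
Assert() disappears in builds without assertions enabled, so a backend that
lands on the wrong kind of page just carries on. Below is a rough,
self-contained sketch of a check that stays in production builds. It is not
actual GIN code. DemoPage, the DEMO_PAGE_* flags and
demo_check_pending_page() are made-up names for illustration, and the flag
values are not the real GIN ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical page-type flags; NOT the real GIN flag values. */
enum demo_page_flags
{
    DEMO_PAGE_LEAF = 1 << 0,    /* entry tree leaf page */
    DEMO_PAGE_LIST = 1 << 1,    /* pending-list page */
    DEMO_PAGE_DELETED = 1 << 2  /* page already deleted/recyclable */
};

/* Hypothetical stand-in for a buffer page plus its block number. */
typedef struct DemoPage
{
    unsigned    blkno;
    unsigned    flags;
} DemoPage;

/*
 * Runtime page-type check that is always compiled in.
 *
 * Returns true if the page looks like a live pending-list page.  Otherwise
 * it writes a "can't happen" message into errbuf and returns false; real
 * backend code would raise elog(ERROR, ...) at that point instead.
 */
bool
demo_check_pending_page(const DemoPage *page, char *errbuf, size_t errlen)
{
    /* Caller contract only; this is not the check being promoted. */
    assert(page != NULL && errbuf != NULL && errlen > 0);

    if ((page->flags & DEMO_PAGE_DELETED) != 0 ||
        (page->flags & DEMO_PAGE_LIST) == 0)
    {
        snprintf(errbuf, errlen,
                 "unexpected page type in block %u (flags 0x%x)",
                 page->blkno, page->flags);
        return false;
    }

    errbuf[0] = '\0';
    return true;
}
```

A caller would run this right after locking the buffer and give up on a false
return rather than continue with the wrong kind of page. That turns a
silent wrong turn into a report that names the block.
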
It's safe to say that promoting the asserts on GIN page type into
"can't happen" elog errors in places like scanPendingInsert() and
ginInsertCleanup() would be a good start. Note that we already did a
similar Assert-to-elog(ERROR) promotion for posting tree code, when
similar bugs appeared way back in 2013.

--
Peter Geoghegan