On Tue, Dec 26, 2017 at 11:23 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Dec 26, 2017 at 8:31 AM, Alvaro Herrera <alvhe...@alvh.no-ip.org> > wrote: > >> Jeff Janes wrote: >> > c3d09b3bd23f5f6 fixed it so concurrent CIC would not deadlock (or at >> least >> > not as reliably as before) by dropping its own snapshot before waiting >> for >> > all the other ones to go away. >> > >> > With commit 8aa3e47510b969354ea02a, concurrent CREATE INDEX >> CONCURRENTLY on >> > different tables in the same database started deadlocking against each >> > other again quite reliably. >> > >> > I think the solution is simply to drop the catalog snapshot as well, as >> in >> > the attached. >> >> Thanks for the analysis -- it sounds reasonable to me. However, I'm >> wondering why you used the *Conditionally() version instead of plain >> InvalidateCatalogSnapshot(). > > > My thinking was that if there was for some reason another snapshot hanging > around, that dropping the catalog snapshot unconditionally would be a > correctness bug, while doing it conditionally would just fail to avoid a > theoretically avoidable deadlock. So it seemed safer. > > >> I think they must have the same effect in >> practice (the assumption being that you can't run CIC in a transaction >> that may have other snapshots) but the theory seems simpler when calling >> the other routine: just do away with the snapshot always, period. >> > > That is probably true. But I never even knew that catalog snapshots > existed until yesterday, so didn't want to make make assumptions about what > else might exist, to avoid introducing new bugs similar to the one that > 8aa3e47510b969354ea02a fixed. > > >> >> This is back-patchable to 9.4, first branch which has MVCC catalog >> scans. It's strange that this has gone undetected for so long. > > > Since the purpose of CIC is to build an index with minimal impact on other > users, I think wanting to use it in concurrent cases might be rather rare. > In a maintenance window, I wouldn't want to use CIC because it is slower > and I'd rather just hold the stronger lock and do it fast, and as a hot-fix > outside a maintenance window I usually wouldn't want to hog the CPU with > concurrent builds when I could do them sequentially instead. Also, since > deadlocks are "expected" errors rather than "should never happen" errors, > and since the docs don't promise that you can do parallel CIC without > deadlocks, many people would probably shrug it off (which I initially did) > rather than report it as a bug. I was looking into it as an enhancement > rather than a bug until I discovered that it was already enhanced and then > undone. > > Cheers, > > Jeff > I was able to get this compiled, and ran the test before on stock 9.6.6, then on this patched version. I indeed reproduced it on 9.6.6, but on the patched version, it indeed fixes my issue. Let me know if I can be of further help. Thanks, Jeremy