Jeff Janes <jeff.ja...@gmail.com> writes: > On Tue, Dec 26, 2017 at 8:31 AM, Alvaro Herrera < > alvhe...@alvh.no-ip.org> wrote: > > Jeff Janes wrote: > > c3d09b3bd23f5f6 fixed it so concurrent CIC would not deadlock > (or at least > > not as reliably as before) by dropping its own snapshot before > waiting for > > all the other ones to go away. > > > > With commit 8aa3e47510b969354ea02a, concurrent CREATE INDEX > CONCURRENTLY on > > different tables in the same database started deadlocking > against each > > other again quite reliably. > > > > I think the solution is simply to drop the catalog snapshot as > well, as in > > the attached. > > Thanks for the analysis -- it sounds reasonable to me. However, > I'm > wondering why you used the *Conditionally() version instead of > plain > InvalidateCatalogSnapshot(). > > > My thinking was that if there was for some reason another snapshot > hanging around, that dropping the catalog snapshot unconditionally > would be a correctness bug, while doing it conditionally would just > fail to avoid a theoretically avoidable deadlock. So it seemed > safer. > > > I think they must have the same effect in > practice (the assumption being that you can't run CIC in a > transaction > that may have other snapshots) but the theory seems simpler when > calling > the other routine: just do away with the snapshot always, period. > > > That is probably true. But I never even knew that catalog snapshots > existed until yesterday, so didn't want to make make assumptions > about what else might exist, to avoid introducing new bugs similar to > the one that 8aa3e47510b969354ea02a fixed. > > > > This is back-patchable to 9.4, first branch which has MVCC > catalog > scans. It's strange that this has gone undetected for so long. > > > Since the purpose of CIC is to build an index with minimal impact on > other users, I think wanting to use it in concurrent cases might be > rather rare. In a maintenance window, I wouldn't want to use CIC > because it is slower and I'd rather just hold the stronger lock and > do it fast, and as a hot-fix outside a maintenance window I usually > wouldn't want to hog the CPU with concurrent builds when I could do
Hmmm, given that most/all large sites lately are probably running on hw with dozens or perhaps hundreds of CPUs/threads, I can see DBAs not being too concerned about "hogging". > them sequentially instead. Also, since deadlocks are "expected" > errors rather than "should never happen" errors, and since the docs > don't promise that you can do parallel CIC without deadlocks, many > people would probably shrug it off (which I initially did) rather > than report it as a bug. I was looking into it as an enhancement > rather than a bug until I discovered that it was already enhanced and Agree such an edge case not a high priority to support for the above reasons but good to assuming no breakage in some other regard :-) > then undone. > > Cheers, > > Jeff > > > > -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consult...@comcast.net p: 312.241.7800