On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> The best solution I have come up with so far is to add a reference
> count to SERIALIZABLEXACT.  I toyed with putting the refcount into the
> DSM instead, but then I ran into problems making that work when you
> have a query with multiple Gather nodes.  Since the refcount is in
> SERIALIZABLEXACT I also had to add a generation counter so that I
> could detect the case where you try to attach too late (the leader has
> already errored out, the refcount has reached 0 and the
> SERIALIZABLEXACT object has been recycled).
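
For anyone following along, the refcount-plus-generation scheme described
above might look roughly like the sketch below.  This is an illustration
only, not code from the patch: SharedXact, XactHandle and the xact_*
functions are made-up names, and the locking that real shared-memory code
would need around every one of these operations is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a recyclable shared object such as SERIALIZABLEXACT. */
typedef struct SharedXact
{
    uint32_t refcount;    /* number of attached processes; 0 means the slot is free */
    uint64_t generation;  /* bumped every time the slot is recycled */
} SharedXact;

/* What a leader would pass to its workers: the slot plus the generation it saw. */
typedef struct XactHandle
{
    SharedXact *xact;
    uint64_t    generation;
} XactHandle;

/* Leader claims a free slot and takes the first reference. */
XactHandle
xact_create(SharedXact *slot)
{
    XactHandle h;

    assert(slot->refcount == 0);
    slot->refcount = 1;
    h.xact = slot;
    h.generation = slot->generation;
    return h;
}

/*
 * Worker tries to attach.  Fails if the slot has been recycled since the
 * handle was made, even if somebody else has reused it in the meantime.
 */
bool
xact_attach(XactHandle h)
{
    if (h.xact->generation != h.generation || h.xact->refcount == 0)
        return false;
    h.xact->refcount++;
    return true;
}

/* Drop one reference; the last one out recycles the slot. */
void
xact_release(SharedXact *xact)
{
    assert(xact->refcount > 0);
    if (--xact->refcount == 0)
        xact->generation++;
}
```

The generation check is what catches the "attach too late" case: a refcount
test alone would wrongly succeed once the recycled slot has been handed to
an unrelated transaction.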

I don't know whether that's safe or not.  It certainly sounds like
it's solving one category of problem, but is that the only issue?  If
some backends haven't noticed that we're safe, they might keep
acquiring SIREAD locks or doing other manipulations of shared state,
which maybe could cause confusion.  I haven't looked into this deeply
enough to understand whether there's actually a possibility of trouble
there, but I can't rule it out off-hand.

One approach is to just disable this optimization for parallel query.
Being able to use SERIALIZABLE with parallel query is better than not
being able to do it, even if some optimizations are not applied in
that case.  Of course making the optimizations work is better, but
we've got to be sure we're doing it right.

> PS I noticed that for BecomeLockGroupMember() we say "If we can't
> join the lock group, the leader has gone away, so just exit quietly"
> but for various other similar things we spew errors (most commonly
> seen one being "ERROR: could not map dynamic shared memory segment").
> Intentional?

I suppose I thought that if we failed to map the dynamic shared memory
segment, it might be down to any one of several causes; whereas if we
fail to join the lock group, it must be because the leader has already
exited.  There might be a flaw in that thinking, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company