On 12/05/2014 05:05 PM, Robert Haas wrote:
> [ reviving a very old thread ]
> On Tue, Feb 10, 2009 at 4:11 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvhe...@commandprompt.com> writes:
>>> For example, maybe we could keep track of counts of snapshots removed
>>> since the last xmin calculation, and only run this routine if the number
>>> is different from zero (or some small positive integer).
>> I think most of the callers of SnapshotResetXmin already know they
>> removed something.
>> It might be interesting for FreeSnapshot or something nearby to note
>> whether the snapshot being killed has xmin = proc's xmin, and only do
>> the update calculation if so.
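
(Spelling out what I understand that suggestion to mean, the deregistration
path in snapmgr.c would do something like the sketch below. This is a rough
sketch only: the function name is made up, and MyProc->xmin is shorthand for
wherever we actually advertise the xmin; only the shape of the check
matters.)

/*
 * Rough sketch only, as if in snapmgr.c: recompute the advertised xmin
 * only when the snapshot being freed is one of those actually holding
 * it back.
 */
static void
FreeSnapshotAndMaybeAdvanceXmin(Snapshot snap)
{
    bool    was_holding_back =
        TransactionIdIsValid(MyProc->xmin) &&
        TransactionIdEquals(snap->xmin, MyProc->xmin);

    /* ... free/unregister the snapshot as we do today ... */

    if (was_holding_back)
        SnapshotResetXmin();    /* walk what's left and find the new minimum */
}
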
>> I still dislike the assumption that all resource owners are children of
>> a known owner. I suspect in fact that it's demonstrably wrong right
>> now, let alone in future (cf comments in PortalRun). If we're going to
>> do this then snapmgr.c needs to track the snapshots for itself. Of
>> course that's going to make the "is it worth it" question even more
>> pressing.
> I've run into essentially the same problem Jeff originally complained
> about with a large customer who has long-running transactions that
> make extensive use of cursors. Cursors are opened and closed over
> time but it is rare for the number open to reach exactly zero, so what
> ends up happening is that the backend xmin does not advance. As you
> can imagine, that doesn't work out well.
> The approach I came up with initially was similar to Jeff's: start at
> the topmost resource owner and traverse them all, visiting every
> snapshot along the way. But then I found this thread and saw the
> comment that this might be "demonstrably wrong", referring to the
> comments in PortalRun. Having reviewed those comments, which don't
> seem to have changed much in the last five years, I can't understand
> how they relate to this issue. It's true that the TopTransaction
> resource owner could get swapped out under us during an internal
> commit, but why should SnapshotResetXmin() have to care? It just
> traverses the one that is in effect at the time it gets called. The
> only real danger I see here is that there could be *more than one*
> toplevel resource owner. I wonder if we could solve that problem by
> adding a registry of active toplevel resource owners, so that if we
> have a forest rather than a tree we can still find everything.
I don't immediately see the problem either, but I have to say that
grovelling through all the resource owners seems ugly anyway. Resource
owners are not meant to be traversed like that. And there can be a lot of
them, and you'd have to visit every one, which could get expensive.
BTW, you could easily detect, after traversing the resource owners, that
you haven't seen all the registered snapshots, since we keep a counter of
them. So you could just fall back to not advancing the xmin if that
happens.
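
Roughly like this, at the end of whatever walk computes the candidate xmin
(a sketch only; all the names here are illustrative, and MyProc->xmin is
again shorthand for the advertised xmin):

/*
 * Sketch, as if in snapmgr.c: "nseen" and "oldest" come from the resource
 * owner walk, and "nregistered" stands in for the counter of registered
 * snapshots we already maintain.
 */
static void
MaybeAdvanceXmin(int nseen, int nregistered, TransactionId oldest)
{
    if (nseen < nregistered)
        return;                 /* missed some snapshots: leave xmin alone */

    if (TransactionIdIsValid(oldest))
        MyProc->xmin = oldest;  /* can only move forward */
    else
        MyProc->xmin = InvalidTransactionId;    /* nothing registered at all */
}
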
> The problem I see with having snapmgr.c track the snapshots for itself
> is that it mostly duplicates bookkeeping which is already being done.
> Since this problem doesn't affect the majority of users, it's not
> desirable to add a lot of extra bookkeeping to cater to it; but even if
> it did affect a lot of users, we would still want it to be as cheap as
> possible, and reusing the tracking that resource owners are already
> doing seems like the way to get there.
I would prefer doing separate bookkeeping in snapmgr.c. It seems like it
wouldn't be difficult to do. It has to be cheap, but I don't see any
reason to believe that it'd be more expensive than traversing through
all resource owners. A binary heap or some other priority queue
implementation should work pretty well for this.
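
Conceptually it would look something like this (hand-wavy: the xmin_heap_*
calls are placeholders rather than an existing API, and we'd want a heap
implementation that can remove an arbitrary entry, not just the minimum):

/* Placeholder heap interface, ordered by snapshot xmin; not real code. */
static void     xmin_heap_insert(Snapshot snap);
static void     xmin_heap_remove(Snapshot snap);
static Snapshot xmin_heap_peek_min(void);       /* NULL if empty */

/*
 * Sketch, as if in snapmgr.c: every registered snapshot lives in the heap,
 * so the oldest xmin is always cheap to find when something goes away.
 */
static void
TrackSnapshotRegistered(Snapshot snap)
{
    xmin_heap_insert(snap);             /* O(log n) or better */
}

static void
TrackSnapshotDeregistered(Snapshot snap)
{
    Snapshot    oldest;

    xmin_heap_remove(snap);             /* arbitrary removal, not just the min */
    oldest = xmin_heap_peek_min();

    if (oldest == NULL)
        MyProc->xmin = InvalidTransactionId;    /* no snapshots left */
    else
        MyProc->xmin = oldest->xmin;            /* never moves backwards */
}
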
> I think there are a couple of things we can do to make this cheaper.
> Suppose we keep track of the oldest xmin of any registered snapshot
> and the number of registered snapshots that have that particular xmin.
> Every time we deregister a snapshot, we check whether it is one of the
> ones with the minimum xmin; if it is, we decrement the count. When
> the count reaches 0, we know that a traversal of all registered
> snapshots is guaranteed to find a newer value to advertise in
> MyProc->xmin; that way, we can arrange to do the work only when it's
> likely to pay off. In most cases this won't happen until the last
> snapshot is unregistered, because our snapshots normally form a stack,
> with newer snapshots having been taken later. But if somebody
> unregisters the oldest snapshot we'll immediately notice and
> recalculate.
Yeah, that's a reasonable optimization, and it applies just as well if you
do the bookkeeping in snapmgr.c.
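
Just to make sure I'm reading it right, I take the proposal to be
bookkeeping roughly like this (all names are illustrative; RecomputeXmin()
stands in for the full rescan, which would also reset the two variables):

/* Sketch, as if in snapmgr.c; all names are illustrative. */
static TransactionId    oldest_registered_xmin = InvalidTransactionId;
static int              oldest_registered_count = 0;

static void RecomputeXmin(void);    /* full rescan; also resets the above */

static void
NoteSnapshotRegistered(Snapshot snap)
{
    if (!TransactionIdIsValid(oldest_registered_xmin) ||
        TransactionIdPrecedes(snap->xmin, oldest_registered_xmin))
    {
        oldest_registered_xmin = snap->xmin;
        oldest_registered_count = 1;
    }
    else if (TransactionIdEquals(snap->xmin, oldest_registered_xmin))
        oldest_registered_count++;
}

static void
NoteSnapshotDeregistered(Snapshot snap)
{
    if (TransactionIdEquals(snap->xmin, oldest_registered_xmin) &&
        --oldest_registered_count == 0)
    {
        /*
         * Every remaining snapshot now has a newer xmin, so a rescan is
         * guaranteed to find a better value to advertise.
         */
        RecomputeXmin();
    }
}
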
But that optimization won't save you in the cases where its assumption
doesn't hold. For example, what if the snapshots happen to form a queue
rather than a stack:
DECLARE c1 CURSOR FOR ...;
DECLARE c2 CURSOR FOR ...;
DECLARE c3 CURSOR FOR ...;
...
DECLARE c1000 CURSOR FOR ...;
CLOSE c1;
CLOSE c2;
CLOSE c3;
...
CLOSE c1000;
It's not hard to imagine an application doing that. Sure, you could
rewrite it to close the cursors in a different order, but on the face of
it that's not an unreasonable thing for an application to do. In that
case every CLOSE removes the oldest remaining snapshot, so each one
triggers a full rescan of all the snapshots still registered, and the
whole sequence becomes O(n^2). I think we should avoid that behaviour.
(But +1 for addressing this problem, with the extra bookkeeping in
snapmgr.c)
- Heikki