Melanie Plageman <[email protected]> writes:
>
> Thanks for working on this, Renan.
>
Thanks for the feedback.
> It is quite a large patch set, which makes sense because we do not
> have histogram types in any of the shared memory stats right now. I
> think if we want to go the route of adding a histogram, we might want
> to make it general purpose enough to use for other types of stats. In
> fact there is nothing about your PruneXidHistogram struct which has to
> be related to prune xids.
>
True.
> Before doing that, I was wondering if there was a way to make
> incremental progress by adding and maintaining the oldest prune xid.
> It would mean adding code in the same places (heap_update/delete/etc).
> In heap_page_prune_and_freeze(), we would have to keep track of the
> oldest new prune xid given that we will probably be pruning away the
> previous oldest one.
In order to keep track of the oldest prune xid we need to keep some
info about all prune xid in the relation. Note that there can be more
than one page with same prune xid. So if we look only at a local prune
(which is the info that we have at those places), we cannot update the
global oldest prune xid, unless we know that it matches the current
oldest AND the current oldest occurs only once. Besides that, we cannot
say that the new prune xid of current page is the new oldest one
(probably it isn't).
Actually the main objective of maintaining this histogram is to keep
track of the relation oldest prunable xid in a concise manner. The
bounds of the first bin are the most important. The other bins are there
just to support this information in the future as the histogram gets
updated. The fact that we can eventually use this information in a
complex heuristic is just a bonus.
The information that we need to precisely keep track of the oldest prune
xid would be better modeled by a counter/multiset/bag containing
pages prune xid of a given relation. I discarded this approach
because it seemed hard to integrate such a complex data structure in
pg stat system (but that may be reviewed). The histogram approach
is actually an approximation of the info that would be represented by
this multiset. The advantage of this implementation is that it uses a
fixed size data structure that can be integrated in pg stat system.
>
> Then when vacuuming, we could consider skipping vacuuming the relation
> if the oldest prunable xid is newer than OldestXmin (unless we thought
> we'd be able to freeze tuples).
>
Then, if we really don't care about the additional information regarding
higher page prune xid, there is another way to keep track of the oldest
prunable xid of the relation: maintain a histogram of tuples xid
(max_xid, right?) instead of pages prune_xid. The advantage is that we
don't need to touch heap_{update,insert,delete} functions. In this
configuration, the penalty of maintaining the histogram would be
per-transaction and not per-page.
> I know your patch doesn't use the histograms now in vacuum decisions
> (only displays them). But maybe it is worth approaching from the
> opposite direction and use a coarser heuristic first and then make it
> more detailed.
Unfortunately, I don't see how to make this part even more simple. But your
suggestion of creating a simple heuristic based only on the oldest
prunable xid could fit in this patch, at least for evaluation
purposes. So from this point, the options are:
1. Go ahead with this, implementing the vacuum skip.
2. Switch to tuple-wise histogram to avoid performance issues. Then,
implement same vacuum skip.
3. Abandon this path and look for a more complete solution. We would use
a more complex data structure like a multiset or even an array
containing prune_xid's of each page. This data structure would not live
in pg stat, and would have a life cycle similar to a visibility map. The
detailed information would eventually allow the vacuum to skip pages
BEFORE fetching them. Well, I would need help...
Best regards,
Renan Fonseca