Hi,

On 2023-01-26 23:11:41 -0800, Peter Geoghegan wrote:
> > Essentially the "any fpi" logic is a very coarse grained way of using
> > the page LSN as a measurement. As I said, I don't think "has a
> > checkpoint occurred since the last write" is a good metric to avoid
> > unnecessary freezing - it's too coarse. But I think using the LSN is
> > the right thought. What about something like
> >
> >   lsn_threshold = insert_lsn - (insert_lsn - lsn_of_last_vacuum) * 0.1
> >   if (/* other conds */ && PageGetLSN(page) <= lsn_threshold)
> >       FreezeMe();
> >
> > I probably got some details wrong, what I am going for with
> > lsn_threshold is that we'd freeze an already dirty page if it's not
> > been updated within 10% of the LSN distance to the last VACUUM.
>
> It seems to me that you're reinventing something akin to eager
> freezing strategy here. At least that's how I define it, since now
> you're bringing the high level context into it; what happens with the
> table, with VACUUM operations, and so on. Obviously this requires
> tracking the metadata that you suppose will be available in some way
> or other, in particular things like lsn_of_last_vacuum.
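To make the quoted heuristic concrete, here's a minimal standalone sketch of
what that gating condition might look like - this is not actual PostgreSQL
code; XLogRecPtr is typedef'd locally, and lsn_of_last_vacuum is a
hypothetical piece of per-table metadata that would have to be tracked
somewhere (e.g. in pgstats):

```c
/*
 * Standalone sketch of the quoted lsn_threshold heuristic. Not real
 * PostgreSQL code: XLogRecPtr stands in for the server's typedef, and
 * lsn_of_last_vacuum is hypothetical metadata we'd need to track.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Freeze an already-dirty page only if its LSN predates the last 10% of
 * the LSN distance between the previous VACUUM and the current insert
 * position, i.e. the page has not been modified "recently" in LSN terms.
 */
static bool
page_is_cold_enough_to_freeze(XLogRecPtr page_lsn,
                              XLogRecPtr insert_lsn,
                              XLogRecPtr lsn_of_last_vacuum)
{
    XLogRecPtr  lsn_threshold;

    /* integer arithmetic stand-in for the quoted "* 0.1" */
    lsn_threshold = insert_lsn - (insert_lsn - lsn_of_last_vacuum) / 10;

    return page_lsn <= lsn_threshold;
}
```

With insert_lsn = 1000 and lsn_of_last_vacuum = 0 the threshold is 900, so a
page last touched at LSN 850 would qualify for freezing while one touched at
950 would not.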
I agree with bringing high-level context into the decision about whether to
freeze aggressively - my problem with the eager freezing strategy patch isn't
that it did that too much, it's that it didn't do it enough.

But I also don't think what I describe above is really comparable to "table
level" eager freezing - the potential worst case overhead is a small fraction
of the WAL volume, and there's zero increase in data write volume. I suspect
the absolute worst case of "always freeze dirty pages" is when a single tuple
on the page gets updated immediately after every time we freeze the page - a
single tuple is where the freeze record is the least space efficient. The
smallest update is about the same size as the smallest freeze record. For
that to amount to a large WAL increase you'd need a crazy rate of such
updates interspersed with vacuums. In slightly more realistic cases (i.e. not
column-less tuples that constantly get updated, with freezing happening all
the time) you end up with a reasonably small WAL rate overhead.

That worst case of "freeze dirty" is bad enough to spend some brain and
compute cycles to prevent. But if we don't always get it right in some
workload, it's not *awful*.

The worst case of the "eager freeze strategy" is a lot larger - it's probably
something like updating one narrow tuple on every page, once per checkpoint,
so that each freeze generates an FPI. I think that results in a max overhead
of 2x for data writes, and about 150x for WAL volume (the ratio of an FPI to
one update record). Obviously that's a pointless workload, but I do think
that analyzing the "outer boundaries" of the regression something can cause
can be helpful.

I think one way forward with the eager strategy approach would be to have a
very narrow gating condition for now, and then incrementally expand it in
later releases.
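The ~150x figure can be sanity-checked with rough record sizes - both numbers
below are assumptions for illustration (an FPI of roughly one 8kB block, a
minimal heap update record on the order of a few dozen bytes), not measured
values:

```c
/*
 * Back-of-the-envelope check of the ~150x WAL volume worst case: each
 * freeze emits a full-page image instead of a small update-sized record.
 * The sizes are rough assumptions, not measurements.
 */
static int
wal_amplification_ratio(int fpi_bytes, int update_record_bytes)
{
    return fpi_bytes / update_record_bytes;
}
```

For example, wal_amplification_ratio(8192, 55) gives roughly 148, in the same
ballpark as the 150x estimate above.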
One use-case where the eager strategy is particularly useful is
[nearly-]append-only tables - and it's also the one workload that's
reasonably easy to detect using stats. Maybe something like
  (dead_tuples_since_last_vacuum / inserts_since_last_vacuum) < 0.05
or so.

That'll definitely leave out loads of workloads where eager freezing would be
useful - but are there semi-reasonable workloads where it'll hurt badly? I
don't *think* so.


> What about unlogged/temporary tables? The obvious thing to do there is
> what I did in the patch that was reverted (freeze whenever the page
> will thereby become all-frozen), and forget about LSNs. But you have
> already objected to that part, specifically.

My main concern about that is the data write amplification it could cause
when the page is clean when we start freezing. But I can't see a large
potential downside to always freezing unlogged/temp tables when the page is
already dirty.


> BTW, you still haven't changed the fact that you get rather different
> behavior with checksums/wal_log_hints. I think that that's good, but
> you didn't seem to.

I think that, if we had something like the recency test I was talking about,
we could afford to always freeze when the page is already dirty and not very
recently modified. I.e. not even insist on a WAL record having been generated
during pruning/HTSV. But I need to think through the dangers of that more.

Greetings,

Andres Freund