Hi,
I was planning to do some review/testing on this patch, but then I
noticed it was rejected with feedback in 2015-07 and never resubmitted
to another CF. So I won't spend time testing it unless someone shouts
that I should do that anyway. Instead I'll just post some ideas about
how we might improve the patch, because otherwise I'd forget about
them.
On 07/05/2015 09:48 AM, Heikki Linnakangas wrote:
> The ideal correction formula f(x) would be such that f(g(X)) = X, where:
>
> X is time, 0 = beginning of checkpoint, 1.0 = targeted end of
> checkpoint (checkpoint_segments), and
>
> g(X) is the amount of WAL generated, 0 = beginning of checkpoint, 1.0
> = targeted end of checkpoint (derived from max_wal_size).
>
> Unfortunately, we don't know the shape of g(X), as that depends on the
> workload. It might be linear, if there is no effect at all from
> full_page_writes. Or it could be a step function, where every write
> causes a full-page write until all pages have been touched, and after
> that none do (something like an UPDATE without a WHERE clause might
> cause that). In pgbench-like workloads, it's something like sqrt(X). I
> picked x^1.5 as a reasonable guess. It's close enough to linear that it
> shouldn't hurt too much if g(X) is linear, but it cuts the worst spike
> at the very beginning if g(X) is more like sqrt(X).
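
(To make sure I read the correction right, here's a quick C sketch of
my understanding. The function and parameter names are my own
invention, not the patch's actual code:

    #include <math.h>

    /*
     * Map WAL-based progress to estimated time-based progress.
     *
     * wal_fraction is the WAL written since the checkpoint started, as a
     * fraction of the target (0.0 - 1.0).  If g(X) ~ sqrt(X) (FPW-heavy,
     * pgbench-like workload) the ideal correction would be f(x) = x^2;
     * if g(X) is linear, the identity f(x) = x would be ideal.
     * f(x) = x^1.5 is the compromise between the two.
     */
    static double
    corrected_progress(double wal_fraction)
    {
        return pow(wal_fraction, 1.5);
    }

So with e.g. a quarter of the WAL budget already written, mostly thanks
to FPWs, pow(0.25, 1.5) = 0.125, i.e. the estimate gets pulled back
towards the actual elapsed time instead of the FPW burst making the
checkpoint look badly behind schedule.)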
Exactly. I think the main "problem" here is that we mix two types of
WAL records with quite different characteristics:

(a) full_page_writes - very high volume right after a checkpoint, then
usually dropping to a much lower volume

(b) regular records - roughly constant volume over time (well, lower
right after the checkpoint, as that's where the FPWs happen)
We completely ignore this when computing elapsed_xlogs, because we
compute it (roughly) like this:

    elapsed_xlogs = wal_since_checkpoint / CheckPointSegments;

which of course gets confused when we write a lot of WAL right after a
checkpoint because of FPWs. But what if we actually tracked the amount
of WAL produced by FPWs within a checkpoint (which we currently don't,
AFAIK)? Then we could compute the expected *remaining* amount of WAL to
be produced within the checkpoint interval, and use that to compute a
better progress estimate. Say:
    wal_bytes          - WAL (total) written so far in the current checkpoint
    wal_fpw_bytes      - WAL (due to FPWs) written so far in the current checkpoint
    prev_wal_bytes     - WAL (total) in the previous checkpoint
    prev_wal_fpw_bytes - WAL (due to FPWs) in the previous checkpoint
So, assuming the current checkpoint generates about as much WAL as the
previous one, we should expect roughly

    ((prev_wal_bytes - prev_wal_fpw_bytes) - (wal_bytes - wal_fpw_bytes))   ... regular WAL
  + (prev_wal_fpw_bytes - wal_fpw_bytes)                                    ... FPW WAL

to be produced until the end of the current checkpoint. (The two terms
simply sum to prev_wal_bytes - wal_bytes; the point is that the regular
and FPW parts can now be accounted for separately.) I don't have a
clear idea how to transform this into the 'progress' yet, but I'm
pretty sure tracking the two types of WAL separately is the key to a
better solution. The x^1.5 correction is probably a step in the right
direction, but I don't feel particularly confident about the exponent
1.5, which is rather arbitrary.
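
Just to make the direction concrete, here is one possible shape of
that computation. This is a rough sketch only, assuming regular WAL
accrues at a roughly constant rate (per (b) above) and that the
previous checkpoint is a usable predictor; all the names are
hypothetical:

    #include <stdint.h>

    /*
     * Hypothetical sketch: estimate checkpoint progress (0.0 - 1.0)
     * from the four counters above.  The fraction of the expected
     * regular (non-FPW) WAL already written serves as a proxy for
     * elapsed time, so the FPW burst right after the checkpoint starts
     * no longer inflates the estimate.
     */
    static double
    checkpoint_wal_progress(uint64_t wal_bytes, uint64_t wal_fpw_bytes,
                            uint64_t prev_wal_bytes,
                            uint64_t prev_wal_fpw_bytes)
    {
        uint64_t    regular_so_far = wal_bytes - wal_fpw_bytes;
        uint64_t    regular_expected = prev_wal_bytes - prev_wal_fpw_bytes;
        double      progress;

        if (regular_expected == 0)
            return 0.0;         /* no history yet, caller must fall back */

        progress = (double) regular_so_far / (double) regular_expected;

        return (progress > 1.0) ? 1.0 : progress;
    }

The nice part is that there is no arbitrary exponent at all; the shape
of the FPW burst is taken from the previous checkpoint rather than
guessed. Whether the previous checkpoint is a stable enough predictor
is, of course, exactly the open question.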
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services