Hi,

I was planning to do some review/testing on this patch, but then I noticed it was rejected with feedback in 2015-07 and never resubmitted to another CF. So I won't spend time testing it unless someone shouts that I should do that anyway. Instead I'll just post some ideas about how we might improve the patch, because I'd forget about them otherwise.

On 07/05/2015 09:48 AM, Heikki Linnakangas wrote:

> The ideal correction formula f(x) would be such that f(g(X)) = X, where:
>
>   X is time, 0 = beginning of checkpoint, 1.0 = targeted end of
> checkpoint (checkpoint_segments), and
>
>   g(X) is the amount of WAL generated. 0 = beginning of checkpoint, 1.0
> = targeted end of checkpoint (derived from max_wal_size).
>
> Unfortunately, we don't know the shape of g(X), as that depends on the
> workload. It might be linear, if there is no effect at all from
> full_page_writes. Or it could be a step function, where every write
> causes a full-page write until all pages have been touched, and after
> that none do (something like an UPDATE without a WHERE clause might
> cause that). In pgbench-like workloads, it's something like sqrt(x). I
> picked X^1.5 as a reasonable guess. It's close enough to linear that it
> shouldn't hurt too much if g(x) is linear. But it cuts the worst spike
> at the very beginning, if g(x) is more like sqrt(x).

Exactly. I think the main "problem" here is that we mix two types of WAL records with quite different characteristics:

 (a) full_page_writes - very high volume right after checkpoint, then
     usually drops to much lower volume

 (b) regular records - about the same volume over time (well, lower
     volume right after the checkpoint, as that's where FPWs happen)

We completely ignore this when computing elapsed_xlogs, because we compute it roughly like this:

    elapsed_xlogs = wal_since_checkpoint / CheckPointSegments;

which of course gets confused when we write a lot of WAL right after a checkpoint because of FPWs. But what if we actually tracked the amount of WAL produced by FPWs within a checkpoint (which we currently don't, AFAIK)?

Then we could compute the expected *remaining* amount of WAL to be produced within the checkpoint interval, and use that to compute a better progress estimate. Say we track:

  wal_bytes          - WAL (total)
  wal_fpw_bytes      - WAL (due to FPW)
  prev_wal_bytes     - WAL (total) in previous checkpoint
  prev_wal_fpw_bytes - WAL (due to FPW) in previous checkpoint

So we know that we should expect about

  ((prev_wal_bytes - prev_wal_fpw_bytes) - (wal_bytes - wal_fpw_bytes))

  (                    remaining regular WAL                          )

+ (prev_wal_fpw_bytes - wal_fpw_bytes)

  (        remaining FPW WAL          )

to be produced until the end of the current checkpoint (assuming it behaves like the previous one). I don't have a clear idea how to transform this into the 'progress' value yet, but I'm pretty sure tracking the two types of WAL separately is a key to a better solution. The x^1.5 is probably a step in the right direction, but I don't feel particularly confident about the exponent 1.5, which is rather arbitrary.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
