Hi,
I was planning to do some review/testing on this patch, but then I
noticed it was rejected with feedback in 2015-07 and never resubmitted
to another CF. So I won't spend time testing it unless someone shouts
that I should do that anyway. Instead I'll just post some ideas about
how we might improve the patch, because otherwise I'd forget about
them.
On 07/05/2015 09:48 AM, Heikki Linnakangas wrote:
> The ideal correction formula f(x) would be such that f(g(X)) = X, where:
>
> X is time, 0 = beginning of checkpoint, 1.0 = targeted end of
> checkpoint (checkpoint_segments), and
>
> g(X) is the amount of WAL generated, 0 = beginning of checkpoint, 1.0
> = targeted end of checkpoint (derived from max_wal_size).
>
> Unfortunately, we don't know the shape of g(X), as that depends on the
> workload. It might be linear, if there is no effect at all from
> full_page_writes. Or it could be a step function, where every write
> causes a full-page write until all pages have been touched, and after
> that none do (something like an UPDATE without a WHERE clause might
> cause that). In pgbench-like workloads, it's something like sqrt(X). I
> picked x^1.5 as a reasonable guess. It's close enough to linear that it
> shouldn't hurt too much if g(X) is linear, but it cuts the worst spike
> at the very beginning if g(X) is more like sqrt(X).
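
(To make sure I read the correction right, here's a quick C sketch of
my understanding. The function and parameter names are my own
invention, not the patch's actual code:

    #include <math.h>

    /*
     * Map WAL-based progress to estimated time-based progress.
     *
     * wal_fraction is the WAL written since the checkpoint started, as a
     * fraction of the target (0.0 - 1.0).  If g(X) ~ sqrt(X) (FPW-heavy,
     * pgbench-like workload) the ideal correction would be f(x) = x^2;
     * if g(X) is linear, the identity f(x) = x would be ideal.
     * f(x) = x^1.5 is the compromise between the two.
     */
    static double
    corrected_progress(double wal_fraction)
    {
        return pow(wal_fraction, 1.5);
    }

So with e.g. a quarter of the WAL budget already written, mostly thanks
to FPWs, pow(0.25, 1.5) = 0.125, i.e. the estimate gets pulled back
towards the actual elapsed time instead of the FPW burst making the
checkpoint look badly behind schedule.)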
Exactly. I think the main "problem" here is that we mix two types of
WAL records with quite different characteristics:

(a) full_page_writes - very high volume right after a checkpoint, then
usually dropping to a much lower volume

(b) regular records - roughly constant volume over time (well, lower
right after the checkpoint, as that's where the FPWs happen)
We completely ignore this when computing elapsed_xlogs, because we
compute it (roughly) like this:

    elapsed_xlogs = wal_since_checkpoint / CheckPointSegments;

which of course gets confused when we write a lot of WAL right after a
checkpoint because of FPWs. But what if we actually tracked the amount
of WAL produced by FPWs within a checkpoint (which we currently don't,
AFAIK)? Then we could compute the expected *remaining* amount of WAL to
be produced within the checkpoint interval, and use that to compute a
better progress estimate. Say:
    wal_bytes          - WAL (total) written so far in the current checkpoint
    wal_fpw_bytes      - WAL (due to FPWs) written so far in the current checkpoint
    prev_wal_bytes     - WAL (total) in the previous checkpoint
    prev_wal_fpw_bytes - WAL (due to FPWs) in the previous checkpoint
So, assuming the current checkpoint generates about as much WAL as the
previous one, we should expect roughly

    ((prev_wal_bytes - prev_wal_fpw_bytes) - (wal_bytes - wal_fpw_bytes))   ... regular WAL
  + (prev_wal_fpw_bytes - wal_fpw_bytes)                                    ... FPW WAL

to be produced until the end of the current checkpoint. (The two terms
simply sum to prev_wal_bytes - wal_bytes; the point is that the regular
and FPW parts can now be accounted for separately.) I don't have a
clear idea how to transform this into the 'progress' yet, but I'm
pretty sure tracking the two types of WAL separately is the key to a
better solution. The x^1.5 correction is probably a step in the right
direction, but I don't feel particularly confident about the exponent
1.5, which is rather arbitrary.
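
Just to make the direction concrete, here is one possible shape of
that computation. This is a rough sketch only, assuming regular WAL
accrues at a roughly constant rate (per (b) above) and that the
previous checkpoint is a usable predictor; all the names are
hypothetical:

    #include <stdint.h>

    /*
     * Hypothetical sketch: estimate checkpoint progress (0.0 - 1.0)
     * from the four counters above.  The fraction of the expected
     * regular (non-FPW) WAL already written serves as a proxy for
     * elapsed time, so the FPW burst right after the checkpoint starts
     * no longer inflates the estimate.
     */
    static double
    checkpoint_wal_progress(uint64_t wal_bytes, uint64_t wal_fpw_bytes,
                            uint64_t prev_wal_bytes,
                            uint64_t prev_wal_fpw_bytes)
    {
        uint64_t    regular_so_far = wal_bytes - wal_fpw_bytes;
        uint64_t    regular_expected = prev_wal_bytes - prev_wal_fpw_bytes;
        double      progress;

        if (regular_expected == 0)
            return 0.0;         /* no history yet, caller must fall back */

        progress = (double) regular_so_far / (double) regular_expected;

        return (progress > 1.0) ? 1.0 : progress;
    }

The nice part is that there is no arbitrary exponent at all; the shape
of the FPW burst is taken from the previous checkpoint rather than
guessed. Whether the previous checkpoint is a stable enough predictor
is, of course, exactly the open question.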
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services