On Mon, Aug 25, 2014 at 07:12:33AM +0200, Guillaume Lelarge wrote: > Le 8 août 2014 09:08, "Guillaume Lelarge" <guilla...@lelarge.info> a écrit : > > > > Hi, > > > > As part of our monitoring work for our customers, we stumbled upon an issue > with our customers' servers who have a wal_keep_segments setting higher than > 0. > > > > We have a monitoring script that checks the number of WAL files in the > pg_xlog directory, according to the setting of three parameters > (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments). We > usually add a percentage to the usual formula: > > > > greatest( > > (2 + checkpoint_completion_target) * checkpoint_segments + 1, > > checkpoint_segments + wal_keep_segments + 1 > > ) > > > > And we have lots of alerts from the script for customers who set their > wal_keep_segments setting higher than 0. > > > > So we started to question this sentence of the documentation: > > > > There will always be at least one WAL segment file, and will normally not be > more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or > checkpoint_segments + wal_keep_segments + 1 files. > > > > (http://www.postgresql.org/docs/9.3/static/wal-configuration.html) > > > > While doing some tests, it appears it would be more something like: > > > > wal_keep_segments + (2 + checkpoint_completion_target) * checkpoint_segments > + 1 > > > > But after reading the source code (src/backend/access/transam/xlog.c), the > right formula seems to be: > > > > wal_keep_segments + 2 * checkpoint_segments + 1 > > > > Here is how we went to this formula... > > > > CreateCheckPoint(..) is responsible, among other things, for deleting and > recycling old WAL files. From src/backend/access/transam/xlog.c, master > branch, > line 8363: > > > > /* > > * Delete old log files (those no longer needed even for previous > > * checkpoint or the standbys in XLOG streaming). > > */ > > if (_logSegNo) > > { > > KeepLogSeg(recptr, &_logSegNo); > > _logSegNo--; > > RemoveOldXlogFiles(_logSegNo, recptr); > > } > > > > KeepLogSeg(...) function takes care of wal_keep_segments. From src/backend/ > access/transam/xlog.c, master branch, line 8792: > > > > /* compute limit for wal_keep_segments first */ > > if (wal_keep_segments > 0) > > { > > /* avoid underflow, don't go below 1 */ > > if (segno <= wal_keep_segments) > > segno = 1; > > else > > segno = segno - wal_keep_segments; > > } > > > > IOW, the segment number (segno) is decremented according to the setting of > wal_keep_segments. segno is then sent back to CreateCheckPoint(...) via > _logSegNo. The RemoveOldXlogFiles() gets this segment number so that it can > remove or recycle all files before this segment number. This function gets the > number of WAL files to recycle with the XLOGfileslop constant, which is > defined > as: > > > > /* > > * XLOGfileslop is the maximum number of preallocated future XLOG segments. > > * When we are done with an old XLOG segment file, we will recycle it as a > > * future XLOG segment as long as there aren't already XLOGfileslop future > > * segments; else we'll delete it. This could be made a separate GUC > > * variable, but at present I think it's sufficient to hardwire it as > > * 2*CheckPointSegments+1. Under normal conditions, a checkpoint will free > > * no more than 2*CheckPointSegments log segments, and we want to recycle > > all > > * of them; the +1 allows boundary cases to happen without wasting a > > * delete/create-segment cycle. > > */ > > #define XLOGfileslop (2*CheckPointSegments + 1) > > > > (in src/backend/access/transam/xlog.c, master branch, line 100) > > > > IOW, PostgreSQL will keep wal_keep_segments WAL files before the current WAL > file, and then there may be 2*CheckPointSegments + 1 recycled ones. Hence the > formula: > > > > wal_keep_segments + 2 * checkpoint_segments + 1 > > > > And this is what we usually find in our customers' servers. We may find more > WAL files, depending on the write activity of the cluster, but in average, we > get this number of WAL files. > > > > AFAICT, the documentation is wrong about the usual number of WAL files in > > the > pg_xlog directory. But I may be wrong, in which case, the documentation isn't > clear enough for me, and should be fixed so that others can't misinterpret it > like I may have done. > > > > Any comments? did I miss something, or should we fix the documentation?
I looked into this, and came up with more questions. Why is checkpoint_completion_target involved in the total number of WAL segments? If checkpoint_completion_target is 0.5 (the default), the calculation is: (2 + 0.5) * checkpoint_segments + 1 while if it is 0.9, it is: (2 + 0.9) * checkpoint_segments + 1 Is this trying to estimate how many WAL files are going to be created during the checkpoint? If so, wouldn't it be (1 + checkpoint_completion_target), not "2 +". My logic is you have the old WAL files being checkpointed (that's the "1"), plus you have new WAL files being created during the checkpoint, which would be checkpoint_completion_target * checkpoint_segments, plus one for the current WAL file. The original calculation is summarized in this email: http://www.postgresql.org/message-id/AANLkTi=e=oR54OuxAw88=dtv4wt0e5edmigaeztbv...@mail.gmail.com However, in my reading of this, it appears to be double-counting the WAL files during the checkpoint, e.g. the checkpoint_completion_target * checkpoint_segments WAL files are also part of the later checkpoint_segments number. I also don't see how that can be equivalent to: checkpoint_segments + wal_keep_segments + 1 because wal_keep_segments isn't used in the first calculation. Is the user supposed to compute the maximum of those two? Seems easier to just give one expression. Is the right answer: max(checkpoint_segments, wal_keep_segments) + checkpoint_segments + 1 or, if you want to use checkpoint_completion_target, it would be: max(checkpoint_segments * checkpoint_completion_target, wal_keep_segments) + checkpoint_segments + 1 Is checkpoint_completion_target accurate enough to define a maximum number of files? I think I need Masao Fujii's comments on this. The fact the user is seeing something different from what is documented means something probably needs updating. -- Bruce Momjian <br...@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers