I wrote:
> YAMAMOTO Takashi <y...@mwd.biglobe.ne.jp> wrote:
>  
>> LOG:  could not truncate directory "pg_serial": apparent
>> wraparound
 
> there's some sort of cleanup bug to fix in the predicate
> locking's use of SLRU. It may be benign, but we won't really know
> until we find it.  I'm investigating.
 
I'm pretty sure I found it.  When the number serializable
transactions which need to be tracked gets high enough to push
things to the SLRU summarization, and then drops back down, we
haven't been truncating the head page of the active SLRU region
because if we go back into SLRU summarization that saves us from
zeroing the page again.  The problem is that if we don't go back
into SLRU summarization for a long time, we might wrap around to
where SLRU is upset that our head is chasing our tail.  This seems
like a bigger problem than we were trying to solve by not truncating
the page.
 
The issue is complicated a little bit by the fact that the SLRU API
has you specify the *page* for the truncation point, but silently
ignores the request unless the page is in a segment which is past a
segment in use.  So adding the number of pages per SLRU segment to
the head page position should do the right thing.  But it's all
weird enough that I felt it need a bit of commenting.
 
While I was there I noticed that we're doing the unnecessary
flushing (so people can glean information about the SLRU activity
from watching the disk files) right before truncating.  I switched
the truncation to come before the flushing, since flushing pages to
a file and then deleting that file didn't seem productive.
 
Attached find a patch which modifies one line of code, switches the
order of two lines of code, and adds comments.
 
I will add this to the open items for 9.1.  Thanks again to YAMAMOTO
Takashi for his rigorous testing.
 
-Kevin
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 920,945 **** CheckPointPredicate(void)
        else
        {
                /*
!                * The SLRU is no longer needed. Truncate everything but the 
last
!                * page. We don't dare to touch the last page in case the SLRU 
is
!                * taken back to use, and the new tail falls on the same page.
                 */
!               tailPage = oldSerXidControl->headPage;
                oldSerXidControl->headPage = -1;
        }
  
        LWLockRelease(OldSerXidLock);
  
        /*
         * Flush dirty SLRU pages to disk
         *
         * This is not actually necessary from a correctness point of view. We 
do
         * it merely as a debugging aid.
         */
        SimpleLruFlush(OldSerXidSlruCtl, true);
- 
-       /* Truncate away pages that are no longer required */
-       SimpleLruTruncate(OldSerXidSlruCtl, tailPage);
  }
  
  /*------------------------------------------------------------------------*/
--- 920,957 ----
        else
        {
                /*
!                * The SLRU is no longer needed. Truncate everything.  If we 
try to
!                * leave the head page around to avoid re-zeroing it, we might 
not
!                * use the SLRU again until we're past the wrap-around point, 
which
!                * makes SLRU unhappy.
!                *
!                * While the API asks you to specify truncation by page, it 
silently
!                * ignores the request unless the specified page is in a segment
!                * past some allocated portion of the SLRU.  We don't care which
!                * page in a later segment we hit, so just add the number of 
pages
!                * per segment to the head page to land us *somewhere* in the 
next
!                * segment.
                 */
!               tailPage = oldSerXidControl->headPage + SLRU_PAGES_PER_SEGMENT;
                oldSerXidControl->headPage = -1;
        }
  
        LWLockRelease(OldSerXidLock);
  
+       /* Truncate away pages that are no longer required */
+       SimpleLruTruncate(OldSerXidSlruCtl, tailPage);
+ 
        /*
         * Flush dirty SLRU pages to disk
         *
         * This is not actually necessary from a correctness point of view. We 
do
         * it merely as a debugging aid.
+        *
+        * We're doing this after the truncation to avoid writing pages right
+        * before deleting the file in which they sit, which would be completely
+        * pointless.
         */
        SimpleLruFlush(OldSerXidSlruCtl, true);
  }
  
  /*------------------------------------------------------------------------*/
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to