I said:
> There is a simple error
> in the current code that is easily corrected: in XLogFlush(), the
> wait to acquire WALWriteLock should occur before, not after, we try
> to acquire WALInsertLock and advance our local copy of the write
> request pointer.  (To be exact, xlog.c lines 1255-1269 in CVS tip
> ought to be moved down to before line 1275, inside the "if" that
> tests whether we are going to call XLogWrite.)

That patch was not quite right, as it didn't actually flush the
later-arriving data.  The correct patch is:

*** src/backend/access/transam/xlog.c.orig      Thu Sep 26 18:58:33 2002
--- src/backend/access/transam/xlog.c   Sun Oct  6 18:45:57 2002
***************
*** 1252,1279 ****
        /* done already? */
        if (!XLByteLE(record, LogwrtResult.Flush))
        {
-               /* if something was added to log cache then try to flush this too */
-               if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
-               {
-                       XLogCtlInsert *Insert = &XLogCtl->Insert;
-                       uint32          freespace = INSERT_FREESPACE(Insert);
- 
-                       if (freespace < SizeOfXLogRecord)       /* buffer is full */
-                               WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
-                       else
-                       {
-                               WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
-                               WriteRqstPtr.xrecoff -= freespace;
-                       }
-                       LWLockRelease(WALInsertLock);
-               }
                /* now wait for the write lock */
                LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
                LogwrtResult = XLogCtl->Write.LogwrtResult;
                if (!XLByteLE(record, LogwrtResult.Flush))
                {
!                       WriteRqst.Write = WriteRqstPtr;
!                       WriteRqst.Flush = record;
                        XLogWrite(WriteRqst);
                }
                LWLockRelease(WALWriteLock);
--- 1252,1284 ----
        /* done already? */
        if (!XLByteLE(record, LogwrtResult.Flush))
        {
                /* now wait for the write lock */
                LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
                LogwrtResult = XLogCtl->Write.LogwrtResult;
                if (!XLByteLE(record, LogwrtResult.Flush))
                {
!                       /* try to write/flush later additions to XLOG as well */
!                       if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
!                       {
!                               XLogCtlInsert *Insert = &XLogCtl->Insert;
!                               uint32          freespace = INSERT_FREESPACE(Insert);
! 
!                               if (freespace < SizeOfXLogRecord)       /* buffer is full */
!                                       WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
!                               else
!                               {
!                                       WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
!                                       WriteRqstPtr.xrecoff -= freespace;
!                               }
!                               LWLockRelease(WALInsertLock);
!                               WriteRqst.Write = WriteRqstPtr;
!                               WriteRqst.Flush = WriteRqstPtr;
!                       }
!                       else
!                       {
!                               WriteRqst.Write = WriteRqstPtr;
!                               WriteRqst.Flush = record;
!                       }
                        XLogWrite(WriteRqst);
                }
                LWLockRelease(WALWriteLock);
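The request-pointer computation inside the patch can be pulled out as a pure function. This is a sketch under stated assumptions:

- `WalPtr` is a simplified, hypothetical stand-in for XLogRecPtr, and its field names follow the patch's use of `xrecoff`.
- `TOY_SIZE_OF_XLOG_RECORD` is an assumed value, not the real SizeOfXLogRecord.
- `freespace` is taken to be the unused bytes between the insert position and the end of the current buffer page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr: log file id plus byte offset. */
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} WalPtr;

/* Assumed record-header size for this sketch only. */
#define TOY_SIZE_OF_XLOG_RECORD 32u

/*
 * Mirrors the branch in the patch: page_end plays the role of
 * XLogCtl->xlblocks[Insert->curridx] (end of the current WAL buffer page),
 * and freespace is what INSERT_FREESPACE(Insert) would report.
 */
static WalPtr
write_request_ptr(WalPtr page_end, uint32_t freespace)
{
    WalPtr rqst = page_end;

    assert(freespace <= page_end.xrecoff);
    if (freespace >= TOY_SIZE_OF_XLOG_RECORD)
        rqst.xrecoff -= freespace;  /* stop at the current insert position */
    /* else: no room for another record header, so request the whole page */
    return rqst;
}
```

With a page ending at offset 8192, 100 free bytes gives a request of 8092. With only 10 free bytes, the whole page up to 8192 is requested.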


To test this, I made a modified version of pgbench in which each
transaction consists of a simple
        insert into table_NNN values(0);
where each client thread has a separate insertion target table.
This is about the simplest transaction I could think of that would
generate a WAL record each time.

Running this modified pgbench with postmaster parameters
        postmaster -i -N 120 -B 1000 --wal_buffers=250
and all other configuration settings at default, CVS tip code gives me
a pretty consistent 115-118 transactions per second for anywhere from
1 to 100 pgbench client threads.  This is exactly what I expected,
since the database (including WAL file) is on a 7200 RPM SCSI drive.
The theoretical maximum rate of sync'd writes to the WAL file is
therefore 120 per second (one per disk revolution), but we lose a little
because once in a while the disk has to seek to a data file.
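Here is the rotational bound from the preceding paragraph, written out. It assumes, as stated above, at most one synchronous WAL write per disk revolution:

```latex
\frac{7200\ \text{rev/min}}{60\ \text{s/min}} = 120\ \text{rev/s}
\quad\Longrightarrow\quad
\text{at most } 120 \text{ sync'd WAL writes per second}
```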

Inserting the above patch, and keeping all else the same, I get:

$ mybench -c 1 -t 10000 bench1
number of clients: 1
number of transactions per client: 10000
number of transactions actually processed: 10000/10000
tps = 116.694205 (including connections establishing)
tps = 116.722648 (excluding connections establishing)

$ mybench -c 5 -t 2000 -S -n bench1
number of clients: 5
number of transactions per client: 2000
number of transactions actually processed: 10000/10000
tps = 282.808341 (including connections establishing)
tps = 283.656898 (excluding connections establishing)

$ mybench -c 10 -t 1000 bench1
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 443.131083 (including connections establishing)
tps = 447.406534 (excluding connections establishing)

$ mybench -c 50 -t 200 bench1
number of clients: 50
number of transactions per client: 200
number of transactions actually processed: 10000/10000
tps = 416.154173 (including connections establishing)
tps = 436.748642 (excluding connections establishing)

$ mybench -c 100 -t 100 bench1
number of clients: 100
number of transactions per client: 100
number of transactions actually processed: 10000/10000
tps = 336.449110 (including connections establishing)
tps = 405.174237 (excluding connections establishing)
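These figures can be turned into an average batch size by assuming each WAL fsync still costs one revolution, or 120 per second on this drive. This is back-of-envelope arithmetic on the numbers above, not an additional measurement, and `commits_per_fsync` is a hypothetical helper:

```c
#include <assert.h>

/*
 * Average number of commits retired per WAL fsync, assuming the disk
 * permits at most fsyncs_per_sec synchronous writes per second.
 */
static double
commits_per_fsync(double tps, double fsyncs_per_sec)
{
    assert(fsyncs_per_sec > 0.0);
    return tps / fsyncs_per_sec;
}
```

For example, the 10-client run's 447.4 tps (excluding connection setup) works out to roughly 3.7 commits per fsync. The single-client run stays just under 1.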

CPU loading goes from 80% idle at 1 client to 50% idle at 5 clients
to <10% idle at 10 or more.

So this does seem to be a nice win, and unless I hear objections
I will apply it ...

                        regards, tom lane
