Hi,

On 2020-10-28 14:08:52 +0900, Michael Paquier wrote:
> Thanks for confirming.  I have gone through the whole set today,
> splitted the thing into two commits and applied them.  We had
> buildfarm member florican complain about a mistake in one of the
> GetDatum() calls that I took care of already, and there is nothing
> else on my radar.

The code does IO while holding the buffer mapping lock. That seems
*entirely* unacceptable to me. That basically locks 1/128 of shared
buffers against concurrent mapping changes, while reading data that is
likely not to be on disk?  Seriously?


        /* see if the block is in the buffer pool or not */
        LWLockAcquire(partLock, LW_SHARED);
        buf_id = BufTableLookup(&buf_tag, buf_hash);
        if (buf_id >= 0)
        {
                uint32          buf_state;

                /*
                 * Found it.  Now, retrieve its state to know what to do with 
it, and
                 * release the pin immediately.  We do so to limit overhead as 
much as
                 * possible.  We keep the shared LWLock on the target buffer 
mapping
                 * partition for now, so this buffer cannot be evicted, and we 
acquire
                 * an I/O Lock on the buffer as we may need to read its 
contents from
                 * disk.
                 */
                bufdesc = GetBufferDescriptor(buf_id);

                LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED);
                buf_state = LockBufHdr(bufdesc);
                UnlockBufHdr(bufdesc, buf_state);

                /* If the page is dirty or invalid, skip it */
                if ((buf_state & BM_DIRTY) != 0 || (buf_state & BM_TAG_VALID) 
== 0)
                {
                        LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
                        LWLockRelease(partLock);
                        return true;
                }

                /* Read the buffer from disk, with the I/O lock still held */
                smgrread(smgr, forknum, blkno, buffer);
                LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
        }
        else
        {
                /*
                 * Simply read the buffer.  There's no risk of modification on 
it as
                 * we are holding the buffer pool partition mapping lock.
                 */
                smgrread(smgr, forknum, blkno, buffer);
        }


The justification in the in-shared-buffers case seems to completely
mis-judge costs too:
                 * Found it.  Now, retrieve its state to know what to do with 
it, and
                 * release the pin immediately.  We do so to limit overhead as 
much as
                 * possible.  We keep the shared LWLock on the target buffer 
mapping
                 * partition for now, so this buffer cannot be evicted, and we 
acquire
                 * an I/O Lock on the buffer as we may need to read its 
contents from
                 * disk.
a pin is cheap. Holding the partition lock is not.


Also, using char[BLCKSZ] as a buffer isn't ok. This should use
PGAlignedBlock:
/*
 * Use this, not "char buf[BLCKSZ]", to declare a field or local variable
 * holding a page buffer, if that page might be accessed as a page and not
 * just a string of bytes.  Otherwise the variable might be under-aligned,
 * causing problems on alignment-picky hardware.  (In some places, we use
 * this to declare buffers even though we only pass them to read() and
 * write(), because copying to/from aligned buffers is usually faster than
 * using unaligned buffers.)  We include both "double" and "int64" in the
 * union to ensure that the compiler knows the value must be MAXALIGN'ed
 * (cf. configure's computation of MAXIMUM_ALIGNOF).
 */
typedef union PGAlignedBlock


I think this needs to be quickly reworked or reverted.


Greetings,

Andres Freund


Reply via email to