On Thu, Jul 12, 2018 at 8:32 AM Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Thu, Jul 12, 2018 at 12:46 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
> >> > On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:
> >> >> In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
> >> >> as "reads" when in fact they are not reads at all:
> >> >>
> >> >> 1. Relation extension, which in fact writes a zero-filled block.
> >> >> 2. The RBM_ZERO_* modes, which provoke neither read nor write.
> >
> > I checked the patch and I agree with change 1). Regarding change 2),
> > whether or not it zeroes the contents of the page, doesn't it still read?
> > If the page exists in the buffer pool, we count it as a hit
> > irrespective of the mode. Am I missing something?
>
> Further down in the function you can see that there is no read()
> system call for the RBM_ZERO_* modes:
>
>         if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
>             MemSet((char *) bufBlock, 0, BLCKSZ);
>         else
>         {
>             ...
>             smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
>             ...
>         }
>

Thanks for the details. I got your point. But we still need to count the
RBM_ZERO_ON_ERROR case as a read, since it falls into the else branch above
and calls smgrread(). Excluding the other RBM_ZERO_* modes is fine.

> I suppose someone might argue that even when it's not a hit and it's
> not a read, we might still want to count this buffer interaction in
> some other way. Perhaps there should be a separate counter? It may
> technically be a kind of cache miss, but it's nowhere near as
> expensive as a synchronous system call like read() so I didn't propose
> that.
>

Yes, I agree that we may need a new counter for buffers that are just
allocated (neither read nor written). But currently that counter's value
would probably be very small, so people may not be interested in it.

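To make the accounting discussed above concrete, here is a minimal,
self-contained sketch of how each buffer access could be classified. It is
not the patch and not PostgreSQL code: the Sketch* types, the
sketch_count_access() function and the "allocated" counter are all
hypothetical names invented for illustration, and the enum only mirrors the
mode names quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the read modes named in this thread. */
typedef enum SketchReadMode
{
    SKETCH_RBM_NORMAL,
    SKETCH_RBM_ZERO_AND_LOCK,
    SKETCH_RBM_ZERO_AND_CLEANUP_LOCK,
    SKETCH_RBM_ZERO_ON_ERROR
} SketchReadMode;

/* Hypothetical counters; "allocated" is the proposed extra counter. */
typedef struct SketchBufferUsage
{
    long hits;      /* page was already in the buffer pool */
    long reads;     /* page was fetched with a real read */
    long written;   /* zero-filled block written by relation extension */
    long allocated; /* buffer zeroed in memory, no read and no write */
} SketchBufferUsage;

void sketch_count_access(SketchBufferUsage *usage, bool found,
                         bool is_extend, SketchReadMode mode);

/*
 * Classify one buffer access. Assumption: an extension never finds the
 * page in the pool, so checking "found" first does not hide extensions.
 */
void
sketch_count_access(SketchBufferUsage *usage, bool found,
                    bool is_extend, SketchReadMode mode)
{
    assert(usage != NULL);

    if (found)
        usage->hits++;          /* a hit irrespective of the mode */
    else if (is_extend)
        usage->written++;       /* extension writes a zero-filled block */
    else if (mode == SKETCH_RBM_ZERO_AND_LOCK ||
             mode == SKETCH_RBM_ZERO_AND_CLEANUP_LOCK)
        usage->allocated++;     /* zeroed in memory only */
    else
        usage->reads++;         /* NORMAL and ZERO_ON_ERROR both read */
}
```

With this classification, a miss in RBM_ZERO_ON_ERROR mode still counts as a
read, while misses in the other two RBM_ZERO_* modes land in the separate
counter instead of inflating "reads".
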
> Some more on my motivation: In our zheap prototype, when the system
> is working well and we have enough space, we constantly allocate
> zeroed buffer pages at the insert point (= head) of an undo log and
> drop pages at the discard point (= tail) in the background;
> effectively a few pages just go round and round via the freelist and
> no read() or write() syscalls happen. That's something I'm very happy
> about and it's one of our claimed advantages over the traditional heap
> (which tends to read and dirty more pages), but EXPLAIN (BUFFERS)
> hides this virtuous behaviour when comparing with the traditional
> heap: it falsely and slanderously reports that zheap is reading undo
> pages when it is not. Of course I don't intend to litigate zheap
> design in this thread, I just figured that since this accounting is
> wrong on principle and affects current PostgreSQL too (at least in
> theory) I would propose this little patch independently. It's subtle
> enough that I wouldn't bother to back-patch it though.
>

OK. Maybe it is better to implement the buffer-allocation counter along
with zheap, so that EXPLAIN (BUFFERS) reports more accurate results?

Regards,
Haribabu Kommi
Fujitsu Australia