On Monday, December 21, 2020 10:25 PM, Amit Kapila wrote:
> I have started doing minor edits to the patch especially planning to write a
> theory why is this optimization safe and here is what I can come up with:
> "To remove all the pages of the specified relation forks from the buffer
> pool, we need to scan the entire buffer pool but we can optimize it by
> finding the buffers from BufMapping table provided we know the exact size
> of each fork of the relation. The exact size is required to ensure that we
> don't leave any buffer for the relation being dropped as otherwise the
> background writer or checkpointer can lead to a PANIC error while flushing
> buffers corresponding to files that don't exist.
>
> To know the exact size, we rely on the size cached for each fork by us
> during recovery which limits the optimization to recovery and on standbys
> but we can easily extend it once we have shared cache for relation size.
>
> In recovery, we cache the value returned by the first lseek(SEEK_END) and
> the future writes keeps the cached value up-to-date. See smgrextend. It is
> possible that the value of the first lseek is smaller than the actual
> number of existing blocks in the file due to buggy Linux kernels that might
> not have accounted for the recent write. But that should be fine because
> there must not be any buffers after that file size.
>
> XXX We would make the extra lseek call for the unoptimized paths but that
> is okay because we do it just for the first fork and we anyway have to
> scan the entire buffer pool the cost of which is so high that the extra
> lseek call won't make any visible difference. However, we can use
> InRecovery flag to avoid the additional cost but that doesn't seem worth
> it."
>
> Thoughts?
+1

Thank you very much for expanding the comments to carefully explain why the optimization is safe. I was also struggling to explain it completely, but your description also covers the possibility of extending the optimization in the future once we have a shared cache for relation size. So I like this addition.

(Also, based on the recent email exchanges, it seems that we have agreed to retain the locking mechanism of the existing patch. Both the traditional path and the optimized path do the rechecking, so there seems to be no problem. I'm definitely fine with it.)

Regards,
Kirk Jamison