Hello,

I recently ran into a problem while developing my extension, which uses the shared buffers; the problem appears when it releases each buffer page.
This extension transfers the contents of shared buffers to a GPU device using DMA, then kicks a device kernel. Usually 8KB (= BLCKSZ) is too small as a unit size for calculation, so the extension pins multiple pages prior to the DMA transfer and releases them after the device kernel has run. For performance reasons, 16MB-64MB is the preferable data size per device kernel execution. A DMA transfer of 16MB-64MB needs 2048-8192 pages to be pinned simultaneously.

Once the backend or an extension calls ReadBuffer(), resowner.c tracks which buffer is referenced by the current resource owner, to ensure these buffers are released at the end of the transaction. However, it seems to me that the implementation of resowner.c does not assume that many buffers are referenced by a particular resource owner simultaneously. It manages the buffer indexes in an expandable array and looks up the target buffer by a sequential walk, starting from the tail because a recently pinned buffer tends to be released first.

That caused trouble in my case. My extension pins several thousand buffers, so owner->buffers[] is enlarged and becomes expensive to walk. In my measurement, ResourceOwnerForgetBuffer() takes 36 seconds in total during hash-joining 2M rows, even though the hash join itself takes less than 16 seconds.

What is the best way to solve the problem?

Idea-1) Make ResourceOwnerForgetBuffer() O(1) per call, instead of a linear search that can cost O(N^2) in total when N buffers are pinned.

The source of the problem is the data structure in ResourceOwnerData, so a straightforward way is to apply O(1) logic based on hashing instead of the linear search. The open issue is how beneficial or harmless this is to the core code, not only to my extension. It is probably "potentially" beneficial to the core backend as well. However, the effect is not easy to observe right now, because usual workloads hold only a small number of buffers at the same time.
The attached patch applies O(1) logic to ResourceOwnerForgetBuffer().
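
As a rough illustration of the cost difference described above, here is a small standalone C model. It is neither the resowner.c code nor the attached patch. All names in it (ArrayTracker, HashTracker, array_release_cost, hash_release_cost, MAX_TRACKED, HASH_SLOTS) are hypothetical, and it assumes non-negative integer buffer ids and that the buffers are released in the same order they were pinned.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 8192    /* hypothetical: 64MB / 8KB pages */
#define HASH_SLOTS  16384   /* hypothetical: power of two, at most half full */
#define EMPTY_SLOT  (-1)
#define DEAD_SLOT   (-2)    /* tombstone left behind by a removal */

/* Model 1: expandable-array style tracking, searched from the tail. */
typedef struct {
    int    ids[MAX_TRACKED];
    int    n;
    size_t comparisons;
} ArrayTracker;

static void array_remember(ArrayTracker *t, int id)
{
    assert(id >= 0 && t->n < MAX_TRACKED);
    t->ids[t->n++] = id;
}

static int array_forget(ArrayTracker *t, int id)
{
    for (int i = t->n - 1; i >= 0; i--) {
        t->comparisons++;
        if (t->ids[i] == id) {
            /* close the gap, keeping the remaining entries in order */
            for (; i < t->n - 1; i++)
                t->ids[i] = t->ids[i + 1];
            t->n--;
            return 1;
        }
    }
    return 0;
}

/* Model 2: open-addressing hash table with linear probing. */
typedef struct {
    int    slots[HASH_SLOTS];
    size_t probes;
} HashTracker;

static void hash_init(HashTracker *t)
{
    for (int i = 0; i < HASH_SLOTS; i++)
        t->slots[i] = EMPTY_SLOT;
    t->probes = 0;
}

static int hash_remember(HashTracker *t, int id)
{
    unsigned s = (unsigned) id & (HASH_SLOTS - 1);
    assert(id >= 0);
    for (int k = 0; k < HASH_SLOTS; k++) {
        if (t->slots[s] < 0) {          /* empty or tombstone: reuse it */
            t->slots[s] = id;
            return 1;
        }
        s = (s + 1) & (HASH_SLOTS - 1);
    }
    return 0;                           /* table is full */
}

static int hash_forget(HashTracker *t, int id)
{
    unsigned s = (unsigned) id & (HASH_SLOTS - 1);
    for (int k = 0; k < HASH_SLOTS && t->slots[s] != EMPTY_SLOT; k++) {
        t->probes++;
        if (t->slots[s] == id) {
            t->slots[s] = DEAD_SLOT;
            return 1;
        }
        s = (s + 1) & (HASH_SLOTS - 1);
    }
    return 0;
}

/* Pin ids 0..n-1, then release all of them; returns comparisons used.
 * oldest_first != 0 releases in pin order, otherwise in reverse order. */
static size_t array_release_cost(int n, int oldest_first)
{
    static ArrayTracker t;
    t.n = 0;
    t.comparisons = 0;
    for (int id = 0; id < n; id++)
        array_remember(&t, id);
    for (int k = 0; k < n; k++)
        array_forget(&t, oldest_first ? k : n - 1 - k);
    return t.comparisons;
}

static size_t hash_release_cost(int n, int oldest_first)
{
    static HashTracker t;
    hash_init(&t);
    for (int id = 0; id < n; id++)
        hash_remember(&t, id);
    for (int k = 0; k < n; k++)
        hash_forget(&t, oldest_first ? k : n - 1 - k);
    return t.probes;
}
```

In this model, array_release_cost(n, 1) grows as n*(n+1)/2 comparisons, while array_release_cost(n, 0) and hash_release_cost(n, 1) stay at n. The tail-first search is therefore cheap only when the most recently pinned buffer is released first, which is not the case when thousands of pages are pinned as a batch and released in pin order.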
With the patch, the time spent in ResourceOwnerForgetBuffer() drops from 36sec to 0.09sec during a hash join of 20M rows.

Idea-2) Track the shared buffers being referenced in the extension itself.

Another, but not preferable, option is to call ResourceOwnerForgetBuffer() just after ReadBuffer() on the extension side. Once the resource owner has forgotten a buffer, the extension is responsible for releasing it at the end of the transaction, even if the transaction aborts. It also means ReleaseBuffer() can no longer be used, so the extension needs its own copy of ReleaseBuffer() that does not call ResourceOwnerForgetBuffer(). This idea has little advantage over idea-1; its only advantage is that it avoids changes to core PostgreSQL.

Any comments?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>

pgsql-v9.5-resowner-forget-buffer-o1.v1.patch
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers