On 06/19/2018 11:08 AM, Konstantin Knizhnik wrote:
On 18.06.2018 23:47, Andres Freund wrote:
On 2018-06-18 16:44:09 -0400, Robert Haas wrote:
On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund <and...@anarazel.de>
wrote:
The posix_fadvise approach is not perfect, no doubt about that. But it
works pretty well for bitmap heap scans, and it's about 13249x better
(rough estimate) than the current solution (no prefetching).
Sure, but investing in an architecture we know might not live long also
has its cost. Especially if it's not that complicated to do better.
My guesses are:
- Using OS prefetching is a very small patch.
- Prefetching into shared buffers is a much bigger patch.
Why? The majority of the work is standing up a bgworker that does
prefetching (i.e. reads WAL, figures out reads not in s_b, does
prefetch). Allowing a configurable number + some synchronization between
them isn't that much more work.
I do not think that prefetching into shared buffers requires much more
effort or makes the patch more invasive...
It even simplifies it somewhat, because there is no need to maintain a
separate cache of prefetched pages...
But it will definitely have much more impact on Postgres performance:
contention for buffer locks, throwing away pages accessed by read-only
queries,...
Also there are two points which make prefetching into shared buffers
more complex:
1. We need to spawn multiple workers to prefetch in parallel and
somehow distribute the work between them.
2. We need to synchronize the recovery process with the prefetcher, to
keep the prefetch from running too far ahead and doing useless work.
The same problem exists for prefetching into the OS cache, but there the
risk of a false prefetch is less critical.
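
To make point 2 concrete, here is a minimal sketch of one way to bound
how far the prefetcher runs ahead of replay. The names (WalPos,
prefetch_may_advance, max_distance) are made up for illustration and are
not taken from any patch; WAL positions are assumed to be plain byte
offsets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: a WAL position expressed as a byte offset. */
typedef uint64_t WalPos;

/*
 * Decide whether the prefetcher may handle the record at prefetch_pos.
 * It is allowed to run at most max_distance bytes ahead of the position
 * the recovery process has already replayed (replay_pos).
 */
static bool
prefetch_may_advance(WalPos replay_pos, WalPos prefetch_pos,
                     uint64_t max_distance)
{
    /* Not ahead of replay at all: nothing to throttle. */
    if (prefetch_pos <= replay_pos)
        return true;

    /* Ahead of replay: allow it only inside the configured window. */
    return prefetch_pos - replay_pos <= max_distance;
}
```

A prefetch worker would call this before each record and sleep (or wait
on a latch) when it returns false. How replay_pos is shared between the
processes is left out here.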
I think the main challenge here is that all buffer reads are currently
synchronous (correct me if I'm wrong), while posix_fadvise() allows
us to prefetch the buffers asynchronously.
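
For reference, a rough sketch of what the posix_fadvise() approach looks
like from user space. The helper name (prefetch_blocks) and the BLOCK_SIZE
constant are assumptions for illustration, not PostgreSQL code. The call
only hints the kernel and returns without waiting for the read, which is
why a single process can queue many requests.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumed block size for this sketch (8 kB). */
#define BLOCK_SIZE 8192

/*
 * Ask the kernel to start reading the listed blocks of fd into the OS
 * page cache. Returns the number of hints the kernel accepted.
 */
static int
prefetch_blocks(int fd, const unsigned *blocks, size_t nblocks)
{
    int accepted = 0;

    for (size_t i = 0; i < nblocks; i++)
    {
        off_t offset = (off_t) blocks[i] * BLOCK_SIZE;

        /* posix_fadvise() returns 0 on success, an error number otherwise. */
        if (posix_fadvise(fd, offset, BLOCK_SIZE, POSIX_FADV_WILLNEED) == 0)
            accepted++;
    }

    return accepted;
}
```

Note that the data lands in the OS page cache only; a later synchronous
read is still needed to bring each block into shared buffers.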
I don't think simply spawning a couple of bgworkers to prefetch buffers
is going to be equal to async prefetch, unless we support some sort of
async I/O. Maybe something has changed recently, but every time I looked
for a good portable async I/O API/library I got burned.
Now, maybe a couple of bgworkers prefetching buffers synchronously would
be good enough for WAL prefetching - after all, we only need to prefetch
data fast enough for the recovery not to wait. But I doubt it's going to
be good enough for bitmap heap scans, for example.
We need a prefetch that allows filling the I/O queues with hundreds of
requests, and I don't think sync prefetch from a handful of bgworkers
can achieve that.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services