Re: index prefetching

2025-08-28 Thread Thomas Munro
On Fri, Aug 29, 2025 at 11:52 AM Tomas Vondra wrote: > True. But one worker did show up in top, using a fair amount of CPU, so > why wouldn't the others (if they process the same stream)? It deliberately concentrates wakeups into the lowest numbered workers that are marked idle in a bitmap. * hi

Re: index prefetching

2025-08-28 Thread Peter Geoghegan
On Thu, Aug 28, 2025 at 9:10 PM Andres Freund wrote: > Same. Tomas, could you share what you applied? Tomas posted a self-contained patch to the list about an hour ago? > > I'm not sure that Thomas'/your patch to ameliorate the problem on the > > read stream side is essential here. Perhaps Andr

Re: index prefetching

2025-08-28 Thread Andres Freund
Hi, On 2025-08-28 19:57:17 -0400, Peter Geoghegan wrote: > On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote: > > Use this branch: > > > > https://github.com/tvondra/postgres/commits/index-prefetch-master/ > > > > and then Thomas' patch that increases the prefetch distance: > > > > > > https:/

Re: index prefetching

2025-08-28 Thread Tomas Vondra
On 8/29/25 01:57, Peter Geoghegan wrote: > On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote: >> Use this branch: >> >> https://github.com/tvondra/postgres/commits/index-prefetch-master/ >> >> and then Thomas' patch that increases the prefetch distance: >> >> >> https://www.postgresql.org/mes

Re: index prefetching

2025-08-28 Thread Peter Geoghegan
On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote: > Use this branch: > > https://github.com/tvondra/postgres/commits/index-prefetch-master/ > > and then Thomas' patch that increases the prefetch distance: > > > https://www.postgresql.org/message-id/CA%2BhUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BA

Re: index prefetching

2025-08-28 Thread Tomas Vondra
On 8/29/25 01:27, Andres Freund wrote: > Hi, > > On 2025-08-29 01:00:58 +0200, Tomas Vondra wrote: >> I'm not sure how to determine what concurrency it "wants". All I know is >> that for "warm" runs [1], the basic index prefetch patch uses distance >> ~2.0 on average, and is ~2x slower than master

Re: index prefetching

2025-08-28 Thread Tomas Vondra
On 8/28/25 21:52, Andres Freund wrote: > Hi, > > On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote: >> On 8/28/25 18:16, Andres Freund wrote: So I think the IPC overhead with "worker" can be quite significant, especially for cases with distance=1. I don't think it's a major issue

Re: index prefetching

2025-08-28 Thread Andres Freund
Hi, On 2025-08-29 01:00:58 +0200, Tomas Vondra wrote: > I'm not sure how to determine what concurrency it "wants". All I know is > that for "warm" runs [1], the basic index prefetch patch uses distance > ~2.0 on average, and is ~2x slower than master. And with the patches the > distance is ~270, a

Re: index prefetching

2025-08-28 Thread Peter Geoghegan
On Thu, Aug 28, 2025 at 7:01 PM Tomas Vondra wrote: > I'm not sure how to determine what concurrency it "wants". All I know is > that for "warm" runs [1], the basic index prefetch patch uses distance > ~2.0 on average, and is ~2x slower than master. And with the patches the > distance is ~270, and

Re: index prefetching

2025-08-28 Thread Tomas Vondra
On 8/28/25 23:50, Thomas Munro wrote: > On Fri, Aug 29, 2025 at 7:52 AM Andres Freund wrote: >> On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote: >>> From the 2x regression (compared to master) it might seem like that, but >>> even with the increased distance it's still slower than master (by 2

Re: index prefetching

2025-08-28 Thread Thomas Munro
On Fri, Aug 29, 2025 at 7:52 AM Andres Freund wrote: > On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote: > > From the 2x regression (compared to master) it might seem like that, but > > even with the increased distance it's still slower than master (by 25%). So > > maybe the "error" is to use AIO

Re: index prefetching

2025-08-28 Thread Andres Freund
Hi, On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote: > On 8/28/25 18:16, Andres Freund wrote: > >> So I think the IPC overhead with "worker" can be quite significant, > >> especially for cases with distance=1. I don't think it's a major issue > >> for PG18, because seq/bitmap scans are unlikely t

Re: index prefetching

2025-08-28 Thread Tomas Vondra
On 8/28/25 18:16, Andres Freund wrote: > Hi, > > On 2025-08-28 14:45:24 +0200, Tomas Vondra wrote: >> On 8/26/25 17:06, Tomas Vondra wrote: >> I kept thinking about this, and in the end I decided to try to measure >> this IPC overhead. The backend/ioworker communicate by sending signals, >> so I w

Re: index prefetching

2025-08-28 Thread Andres Freund
Hi, On 2025-08-28 14:45:24 +0200, Tomas Vondra wrote: > On 8/26/25 17:06, Tomas Vondra wrote: > I kept thinking about this, and in the end I decided to try to measure > this IPC overhead. The backend/ioworker communicate by sending signals, > so I wrote a simple C program that does "signal echo" w

Re: index prefetching

2025-08-28 Thread Andres Freund
Hi, On 2025-08-26 17:06:11 +0200, Tomas Vondra wrote: > On 8/26/25 01:48, Andres Freund wrote: > > Hi, > > > > On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote: > >> Thanks. Based on the testing so far, the patch seems to be a substantial > >> improvement. What's needed to make this prototype com

Re: index prefetching

2025-08-28 Thread Tomas Vondra
On 8/26/25 17:06, Tomas Vondra wrote: > > > On 8/26/25 01:48, Andres Freund wrote: >> Hi, >> >> On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote: >>> >>> ... >>> >>> I'm not sure what's causing this, but almost all regressions my script >>> is finding look like this - always io_method=worker, wi

Re: index prefetching

2025-08-26 Thread Tomas Vondra
On 8/26/25 01:48, Andres Freund wrote: > Hi, > > On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote: >> Thanks. Based on the testing so far, the patch seems to be a substantial >> improvement. What's needed to make this prototype committable? > > Mainly some testing infrastructure that can trigg

Re: index prefetching

2025-08-26 Thread Tomas Vondra
On 8/26/25 03:08, Peter Geoghegan wrote: > On Mon Aug 25, 2025 at 10:18 AM EDT, Tomas Vondra wrote: >> The attached patch is a PoC implementing this. The core idea is that if >> we measure "miss probability" for a chunk of requests, we can use that >> to estimate the distance needed to generate e_i

Re: index prefetching

2025-08-25 Thread Peter Geoghegan
On Mon Aug 25, 2025 at 10:18 AM EDT, Tomas Vondra wrote: > The attached patch is a PoC implementing this. The core idea is that if > we measure "miss probability" for a chunk of requests, we can use that > to estimate the distance needed to generate e_i_c IOs. I noticed an assertion failure when t

Re: index prefetching

2025-08-25 Thread Andres Freund
Hi, On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote: > Thanks. Based on the testing so far, the patch seems to be a substantial > improvement. What's needed to make this prototype committable? Mainly some testing infrastructure that can trigger this kind of stream. The logic is too finnicky for

Re: index prefetching

2025-08-25 Thread Peter Geoghegan
On Mon, Aug 25, 2025 at 2:33 PM Tomas Vondra wrote: > Right. I might have expressed it more clearly, but this is what I meant > when I said priorbatch is not causing this. Cool. > As for priorbatch, I'd still like to know where does the overhead come > from. I mean, what's the expensive part of

Re: index prefetching

2025-08-25 Thread Tomas Vondra
On 8/25/25 19:57, Peter Geoghegan wrote: > On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra wrote: >> Almost all regressions (at least the top ones) now look like this, i.e. >> distance collapses to ~2.0, which essentially disables prefetching. > > Good to know. > >> But I no longer think it's c

Re: index prefetching

2025-08-25 Thread Peter Geoghegan
On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra wrote: > Almost all regressions (at least the top ones) now look like this, i.e. > distance collapses to ~2.0, which essentially disables prefetching. Good to know. > But I no longer think it's caused by the "priorbatch" optimization, > which delays

Re: index prefetching

2025-08-25 Thread Tomas Vondra
On 8/25/25 17:43, Thomas Munro wrote: > On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra wrote: >> Of course, this can happen even with other hit ratios, there's nothing >> special about 50%. > > Right, that's what this patch was attacking directly, basically only > giving up when misses are so spars

Re: index prefetching

2025-08-25 Thread Tomas Vondra
On 8/25/25 16:18, Tomas Vondra wrote: > ... > > But with more hits, the hit/miss ratio simply determines the "stable" > distance. Let's say there's 80% hits, so 4 hits to 1 miss. Then the > stable distance is ~4, because we get a miss, double to 8, and then 4 > hits, so the distance drops back

Re: index prefetching

2025-08-25 Thread Thomas Munro
On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra wrote: > Of course, this can happen even with other hit ratios, there's nothing > special about 50%. Right, that's what this patch was attacking directly, basically only giving up when misses are so sparse we can't do anything about it for an ordered s

Re: index prefetching

2025-08-25 Thread Tomas Vondra
On 8/20/25 00:27, Peter Geoghegan wrote: > On Tue, Aug 19, 2025 at 2:22 PM Peter Geoghegan wrote: >> That definitely seems like a problem. I think that you're saying that >> this problem happens because we have extra buffer hits earlier on, >> which is enough to completely change the ramp-up behav

Re: index prefetching

2025-08-25 Thread Tomas Vondra
On 8/15/25 17:09, Andres Freund wrote: > Hi, > > On 2025-08-14 19:36:49 -0400, Andres Freund wrote: >> On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote: >>> On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote: > We can optimize that by deferring the StartBufferIO() if we're > encoun

Re: index prefetching

2025-08-19 Thread Peter Geoghegan
On Tue, Aug 19, 2025 at 2:22 PM Peter Geoghegan wrote: > That definitely seems like a problem. I think that you're saying that > this problem happens because we have extra buffer hits earlier on, > which is enough to completely change the ramp-up behavior. This seems > to be all it takes to dramat

Re: index prefetching

2025-08-19 Thread Peter Geoghegan
On Tue, Aug 19, 2025 at 1:23 PM Tomas Vondra wrote: > Thanks for investigating this. I think it's the right direction - simple > OLTP queries should not be paying for building read_stream when there's > little chance of benefit. > > Unfortunately, this seems to be causing regressions, both compare

Re: index prefetching

2025-08-19 Thread Tomas Vondra
On 8/17/25 19:30, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote: >> As far as I know, we only have the following unambiguous performance >> regressions (that clearly need to be fixed): >> >> 1. This issue. >> >> 2. There's about a 3% loss of throughput on pgbench

Re: index prefetching

2025-08-17 Thread Peter Geoghegan
On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote: > As far as I know, we only have the following unambiguous performance > regressions (that clearly need to be fixed): > > 1. This issue. > > 2. There's about a 3% loss of throughput on pgbench SELECT. Update: I managed to fix the performance

Re: index prefetching

2025-08-15 Thread Peter Geoghegan
On Fri, Aug 15, 2025 at 3:45 PM Andres Freund wrote: > > My shared_buffers is 16GB, with pgbench scale 300. > > So there's actually no IO, given that a scale 300 is something like 4.7GB? In > that case my patch could really not make a difference, neither of the changed > branches would ever be rea

Re: index prefetching

2025-08-15 Thread Andres Freund
Hi, On 2025-08-15 15:42:10 -0400, Peter Geoghegan wrote: > On Fri, Aug 15, 2025 at 3:38 PM Andres Freund wrote: > > I see absolutely no effect of the patch with shared_buffers=1GB and a > > read-only scale 200 pgbench at 40 clients. What data sizes, shared buffers > > etc. were you testing? > >

Re: index prefetching

2025-08-15 Thread Peter Geoghegan
On Fri, Aug 15, 2025 at 3:38 PM Andres Freund wrote: > I see absolutely no effect of the patch with shared_buffers=1GB and a > read-only scale 200 pgbench at 40 clients. What data sizes, shared buffers > etc. were you testing? Just to be clear: you are testing with both the index prefetching patc

Re: index prefetching

2025-08-15 Thread Andres Freund
Hi, On 2025-08-15 15:31:47 -0400, Peter Geoghegan wrote: > On Fri, Aug 15, 2025 at 3:28 PM Andres Freund wrote: > > >I'm not worried about it. Andres' "not waiting for already-in-progress > > >IO" patch was clearly just a prototype. Just thought it was worth > > >noting here. > > > > Are you conf

Re: index prefetching

2025-08-15 Thread Andres Freund
Hi, On August 15, 2025 3:25:50 PM EDT, Peter Geoghegan wrote: >On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote: >> As far as I know, we only have the following unambiguous performance >> regressions (that clearly need to be fixed): >> >> 1. This issue. >> >> 2. There's about a 3% loss of

Re: index prefetching

2025-08-15 Thread Peter Geoghegan
On Fri, Aug 15, 2025 at 3:28 PM Andres Freund wrote: > >I'm not worried about it. Andres' "not waiting for already-in-progress > >IO" patch was clearly just a prototype. Just thought it was worth > >noting here. > > Are you confident in that? Because the patch should be extremely cheap in > that

Re: index prefetching

2025-08-15 Thread Peter Geoghegan
On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote: > As far as I know, we only have the following unambiguous performance > regressions (that clearly need to be fixed): > > 1. This issue. > > 2. There's about a 3% loss of throughput on pgbench SELECT. I did a quick pgbench SELECT benchmark a

Re: index prefetching

2025-08-15 Thread Peter Geoghegan
On Fri, Aug 15, 2025 at 1:23 PM Andres Freund wrote: > Somewhat random note about I/O waits: > > Unfortunately the I/O wait time we measure often massively *over* estimate the > actual I/O time. If I execute the above query with the patch applied, we > actually barely ever wait for I/O to complete

Re: index prefetching

2025-08-15 Thread Peter Geoghegan
On Fri, Aug 15, 2025 at 1:09 PM Andres Freund wrote: > On 2025-08-15 12:29:25 -0400, Peter Geoghegan wrote: > > FWIW, this development probably completely changes the results of many > > (all?) of your benchmark queries. My guess is that with Andres' patch, > > things will be better across the boa

Re: index prefetching

2025-08-15 Thread Andres Freund
Hi, On 2025-08-15 12:24:40 -0400, Peter Geoghegan wrote: > With bufmgr patch > - > > ┌─┐ > │ QUERY PLAN │ > ├─┤

Re: index prefetching

2025-08-15 Thread Andres Freund
Hi, Glad to see that the prototype does fix the issue for you. On 2025-08-15 12:29:25 -0400, Peter Geoghegan wrote: > FWIW, this development probably completely changes the results of many > (all?) of your benchmark queries. My guess is that with Andres' patch, > things will be better across the

Re: index prefetching

2025-08-15 Thread Peter Geoghegan
On Fri, Aug 15, 2025 at 12:24 PM Peter Geoghegan wrote: > Good news here: with Andres' bufmgr patch applied, the similar forwards scan > query does indeed get more than 2x faster. And I don't mean that it gets > faster on the randomized table -- it actually gets 2x faster with your > original (al

Re: index prefetching

2025-08-15 Thread Peter Geoghegan
On Thu Aug 14, 2025 at 7:26 PM EDT, Tomas Vondra wrote: >> My guess is that once we fix the underlying problem, we'll see >> improved performance for many different types of queries. Not as big >> of a benefit as the one that the broken query will get, but still >> enough to matter. >> > > Hopeful

Re: index prefetching

2025-08-15 Thread Andres Freund
Hi, On 2025-08-14 19:36:49 -0400, Andres Freund wrote: > On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote: > > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote: > > > > We can optimize that by deferring the StartBufferIO() if we're > > > > encountering a > > > > buffer that is undergoing

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu, Aug 14, 2025 at 7:26 PM Tomas Vondra wrote: > Good. I admit I lost track of which the various regressions may affect > existing plans, and which are specific to the prefetch patch. As far as I know, we only have the following unambiguous performance regressions (that clearly need to be fi

Re: index prefetching

2025-08-14 Thread Thomas Munro
On Fri, Aug 15, 2025 at 1:47 PM Thomas Munro wrote: > (rather than introducing a secondary reference > counting scheme in the WAL that I think you might be describing?), and s/WAL/read stream/

Re: index prefetching

2025-08-14 Thread Thomas Munro
On Fri, Aug 15, 2025 at 11:21 AM Tomas Vondra wrote: > I don't recall all the details, but IIRC my impression was it'd be best > to do this "caching" entirely in the read_stream.c (so the next_block > callbacks would probably not need to worry about lastBlock at all), > enabled when creating the s

Re: index prefetching

2025-08-14 Thread Andres Freund
Hi, On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote: > > > We can optimize that by deferring the StartBufferIO() if we're > > > encountering a > > > buffer that is undergoing IO, at the cost of some complexity. I'm not > > > sure > >

Re: index prefetching

2025-08-14 Thread Tomas Vondra
On 8/15/25 01:05, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 6:24 PM Tomas Vondra wrote: >> FWIW I'm not claiming this explains all odd things we're investigating >> in this thread, it's more a confirmation that the scan direction may >> matter if it translates to direction at the device lev

Re: index prefetching

2025-08-14 Thread Tomas Vondra
On 8/14/25 23:55, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote: >> If this same mechanism remembered (say) the last 2 heap blocks it >> requested, that might be enough to totally fix this particular >> problem. This isn't a serious proposal, but it'll be simple en

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu, Aug 14, 2025 at 6:24 PM Tomas Vondra wrote: > FWIW I'm not claiming this explains all odd things we're investigating > in this thread, it's more a confirmation that the scan direction may > matter if it translates to direction at the device level. I don't think > it can explain the strange

Re: index prefetching

2025-08-14 Thread Tomas Vondra
On 8/14/25 01:19, Andres Freund wrote: > Hi, > > On 2025-08-14 01:11:07 +0200, Tomas Vondra wrote: >> On 8/13/25 23:57, Peter Geoghegan wrote: >>> On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote: It's also not very surprising this happens with backwards scans more. The I/O is apparen

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote: > If this same mechanism remembered (say) the last 2 heap blocks it > requested, that might be enough to totally fix this particular > problem. This isn't a serious proposal, but it'll be simple enough to > implement. Hopefully when I do that

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu, Aug 14, 2025 at 4:44 PM Andres Freund wrote: > Interesting. In the sequential case I see some waits that are not attributed > in explain, due to the waits happening within WaitIO(), not WaitReadBuffers(). > Which indicates that the read stream is trying to re-read a buffer that > previousl

Re: index prefetching

2025-08-14 Thread Andres Freund
Hi, On 2025-08-14 15:45:26 -0400, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 3:15 PM Peter Geoghegan wrote: > > Then why does the exact same pair of runs show "I/O Timings: shared > > read=194.629" for the sequential table backwards scan (with total > > execution time 1132.360 ms), versus "

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu Aug 14, 2025 at 3:41 PM EDT, Andres Freund wrote: > Hm, that is somewhat curious. > > I wonder if there's some wait time that's not being captured by "I/O > Timings". A first thing to do would be to just run strace --summary-only while > running the query, and see if there are syscall wait t

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu, Aug 14, 2025 at 3:15 PM Peter Geoghegan wrote: > Then why does the exact same pair of runs show "I/O Timings: shared > read=194.629" for the sequential table backwards scan (with total > execution time 1132.360 ms), versus "I/O Timings: shared read=352.88" > (with total execution time 697.

Re: index prefetching

2025-08-14 Thread Andres Freund
Hi, On 2025-08-14 15:15:02 -0400, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote: > > I think this is just an indicator of being IO bound. > > Then why does the exact same pair of runs show "I/O Timings: shared > read=194.629" for the sequential table backwards scan

Re: index prefetching

2025-08-14 Thread Andres Freund
Hi, On 2025-08-14 15:30:16 -0400, Peter Geoghegan wrote: > On Thu Aug 14, 2025 at 3:15 PM EDT, Peter Geoghegan wrote: > > On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote: > >> I think this is just an indicator of being IO bound. > > > > Then why does the exact same pair of runs show "I/O Timi

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu Aug 14, 2025 at 3:15 PM EDT, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote: >> I think this is just an indicator of being IO bound. > > Then why does the exact same pair of runs show "I/O Timings: shared > read=194.629" for the sequential table backwards scan

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote: > I think this is just an indicator of being IO bound. Then why does the exact same pair of runs show "I/O Timings: shared read=194.629" for the sequential table backwards scan (with total execution time 1132.360 ms), versus "I/O Timings: share

Re: index prefetching

2025-08-14 Thread Andres Freund
Hi, On 2025-08-14 14:44:44 -0400, Peter Geoghegan wrote: > On Thu Aug 14, 2025 at 1:57 PM EDT, Peter Geoghegan wrote: > > The only interesting thing about the flame graph is just how little > > difference there seems to be (at least for this particular perf event > > type). > > I captured method_i

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Thu Aug 14, 2025 at 1:57 PM EDT, Peter Geoghegan wrote: > The only interesting thing about the flame graph is just how little > difference there seems to be (at least for this particular perf event > type). I captured method_io_uring.c DEBUG output from running each query in the server log, in

Re: index prefetching

2025-08-14 Thread Peter Geoghegan
On Wed Aug 13, 2025 at 8:59 PM EDT, Tomas Vondra wrote: > On 8/14/25 01:50, Peter Geoghegan wrote: >> I first made the order of the table random, except among groups of index >> tuples >> that have exactly the same value. Those will still point to the same 1 or 2 >> heap >> blocks in virtually al

Re: index prefetching

2025-08-13 Thread Peter Geoghegan
On Wed Aug 13, 2025 at 7:50 PM EDT, Peter Geoghegan wrote: > pg@regression:5432 [2476413]=# EXPLAIN (ANALYZE ,costs off, timing off) > SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a desc; > ┌─┐ > │

Re: index prefetching

2025-08-13 Thread Peter Geoghegan
On Wed, Aug 13, 2025 at 8:59 PM Tomas Vondra wrote: > I investigated this from a different angle, by tracing the I/O request > generated. using perf-trace. And the patterns are massively different. I tried a similar approach myself, using a variety of tools. That didn't get me very far. > So, Q1

Re: index prefetching

2025-08-13 Thread Tomas Vondra
On 8/14/25 01:50, Peter Geoghegan wrote: > On Wed Aug 13, 2025 at 5:19 PM EDT, Tomas Vondra wrote: >> I did investigate this, and I don't think there's anything broken in >> read_stream. It happens because ReadStream has a concept of "ungetting" >> a block, which can happen after hitting some I/

Re: index prefetching

2025-08-13 Thread Thomas Munro
On Thu, Aug 14, 2025 at 9:19 AM Tomas Vondra wrote: > I did investigate this, and I don't think there's anything broken in > read_stream. It happens because ReadStream has a concept of "ungetting" > a block, which can happen after hitting some I/O limits. > > In that case we "remember" the last bl

Re: index prefetching

2025-08-13 Thread Peter Geoghegan
On Wed, Aug 13, 2025 at 7:51 PM Peter Geoghegan wrote: > Apparently random I/O is twice as fast as sequential I/O in descending order! > In > fact, this test case creates the appearance of random I/O being at least > slightly faster than sequential I/O for pages read in _ascending_ order! > > Obv

Re: index prefetching

2025-08-13 Thread Peter Geoghegan
On Wed Aug 13, 2025 at 5:19 PM EDT, Tomas Vondra wrote: > I did investigate this, and I don't think there's anything broken in > read_stream. It happens because ReadStream has a concept of "ungetting" > a block, which can happen after hitting some I/O limits. > > In that case we "remember" the last

Re: index prefetching

2025-08-13 Thread Tomas Vondra
On 8/13/25 23:36, Peter Geoghegan wrote: > On Wed, Aug 13, 2025 at 1:01 PM Tomas Vondra wrote: >> This seems rather bizarre, considering the two tables are exactly the >> same, except that in t2 the first column is negative, and the rows are >> fixed-length. Even heap_page_items says the tables ar

Re: index prefetching

2025-08-13 Thread Andres Freund
Hi, On 2025-08-14 01:11:07 +0200, Tomas Vondra wrote: > On 8/13/25 23:57, Peter Geoghegan wrote: > > On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote: > >> It's also not very surprising this happens with backwards scans more. > >> The I/O is apparently much slower (due to missing OS prefetch),

Re: index prefetching

2025-08-13 Thread Tomas Vondra
On 8/13/25 23:57, Peter Geoghegan wrote: > On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote: >> It's also not very surprising this happens with backwards scans more. >> The I/O is apparently much slower (due to missing OS prefetch), so we're >> much more likely to hit the I/O limits (max_ios

Re: index prefetching

2025-08-13 Thread Andres Freund
Hi, On 2025-08-14 00:23:49 +0200, Tomas Vondra wrote: > On 8/13/25 23:37, Andres Freund wrote: > > On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote: > >> On 8/13/25 16:44, Andres Freund wrote: > >>> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote: > In fact, I believe this is about io_method

Re: index prefetching

2025-08-13 Thread Tomas Vondra
On 8/13/25 23:37, Andres Freund wrote: > Hi, > > On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote: >> On 8/13/25 16:44, Andres Freund wrote: >>> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote: In fact, I believe this is about io_method. I initially didn't see the difference you desc

Re: index prefetching

2025-08-13 Thread Peter Geoghegan
On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote: > It's also not very surprising this happens with backwards scans more. > The I/O is apparently much slower (due to missing OS prefetch), so we're > much more likely to hit the I/O limits (max_ios and various other limits > in read_stream_start_p

Re: index prefetching

2025-08-13 Thread Andres Freund
Hi, On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote: > On 8/13/25 16:44, Andres Freund wrote: > > On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote: > >> In fact, I believe this is about io_method. I initially didn't see the > >> difference you described, and then I realized I set io_method=sync t

Re: index prefetching

2025-08-13 Thread Peter Geoghegan
On Wed, Aug 13, 2025 at 1:01 PM Tomas Vondra wrote: > This seems rather bizarre, considering the two tables are exactly the > same, except that in t2 the first column is negative, and the rows are > fixed-length. Even heap_page_items says the tables are exactly the same. > > So why would the index

Re: index prefetching

2025-08-13 Thread Tomas Vondra
On 8/13/25 18:01, Peter Geoghegan wrote: > On Wed, Aug 13, 2025 at 11:28 AM Andres Freund wrote: >>> With "sync" I always get this (after a restart): >>> >>>Buffers: shared hit=7435 read=52801 >>> >>> while with "worker" I get this: >>> >>>Buffers: shared hit=4879 read=52801 >>>Buff

Re: index prefetching

2025-08-13 Thread Tomas Vondra
On 8/13/25 16:44, Andres Freund wrote: > Hi, > > On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote: >> In fact, I believe this is about io_method. I initially didn't see the >> difference you described, and then I realized I set io_method=sync to >> make it easier to track the block access. And

Re: index prefetching

2025-08-13 Thread Tomas Vondra
On 8/13/25 18:36, Peter Geoghegan wrote: > On Wed, Aug 13, 2025 at 8:15 AM Tomas Vondra wrote: >> 1) created a second table with an "inverse pattern" that's decreasing: >> >> create table t2 (like t) with (fillfactor = 20); >> insert into t2 select -a, b from t; >> create index idx2 on t2 (a

Re: index prefetching

2025-08-13 Thread Peter Geoghegan
On Wed, Aug 13, 2025 at 8:15 AM Tomas Vondra wrote: > 1) created a second table with an "inverse pattern" that's decreasing: > > create table t2 (like t) with (fillfactor = 20); > insert into t2 select -a, b from t; > create index idx2 on t2 (a); > alter index idx2 set (deduplicate_items =

Re: index prefetching

2025-08-13 Thread Peter Geoghegan
On Wed, Aug 13, 2025 at 11:28 AM Andres Freund wrote: > > With "sync" I always get this (after a restart): > > > >Buffers: shared hit=7435 read=52801 > > > > while with "worker" I get this: > > > >Buffers: shared hit=4879 read=52801 > >Buffers: shared hit=5151 read=52801 > >Buffers

Re: index prefetching

2025-08-13 Thread Andres Freund
Hi, On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote: > In fact, I believe this is about io_method. I initially didn't see the > difference you described, and then I realized I set io_method=sync to > make it easier to track the block access. And if I change io_method to > worker, I get different

Re: index prefetching

2025-08-13 Thread Tomas Vondra
On 8/13/25 01:33, Peter Geoghegan wrote: > On Tue, Aug 12, 2025 at 7:10 PM Tomas Vondra wrote: >> Actually, this might be a consequence of how backwards scans work (at >> least in btree). I logged the block in index_scan_stream_read_next, and >> this is what I see in the forward scan (at the begin

Re: index prefetching

2025-08-13 Thread Nazir Bilal Yavuz
Hi, On Tue, 12 Aug 2025 at 22:30, Thomas Munro wrote: > > On Tue, Aug 12, 2025 at 11:22 PM Nazir Bilal Yavuz wrote: > > Unfortunately this doesn't work. We need to handle backwards I/O > > combining in the StartReadBuffersImpl() function too as buffer indexes > > won't have correct blocknums. Al

Re: index prefetching

2025-08-12 Thread Peter Geoghegan
On Tue, Aug 12, 2025 at 7:10 PM Tomas Vondra wrote: > Actually, this might be a consequence of how backwards scans work (at > least in btree). I logged the block in index_scan_stream_read_next, and > this is what I see in the forward scan (at the beginning): Just to be clear: you did disable dedu

Re: index prefetching

2025-08-12 Thread Tomas Vondra
On 8/12/25 23:52, Tomas Vondra wrote: > > On 8/12/25 23:22, Peter Geoghegan wrote: >> ... >> >> It looks like the patch does significantly better with the forwards scan, >> compared to the backwards scan (though both are improved by a lot). But >> that's >> not the main thing about these resu

Re: index prefetching

2025-08-12 Thread Andres Freund
Hi, On 2025-08-12 17:22:20 -0400, Peter Geoghegan wrote: > Doesn't look like Linux will do this, if what my local testing shows is > anything > to go on. Yes, matches my experiments outside of postgres too. > I'm a bit surprised by this (I also thought that OS readahead on linux > was quite so

Re: index prefetching

2025-08-12 Thread Tomas Vondra
On 8/12/25 23:22, Peter Geoghegan wrote: > ... > > It looks like the patch does significantly better with the forwards scan, > compared to the backwards scan (though both are improved by a lot). But > that's > not the main thing about these results that I find interesting. > > The really odd t

Re: index prefetching

2025-08-12 Thread Peter Geoghegan
On Tue Aug 12, 2025 at 1:06 AM EDT, Thomas Munro wrote: > I'd be interested to hear more about reverse scans. Bilal was > speculating about backwards I/O combining in read_stream.c a while > back, but we didn't have anything interesting to use it yet. You'll > probably see a flood of uncombined 8

Re: index prefetching

2025-08-12 Thread Andres Freund
Hi, On 2025-08-12 18:53:13 +0200, Tomas Vondra wrote: > I'm running some tests looking for these weird changes, not just with > the patches, but on master too. And I don't think b4212231 changed the > situation very much. > > FWIW this issue is not caused by the index prefetching patches, I can >

Re: index prefetching

2025-08-12 Thread Peter Geoghegan
On Tue, Aug 12, 2025 at 1:51 PM Tomas Vondra wrote: > One more detail I just noticed - the DESC scan apparently needs more > buffers (~87k vs. 57k). That probably shouldn't cause such massive > regression, though. I can reproduce this. I wondered if the difference might be attributable to the is

Re: index prefetching

2025-08-12 Thread Thomas Munro
On Tue, Aug 12, 2025 at 11:22 PM Nazir Bilal Yavuz wrote: > Unfortunately this doesn't work. We need to handle backwards I/O > combining in the StartReadBuffersImpl() function too as buffer indexes > won't have correct blocknums. Also, I think buffer forwarding of split > backwards I/O should be h

Re: index prefetching

2025-08-12 Thread Tomas Vondra
On 8/12/25 18:53, Tomas Vondra wrote: > ... > > EXPLAIN (ANALYZE, COSTS OFF) > SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a ASC; > > QUERY PLAN > > Index Scan using idx on t >

Re: index prefetching

2025-08-12 Thread Tomas Vondra
On 8/12/25 13:22, Nazir Bilal Yavuz wrote: > Hi, > > On Tue, 12 Aug 2025 at 08:07, Thomas Munro wrote: >> >> On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan wrote: >>> On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra wrote: I can do some tests with forward vs. backwards scans. Of course, the

Re: index prefetching

2025-08-12 Thread Nazir Bilal Yavuz
Hi, On Tue, 12 Aug 2025 at 08:07, Thomas Munro wrote: > > On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan wrote: > > On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra wrote: > > > I can do some tests with forward vs. backwards scans. Of course, the > > > trouble with finding these weird cases is th

Re: index prefetching

2025-08-11 Thread Thomas Munro
On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan wrote: > On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra wrote: > > I can do some tests with forward vs. backwards scans. Of course, the > > trouble with finding these weird cases is that they may be fairly rare. > > So hitting them is a matter or luc

  1   2   3   4   >