On Fri, Dec 5, 2025 at 6:11 AM Peter Geoghegan wrote:
> On Thu, Dec 4, 2025 at 12:54 AM Amit Langote wrote:
> > I want to acknowledge that figuring out the right layering to make I/O
> > prefetching and perhaps other optimizations internal to IndexNext()
> > work is obviously the priority right n
Hi Amit,
On Thu, Dec 4, 2025 at 12:54 AM Amit Langote wrote:
> I want to acknowledge that figuring out the right layering to make I/O
> prefetching and perhaps other optimizations internal to IndexNext()
> work is obviously the priority right now, regardless of the output
> format used to populat
Hi Peter,
On Mon, Dec 1, 2025 at 10:24 AM Peter Geoghegan wrote:
> On Mon, Nov 10, 2025 at 6:59 PM Peter Geoghegan wrote:
> > The new tentative plan is to cut scope by focussing on switching over
> > to the new index AM + table AM interface from the patch in the short
> > term, for Postgres 19.
On Mon, Dec 1, 2025 at 11:32 AM Tomas Vondra wrote:
> Thanks for the new version! I like the layering in this patch, moving
> some of the stuff from indexam.c/executor to table AM. It makes some of
> the code much cleaner, I think.
Yeah, I think so too. Clearly the main way that this improves the
On 12/1/25 02:23, Peter Geoghegan wrote:
> On Mon, Nov 10, 2025 at 6:59 PM Peter Geoghegan wrote:
>> The new tentative plan is to cut scope by focussing on switching over
>> to the new index AM + table AM interface from the patch in the short
>> term, for Postgres 19.
>
> Attached patch makes the
On Mon, Nov 24, 2025 at 3:48 PM Andres Freund wrote:
> Huh. I wouldn't have expected -march=native to make a huge difference...
Me neither. On the other hand I find that this area is quite sensitive
to icache misses and branch misprediction penalties. This is partly
due to my holding the patch to
Hi,
On 2025-11-23 19:03:44 -0500, Peter Geoghegan wrote:
> On Fri, Nov 21, 2025 at 6:31 PM Andres Freund wrote:
> > On 2025-11-21 18:14:56 -0500, Peter Geoghegan wrote:
> > > On Fri, Nov 21, 2025 at 5:38 PM Andres Freund wrote:
> > > > Another benefit is that it helps even more when there are multipl
On Fri, Nov 21, 2025 at 6:31 PM Andres Freund wrote:
> On 2025-11-21 18:14:56 -0500, Peter Geoghegan wrote:
> > On Fri, Nov 21, 2025 at 5:38 PM Andres Freund wrote:
> > > Another benefit is that it helps even more when there are multiple queries
> > > running
> > > concurrently - the high rate of loc
Hi,
On 2025-11-21 18:14:56 -0500, Peter Geoghegan wrote:
> On Fri, Nov 21, 2025 at 5:38 PM Andres Freund wrote:
> > Another benefit is that it helps even more when there are multiple queries
> > running
> > concurrently - the high rate of lock/unlock on the buffer rather badly hurts
> > scalability.
On Fri, Nov 21, 2025 at 5:38 PM Andres Freund wrote:
> Another benefit is that it helps even more when there are multiple queries running
> concurrently - the high rate of lock/unlock on the buffer rather badly hurts
> scalability.
I haven't noticed that effect myself. In fact, it seemed to be the
oth
Hi,
On 2025-11-10 18:59:07 -0500, Peter Geoghegan wrote:
> Tomas and I had a meeting on Friday to discuss a way forward with this
> project. Progress has stalled, and we feel that now is a good time to
> pivot by refactoring the patch into smaller, independently
> useful/committable pieces.
Makes
On Wed, Nov 12, 2025 at 12:39 PM Tomas Vondra wrote:
> I think I generally agree with what you said here about the challenges,
> although it's a bit too abstract to respond to individual parts. I just
> don't know how to rework the design to resolve this ...
I'm trying to identify which subsets o
On 11/11/25 00:59, Peter Geoghegan wrote:
> On Sun, Nov 2, 2025 at 6:49 PM Peter Geoghegan wrote:
>> Nothing really new here (I've been
>> working on batching on the table AM side, but nothing to show on that
>> just yet).
>
> Tomas and I had a meeting on Friday to discuss a way forward with this
On Sun, Nov 2, 2025 at 6:49 PM Peter Geoghegan wrote:
> Nothing really new here (I've been
> working on batching on the table AM side, but nothing to show on that
> just yet).
Tomas and I had a meeting on Friday to discuss a way forward with this
project. Progress has stalled, and we feel that no
On 11/3/25 00:49, Peter Geoghegan wrote:
> On Sun, Oct 12, 2025 at 2:52 PM Peter Geoghegan wrote:
>> Attached in a new revision, mostly just to keep CFBot happy following
>> a recent trivial conflict introduced on master.
>
> Attached is another revision, also just to keep CFBot happy followin
On 9/15/25 17:12, Peter Geoghegan wrote:
> On Mon, Sep 15, 2025 at 9:00 AM Tomas Vondra wrote:
>> Yeah, this heuristic seems very effective in eliminating the regression
>> (at least judging by the test results I've seen so far). Two or three
>> questions bother me about it, though:
>
> I more or
On Mon, Sep 15, 2025 at 9:00 AM Tomas Vondra wrote:
> Yeah, this heuristic seems very effective in eliminating the regression
> (at least judging by the test results I've seen so far). Two or three
> questions bother me about it, though:
I more or less agree with all of your concerns about the
IN
On Wed, Sep 3, 2025 at 4:06 PM Andres Freund wrote:
> The issue to me is that this kind of query actually *can* substantially
> benefit from prefetching, no?
As far as I can tell, not really, no.
> Afaict the performance without prefetching is
> rather atrocious as soon as a) storage has a tad h
On 9/3/25 22:06, Andres Freund wrote:
> ...
>
> I continue to be worried that we're optimizing for queries that have no
> real-world relevance. The regression afaict is contingent on
>
> 1) An access pattern that is unpredictable to the CPU (due to the use of
> random() as part of ORDER BY duri
On Wed, Sep 3, 2025 at 8:16 PM Andres Freund wrote:
> > I don't see that level of improvement with DIO. For me it's 6054.921
> > ms with prefetching, 8766.287 ms without it.
>
> I guess your SSD has lower latency than mine...
It's nothing special: a 4 year old Samsung 980 pro.
> This actually mi
Hi,
On 2025-09-03 16:25:56 -0400, Peter Geoghegan wrote:
> On Wed, Sep 3, 2025 at 4:06 PM Andres Freund wrote:
> > The issue to me is that this kind of query actually *can* substantially
> > benefit from prefetching, no?
>
> As far as I can tell, not really, no.
It seems to here - I see small wi
On Wed, Sep 3, 2025 at 2:47 PM Andres Freund wrote:
> I still don't think I fully understand why the impact of this is so large. The
> branch misses appear to be the only thing differentiating the two cases, but
> with resowners neutralized, the remaining difference in branch misses seems
> too la
Hi,
On 2025-09-03 15:33:30 -0400, Peter Geoghegan wrote:
> On Wed, Sep 3, 2025 at 2:47 PM Andres Freund wrote:
> > I still don't think I fully understand why the impact of this is so large.
> > The
> > branch misses appear to be the only thing differentiating the two cases, but
> > with resowner
Hi,
I spent a fair bit more time analyzing this issue.
On 2025-08-28 21:10:48 -0400, Andres Freund wrote:
> On 2025-08-28 19:57:17 -0400, Peter Geoghegan wrote:
> > On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote:
> > I'm not sure that Thomas'/your patch to ameliorate the problem on the
> >
On Fri, Aug 29, 2025 at 11:52 AM Tomas Vondra wrote:
> True. But one worker did show up in top, using a fair amount of CPU, so
> why wouldn't the others (if they process the same stream)?
It deliberately concentrates wakeups into the lowest numbered workers
that are marked idle in a bitmap.
* hi
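As a rough illustration of that wakeup policy (a hypothetical sketch based on the description above, not the actual method_worker.c code), picking "the lowest numbered idle worker" amounts to taking the lowest set bit of an idle-workers bitmap:

/*
 * Hypothetical sketch: wake the lowest-numbered worker whose bit is set
 * in an "idle" bitmap, so that light workloads keep reusing the same few
 * workers instead of spreading wakeups across all of them.
 */
#include <stdint.h>
#include <stdio.h>
#include <strings.h>			/* ffs() */

static int
choose_worker_to_wake(uint32_t idle_worker_bitmap)
{
	int			lowest = ffs((int) idle_worker_bitmap);

	return (lowest == 0) ? -1 : lowest - 1;		/* -1: nobody is idle */
}

int
main(void)
{
	/* workers 2, 3 and 5 are idle; worker 2 gets the wakeup */
	printf("wake worker %d\n", choose_worker_to_wake(0x2c));
	return 0;
}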
On Thu, Aug 28, 2025 at 9:10 PM Andres Freund wrote:
> Same. Tomas, could you share what you applied?
Tomas posted a self-contained patch to the list about an hour ago?
> > I'm not sure that Thomas'/your patch to ameliorate the problem on the
> > read stream side is essential here. Perhaps Andr
Hi,
On 2025-08-28 19:57:17 -0400, Peter Geoghegan wrote:
> On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote:
> > Use this branch:
> >
> > https://github.com/tvondra/postgres/commits/index-prefetch-master/
> >
> > and then Thomas' patch that increases the prefetch distance:
> >
> >
> > https:/
On 8/29/25 01:57, Peter Geoghegan wrote:
> On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote:
>> Use this branch:
>>
>> https://github.com/tvondra/postgres/commits/index-prefetch-master/
>>
>> and then Thomas' patch that increases the prefetch distance:
>>
>>
>> https://www.postgresql.org/mes
On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote:
> Use this branch:
>
> https://github.com/tvondra/postgres/commits/index-prefetch-master/
>
> and then Thomas' patch that increases the prefetch distance:
>
>
> https://www.postgresql.org/message-id/CA%2BhUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BA
On 8/29/25 01:27, Andres Freund wrote:
> Hi,
>
> On 2025-08-29 01:00:58 +0200, Tomas Vondra wrote:
>> I'm not sure how to determine what concurrency it "wants". All I know is
>> that for "warm" runs [1], the basic index prefetch patch uses distance
>> ~2.0 on average, and is ~2x slower than master
On 8/28/25 21:52, Andres Freund wrote:
> Hi,
>
> On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
>> On 8/28/25 18:16, Andres Freund wrote:
So I think the IPC overhead with "worker" can be quite significant,
especially for cases with distance=1. I don't think it's a major issue
Hi,
On 2025-08-29 01:00:58 +0200, Tomas Vondra wrote:
> I'm not sure how to determine what concurrency it "wants". All I know is
> that for "warm" runs [1], the basic index prefetch patch uses distance
> ~2.0 on average, and is ~2x slower than master. And with the patches the
> distance is ~270, a
On Thu, Aug 28, 2025 at 7:01 PM Tomas Vondra wrote:
> I'm not sure how to determine what concurrency it "wants". All I know is
> that for "warm" runs [1], the basic index prefetch patch uses distance
> ~2.0 on average, and is ~2x slower than master. And with the patches the
> distance is ~270, and
On 8/28/25 23:50, Thomas Munro wrote:
> On Fri, Aug 29, 2025 at 7:52 AM Andres Freund wrote:
>> On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
>>> From the 2x regression (compared to master) it might seem like that, but
>>> even with the increased distance it's still slower than master (by 2
On Fri, Aug 29, 2025 at 7:52 AM Andres Freund wrote:
> On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
> > From the 2x regression (compared to master) it might seem like that, but
> > even with the increased distance it's still slower than master (by 25%). So
> > maybe the "error" is to use AIO
Hi,
On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
> On 8/28/25 18:16, Andres Freund wrote:
> >> So I think the IPC overhead with "worker" can be quite significant,
> >> especially for cases with distance=1. I don't think it's a major issue
> >> for PG18, because seq/bitmap scans are unlikely t
On 8/28/25 18:16, Andres Freund wrote:
> Hi,
>
> On 2025-08-28 14:45:24 +0200, Tomas Vondra wrote:
>> On 8/26/25 17:06, Tomas Vondra wrote:
>> I kept thinking about this, and in the end I decided to try to measure
>> this IPC overhead. The backend/ioworker communicate by sending signals,
>> so I w
Hi,
On 2025-08-28 14:45:24 +0200, Tomas Vondra wrote:
> On 8/26/25 17:06, Tomas Vondra wrote:
> I kept thinking about this, and in the end I decided to try to measure
> this IPC overhead. The backend/ioworker communicate by sending signals,
> so I wrote a simple C program that does "signal echo" w
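A minimal "signal echo" round-trip benchmark along those lines might look like the sketch below (an illustrative reconstruction under my own assumptions, using SIGUSR1 between a parent and a forked child, not Tomas's actual program):

/*
 * Sketch of a signal-echo benchmark: the parent plays the backend, the
 * child plays an I/O worker.  One round trip = one SIGUSR1 to the child
 * plus one SIGUSR1 back, so the average round-trip time approximates the
 * per-wakeup IPC cost paid by io_method=worker.
 */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ROUNDS 100000

int
main(void)
{
	sigset_t	set;
	int			sig;
	pid_t		parent = getpid();
	pid_t		child;
	struct timespec start,
				end;

	/* Block SIGUSR1 before forking so neither side can lose a signal. */
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	sigprocmask(SIG_BLOCK, &set, NULL);

	child = fork();
	if (child == 0)
	{
		/* Child: echo every signal straight back to the parent. */
		for (;;)
		{
			sigwait(&set, &sig);
			kill(parent, SIGUSR1);
		}
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < ROUNDS; i++)
	{
		kill(child, SIGUSR1);	/* "submit" the request */
		sigwait(&set, &sig);	/* wait for the "completion" */
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	double		elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
		(end.tv_nsec - start.tv_nsec) / 1e3;

	printf("%d round trips, %.2f us each\n", ROUNDS, elapsed_us / ROUNDS);

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	return 0;
}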
Hi,
On 2025-08-26 17:06:11 +0200, Tomas Vondra wrote:
> On 8/26/25 01:48, Andres Freund wrote:
> > Hi,
> >
> > On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
> >> Thanks. Based on the testing so far, the patch seems to be a substantial
> >> improvement. What's needed to make this prototype com
On 8/26/25 17:06, Tomas Vondra wrote:
>
>
> On 8/26/25 01:48, Andres Freund wrote:
>> Hi,
>>
>> On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
>>>
>>> ...
>>>
>>> I'm not sure what's causing this, but almost all regressions my script
>>> is finding look like this - always io_method=worker, wi
On 8/26/25 01:48, Andres Freund wrote:
> Hi,
>
> On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
>> Thanks. Based on the testing so far, the patch seems to be a substantial
>> improvement. What's needed to make this prototype committable?
>
> Mainly some testing infrastructure that can trigg
On 8/26/25 03:08, Peter Geoghegan wrote:
> On Mon Aug 25, 2025 at 10:18 AM EDT, Tomas Vondra wrote:
>> The attached patch is a PoC implementing this. The core idea is that if
>> we measure "miss probability" for a chunk of requests, we can use that
>> to estimate the distance needed to generate e_i
On Mon Aug 25, 2025 at 10:18 AM EDT, Tomas Vondra wrote:
> The attached patch is a PoC implementing this. The core idea is that if
> we measure "miss probability" for a chunk of requests, we can use that
> to estimate the distance needed to generate e_i_c IOs.
I noticed an assertion failure when t
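Read literally, the estimate described there (my interpretation of the PoC; the function and parameter names below are made up for illustration) is just inverse-miss-probability scaling of the distance:

/*
 * Hypothetical sketch: if only a fraction miss_probability of the blocks
 * in the lookahead window actually need an I/O, the window must be about
 * effective_io_concurrency / miss_probability blocks wide to keep e_i_c
 * I/Os in flight.
 */
#include <stdio.h>

static int
estimate_distance(double miss_probability, int effective_io_concurrency,
				  int max_distance)
{
	int			distance;

	if (miss_probability <= 0.0)
		return 1;				/* all hits: no reason to look far ahead */

	distance = (int) (effective_io_concurrency / miss_probability + 0.5);
	return (distance > max_distance) ? max_distance : distance;
}

int
main(void)
{
	/* e.g. 10% misses with e_i_c = 16 needs a distance of ~160 */
	printf("distance = %d\n", estimate_distance(0.10, 16, 1024));
	return 0;
}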
Hi,
On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
> Thanks. Based on the testing so far, the patch seems to be a substantial
> improvement. What's needed to make this prototype committable?
Mainly some testing infrastructure that can trigger this kind of stream. The
logic is too finnicky for
On Mon, Aug 25, 2025 at 2:33 PM Tomas Vondra wrote:
> Right. I might have expressed it more clearly, but this is what I meant
> when I said priorbatch is not causing this.
Cool.
> As for priorbatch, I'd still like to know where the overhead comes
> from. I mean, what's the expensive part of
On 8/25/25 19:57, Peter Geoghegan wrote:
> On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra wrote:
>> Almost all regressions (at least the top ones) now look like this, i.e.
>> distance collapses to ~2.0, which essentially disables prefetching.
>
> Good to know.
>
>> But I no longer think it's c
On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra wrote:
> Almost all regressions (at least the top ones) now look like this, i.e.
> distance collapses to ~2.0, which essentially disables prefetching.
Good to know.
> But I no longer think it's caused by the "priorbatch" optimization,
> which delays
On 8/25/25 17:43, Thomas Munro wrote:
> On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra wrote:
>> Of course, this can happen even with other hit ratios, there's nothing
>> special about 50%.
>
> Right, that's what this patch was attacking directly, basically only
> giving up when misses are so spars
On 8/25/25 16:18, Tomas Vondra wrote:
> ...
>
> But with more hits, the hit/miss ratio simply determines the "stable"
> distance. Let's say there's 80% hits, so 4 hits to 1 miss. Then the
> stable distance is ~4, because we get a miss, double to 8, and then 4
> hits, so the distance drops back
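To make that arithmetic concrete, here is a toy model of the double-on-miss, decrement-on-hit behavior being described (an illustrative simulation only, not the actual read_stream.c logic); with an 80% hit rate the distance hovers near the hit/miss ratio instead of ramping up to the cap:

/*
 * Toy model of the distance heuristic described above: every buffer hit
 * decrements the lookahead distance, every miss doubles it (capped).
 * With 80% hits the distance settles around ~4-8 rather than growing
 * toward the cap.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	int			distance = 1;
	const int	max_distance = 256;	/* stand-in for the e_i_c-derived cap */
	const int	nrequests = 100000;
	long		sum = 0;

	srandom(42);
	for (int i = 0; i < nrequests; i++)
	{
		bool		hit = (random() % 5) != 0;	/* 80% buffer hits */

		if (hit)
			distance = (distance > 1) ? distance - 1 : 1;
		else
		{
			distance *= 2;
			if (distance > max_distance)
				distance = max_distance;
		}
		sum += distance;
	}
	printf("average lookahead distance: %.1f\n", (double) sum / nrequests);
	return 0;
}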
On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra wrote:
> Of course, this can happen even with other hit ratios, there's nothing
> special about 50%.
Right, that's what this patch was attacking directly, basically only
giving up when misses are so sparse we can't do anything about it for
an ordered s
On 8/20/25 00:27, Peter Geoghegan wrote:
> On Tue, Aug 19, 2025 at 2:22 PM Peter Geoghegan wrote:
>> That definitely seems like a problem. I think that you're saying that
>> this problem happens because we have extra buffer hits earlier on,
>> which is enough to completely change the ramp-up behav
On 8/15/25 17:09, Andres Freund wrote:
> Hi,
>
> On 2025-08-14 19:36:49 -0400, Andres Freund wrote:
>> On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
>>> On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
> We can optimize that by deferring the StartBufferIO() if we're
> encoun
On Tue, Aug 19, 2025 at 2:22 PM Peter Geoghegan wrote:
> That definitely seems like a problem. I think that you're saying that
> this problem happens because we have extra buffer hits earlier on,
> which is enough to completely change the ramp-up behavior. This seems
> to be all it takes to dramat
On Tue, Aug 19, 2025 at 1:23 PM Tomas Vondra wrote:
> Thanks for investigating this. I think it's the right direction - simple
> OLTP queries should not be paying for building read_stream when there's
> little chance of benefit.
>
> Unfortunately, this seems to be causing regressions, both compare
On 8/17/25 19:30, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote:
>> As far as I know, we only have the following unambiguous performance
>> regressions (that clearly need to be fixed):
>>
>> 1. This issue.
>>
>> 2. There's about a 3% loss of throughput on pgbench
On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote:
> As far as I know, we only have the following unambiguous performance
> regressions (that clearly need to be fixed):
>
> 1. This issue.
>
> 2. There's about a 3% loss of throughput on pgbench SELECT.
Update: I managed to fix the performance
On Fri, Aug 15, 2025 at 3:45 PM Andres Freund wrote:
> > My shared_buffers is 16GB, with pgbench scale 300.
>
> So there's actually no IO, given that a scale 300 is something like 4.7GB? In
> that case my patch could really not make a difference, neither of the changed
> branches would ever be rea
Hi,
On 2025-08-15 15:42:10 -0400, Peter Geoghegan wrote:
> On Fri, Aug 15, 2025 at 3:38 PM Andres Freund wrote:
> > I see absolutely no effect of the patch with shared_buffers=1GB and a
> > read-only scale 200 pgbench at 40 clients. What data sizes, shared buffers
> > etc. were you testing?
>
>
On Fri, Aug 15, 2025 at 3:38 PM Andres Freund wrote:
> I see absolutely no effect of the patch with shared_buffers=1GB and a
> read-only scale 200 pgbench at 40 clients. What data sizes, shared buffers
> etc. were you testing?
Just to be clear: you are testing with both the index prefetching
patc
Hi,
On 2025-08-15 15:31:47 -0400, Peter Geoghegan wrote:
> On Fri, Aug 15, 2025 at 3:28 PM Andres Freund wrote:
> > >I'm not worried about it. Andres' "not waiting for already-in-progress
> > >IO" patch was clearly just a prototype. Just thought it was worth
> > >noting here.
> >
> > Are you conf
Hi,
On August 15, 2025 3:25:50 PM EDT, Peter Geoghegan wrote:
>On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote:
>> As far as I know, we only have the following unambiguous performance
>> regressions (that clearly need to be fixed):
>>
>> 1. This issue.
>>
>> 2. There's about a 3% loss of
On Fri, Aug 15, 2025 at 3:28 PM Andres Freund wrote:
> >I'm not worried about it. Andres' "not waiting for already-in-progress
> >IO" patch was clearly just a prototype. Just thought it was worth
> >noting here.
>
> Are you confident in that? Because the patch should be extremely cheap in
> that
On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote:
> As far as I know, we only have the following unambiguous performance
> regressions (that clearly need to be fixed):
>
> 1. This issue.
>
> 2. There's about a 3% loss of throughput on pgbench SELECT.
I did a quick pgbench SELECT benchmark a
On Fri, Aug 15, 2025 at 1:23 PM Andres Freund wrote:
> Somewhat random note about I/O waits:
>
> Unfortunately the I/O wait time we measure often massively *over* estimate the
> actual I/O time. If I execute the above query with the patch applied, we
> actually barely ever wait for I/O to complete
On Fri, Aug 15, 2025 at 1:09 PM Andres Freund wrote:
> On 2025-08-15 12:29:25 -0400, Peter Geoghegan wrote:
> > FWIW, this development probably completely changes the results of many
> > (all?) of your benchmark queries. My guess is that with Andres' patch,
> > things will be better across the boa
Hi,
On 2025-08-15 12:24:40 -0400, Peter Geoghegan wrote:
> With bufmgr patch
> [EXPLAIN ANALYZE output (QUERY PLAN table) truncated in the archive preview]
Hi,
Glad to see that the prototype does fix the issue for you.
On 2025-08-15 12:29:25 -0400, Peter Geoghegan wrote:
> FWIW, this development probably completely changes the results of many
> (all?) of your benchmark queries. My guess is that with Andres' patch,
> things will be better across the
On Fri, Aug 15, 2025 at 12:24 PM Peter Geoghegan wrote:
> Good news here: with Andres' bufmgr patch applied, the similar forwards scan
> query does indeed get more than 2x faster. And I don't mean that it gets
> faster on the randomized table -- it actually gets 2x faster with your
> original (al
On Thu Aug 14, 2025 at 7:26 PM EDT, Tomas Vondra wrote:
>> My guess is that once we fix the underlying problem, we'll see
>> improved performance for many different types of queries. Not as big
>> of a benefit as the one that the broken query will get, but still
>> enough to matter.
>>
>
> Hopeful
Hi,
On 2025-08-14 19:36:49 -0400, Andres Freund wrote:
> On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
> > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
> > > > We can optimize that by deferring the StartBufferIO() if we're
> > > > encountering a
> > > > buffer that is undergoing
On Thu, Aug 14, 2025 at 7:26 PM Tomas Vondra wrote:
> Good. I admit I lost track of which the various regressions may affect
> existing plans, and which are specific to the prefetch patch.
As far as I know, we only have the following unambiguous performance
regressions (that clearly need to be fi
On Fri, Aug 15, 2025 at 1:47 PM Thomas Munro wrote:
> (rather than introducing a secondary reference
> counting scheme in the WAL that I think you might be describing?), and
s/WAL/read stream/
On Fri, Aug 15, 2025 at 11:21 AM Tomas Vondra wrote:
> I don't recall all the details, but IIRC my impression was it'd be best
> to do this "caching" entirely in the read_stream.c (so the next_block
> callbacks would probably not need to worry about lastBlock at all),
> enabled when creating the s
Hi,
On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
> > > We can optimize that by deferring the StartBufferIO() if we're
> > > encountering a
> > > buffer that is undergoing IO, at the cost of some complexity. I'm not
> > > sure
> >
On 8/15/25 01:05, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 6:24 PM Tomas Vondra wrote:
>> FWIW I'm not claiming this explains all odd things we're investigating
>> in this thread, it's more a confirmation that the scan direction may
>> matter if it translates to direction at the device lev
On 8/14/25 23:55, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
>> If this same mechanism remembered (say) the last 2 heap blocks it
>> requested, that might be enough to totally fix this particular
>> problem. This isn't a serious proposal, but it'll be simple en
On Thu, Aug 14, 2025 at 6:24 PM Tomas Vondra wrote:
> FWIW I'm not claiming this explains all odd things we're investigating
> in this thread, it's more a confirmation that the scan direction may
> matter if it translates to direction at the device level. I don't think
> it can explain the strange
On 8/14/25 01:19, Andres Freund wrote:
> Hi,
>
> On 2025-08-14 01:11:07 +0200, Tomas Vondra wrote:
>> On 8/13/25 23:57, Peter Geoghegan wrote:
>>> On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote:
It's also not very surprising this happens with backwards scans more.
The I/O is apparen
On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
> If this same mechanism remembered (say) the last 2 heap blocks it
> requested, that might be enough to totally fix this particular
> problem. This isn't a serious proposal, but it'll be simple enough to
> implement. Hopefully when I do that
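Something like the sketch below is presumably all it would take (illustrative only; BlockNumber and the struct are stand-ins, not the patch's code): remember the last two heap block numbers handed to the read stream and skip duplicates.

/*
 * Illustrative sketch of "remember the last 2 heap blocks requested":
 * if the next block matches either of the last two, don't issue a
 * duplicate request for it.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;	/* stand-in for PostgreSQL's BlockNumber */
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

typedef struct RecentBlocks
{
	BlockNumber blocks[2];
	int			next;			/* slot to overwrite next */
} RecentBlocks;

/* Returns true if blkno matches one of the last two requested blocks. */
static bool
recently_requested(RecentBlocks *rb, BlockNumber blkno)
{
	if (rb->blocks[0] == blkno || rb->blocks[1] == blkno)
		return true;
	rb->blocks[rb->next] = blkno;
	rb->next = (rb->next + 1) % 2;
	return false;
}

int
main(void)
{
	RecentBlocks rb = {{InvalidBlockNumber, InvalidBlockNumber}, 0};
	BlockNumber stream[] = {10, 10, 10, 11, 10, 12};

	for (int i = 0; i < 6; i++)
	{
		if (!recently_requested(&rb, stream[i]))
			printf("request block %u\n", (unsigned) stream[i]);
	}
	return 0;
}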
On Thu, Aug 14, 2025 at 4:44 PM Andres Freund wrote:
> Interesting. In the sequential case I see some waits that are not attributed
> in explain, due to the waits happening within WaitIO(), not WaitReadBuffers().
> Which indicates that the read stream is trying to re-read a buffer that
> previousl
Hi,
On 2025-08-14 15:45:26 -0400, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 3:15 PM Peter Geoghegan wrote:
> > Then why does the exact same pair of runs show "I/O Timings: shared
> > read=194.629" for the sequential table backwards scan (with total
> > execution time 1132.360 ms), versus "
On Thu Aug 14, 2025 at 3:41 PM EDT, Andres Freund wrote:
> Hm, that is somewhat curious.
>
> I wonder if there's some wait time that's not being captured by "I/O
> Timings". A first thing to do would be to just run strace --summary-only while
> running the query, and see if there are syscall wait t
On Thu, Aug 14, 2025 at 3:15 PM Peter Geoghegan wrote:
> Then why does the exact same pair of runs show "I/O Timings: shared
> read=194.629" for the sequential table backwards scan (with total
> execution time 1132.360 ms), versus "I/O Timings: shared read=352.88"
> (with total execution time 697.
Hi,
On 2025-08-14 15:15:02 -0400, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote:
> > I think this is just an indicator of being IO bound.
>
> Then why does the exact same pair of runs show "I/O Timings: shared
> read=194.629" for the sequential table backwards scan
Hi,
On 2025-08-14 15:30:16 -0400, Peter Geoghegan wrote:
> On Thu Aug 14, 2025 at 3:15 PM EDT, Peter Geoghegan wrote:
> > On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote:
> >> I think this is just an indicator of being IO bound.
> >
> > Then why does the exact same pair of runs show "I/O Timi
On Thu Aug 14, 2025 at 3:15 PM EDT, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote:
>> I think this is just an indicator of being IO bound.
>
> Then why does the exact same pair of runs show "I/O Timings: shared
> read=194.629" for the sequential table backwards scan
On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote:
> I think this is just an indicator of being IO bound.
Then why does the exact same pair of runs show "I/O Timings: shared
read=194.629" for the sequential table backwards scan (with total
execution time 1132.360 ms), versus "I/O Timings: share
Hi,
On 2025-08-14 14:44:44 -0400, Peter Geoghegan wrote:
> On Thu Aug 14, 2025 at 1:57 PM EDT, Peter Geoghegan wrote:
> > The only interesting thing about the flame graph is just how little
> > difference there seems to be (at least for this particular perf event
> > type).
>
> I captured method_i
On Thu Aug 14, 2025 at 1:57 PM EDT, Peter Geoghegan wrote:
> The only interesting thing about the flame graph is just how little
> difference there seems to be (at least for this particular perf event
> type).
I captured method_io_uring.c DEBUG output from running each query in the
server log, in
On Wed Aug 13, 2025 at 8:59 PM EDT, Tomas Vondra wrote:
> On 8/14/25 01:50, Peter Geoghegan wrote:
>> I first made the order of the table random, except among groups of index tuples
>> that have exactly the same value. Those will still point to the same 1 or 2 heap
>> blocks in virtually al
On Wed Aug 13, 2025 at 7:50 PM EDT, Peter Geoghegan wrote:
> pg@regression:5432 [2476413]=# EXPLAIN (ANALYZE ,costs off, timing off)
> SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a desc;
> [query plan output truncated in the archive preview]
On Wed, Aug 13, 2025 at 8:59 PM Tomas Vondra wrote:
> I investigated this from a different angle, by tracing the I/O request
> generated. using perf-trace. And the patterns are massively different.
I tried a similar approach myself, using a variety of tools. That
didn't get me very far.
> So, Q1
On 8/14/25 01:50, Peter Geoghegan wrote:
> On Wed Aug 13, 2025 at 5:19 PM EDT, Tomas Vondra wrote:
>> I did investigate this, and I don't think there's anything broken in
>> read_stream. It happens because ReadStream has a concept of "ungetting"
>> a block, which can happen after hitting some I/
On Thu, Aug 14, 2025 at 9:19 AM Tomas Vondra wrote:
> I did investigate this, and I don't think there's anything broken in
> read_stream. It happens because ReadStream has a concept of "ungetting"
> a block, which can happen after hitting some I/O limits.
>
> In that case we "remember" the last bl
On Wed, Aug 13, 2025 at 7:51 PM Peter Geoghegan wrote:
> Apparently random I/O is twice as fast as sequential I/O in descending order!
> In fact, this test case creates the appearance of random I/O being at least
> slightly faster than sequential I/O for pages read in _ascending_ order!
>
> Obv
On Wed Aug 13, 2025 at 5:19 PM EDT, Tomas Vondra wrote:
> I did investigate this, and I don't think there's anything broken in
> read_stream. It happens because ReadStream has a concept of "ungetting"
> a block, which can happen after hitting some I/O limits.
>
> In that case we "remember" the last
On 8/13/25 23:36, Peter Geoghegan wrote:
> On Wed, Aug 13, 2025 at 1:01 PM Tomas Vondra wrote:
>> This seems rather bizarre, considering the two tables are exactly the
>> same, except that in t2 the first column is negative, and the rows are
>> fixed-length. Even heap_page_items says the tables ar
Hi,
On 2025-08-14 01:11:07 +0200, Tomas Vondra wrote:
> On 8/13/25 23:57, Peter Geoghegan wrote:
> > On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote:
> >> It's also not very surprising this happens with backwards scans more.
> >> The I/O is apparently much slower (due to missing OS prefetch),
On 8/13/25 23:57, Peter Geoghegan wrote:
> On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote:
>> It's also not very surprising this happens with backwards scans more.
>> The I/O is apparently much slower (due to missing OS prefetch), so we're
>> much more likely to hit the I/O limits (max_ios
Hi,
On 2025-08-14 00:23:49 +0200, Tomas Vondra wrote:
> On 8/13/25 23:37, Andres Freund wrote:
> > On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote:
> >> On 8/13/25 16:44, Andres Freund wrote:
> >>> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
> In fact, I believe this is about io_method