On Fri, Aug 29, 2025 at 11:52 AM Tomas Vondra wrote:
> True. But one worker did show up in top, using a fair amount of CPU, so
> why wouldn't the others (if they process the same stream)?
It deliberately concentrates wakeups into the lowest numbered workers
that are marked idle in a bitmap.
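For anyone trying to picture that, here is a minimal illustrative sketch (invented names, not the actual AIO worker code) of waking the lowest-numbered idle worker tracked in a bitmap:

/*
 * Illustrative sketch only: always wake the lowest-numbered worker whose
 * bit is set in an "idle" bitmap, so wakeups concentrate on worker 0,
 * then 1, and so on.  In a real system the worker would set its bit
 * again once it goes back to sleep.
 */
#include <stdio.h>
#include <strings.h>

static unsigned int idle_worker_mask = 0x0f;    /* workers 0..3 start out idle */

static void
wake_one_idle_worker(void)
{
    int         worker;

    if (idle_worker_mask == 0)
        return;                 /* nobody idle, caller has to wait */

    worker = ffs((int) idle_worker_mask) - 1;   /* lowest set bit */
    idle_worker_mask &= ~(1u << worker);        /* no longer idle */
    printf("waking worker %d\n", worker);       /* stand-in for the real wakeup */
}

int
main(void)
{
    for (int i = 0; i < 4; i++)
        wake_one_idle_worker(); /* wakes 0, 1, 2, 3 in that order */
    return 0;
}

Because the lowest set bit always wins, a lightly loaded stream keeps waking the same one or two workers, which is consistent with only a single busy worker showing up in top.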
On Thu, Aug 28, 2025 at 9:10 PM Andres Freund wrote:
> Same. Tomas, could you share what you applied?
Tomas posted a self-contained patch to the list about an hour ago?
> > I'm not sure that Thomas'/your patch to ameliorate the problem on the
> > read stream side is essential here. Perhaps Andr
Hi,
On 2025-08-28 19:57:17 -0400, Peter Geoghegan wrote:
> On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote:
> > Use this branch:
> >
> > https://github.com/tvondra/postgres/commits/index-prefetch-master/
> >
> > and then Thomas' patch that increases the prefetch distance:
> >
> >
> > https:/
On 8/29/25 01:57, Peter Geoghegan wrote:
> On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote:
>> Use this branch:
>>
>> https://github.com/tvondra/postgres/commits/index-prefetch-master/
>>
>> and then Thomas' patch that increases the prefetch distance:
>>
>>
>> https://www.postgresql.org/mes
On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote:
> Use this branch:
>
> https://github.com/tvondra/postgres/commits/index-prefetch-master/
>
> and then Thomas' patch that increases the prefetch distance:
>
>
> https://www.postgresql.org/message-id/CA%2BhUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BA
On 8/29/25 01:27, Andres Freund wrote:
> Hi,
>
> On 2025-08-29 01:00:58 +0200, Tomas Vondra wrote:
>> I'm not sure how to determine what concurrency it "wants". All I know is
>> that for "warm" runs [1], the basic index prefetch patch uses distance
>> ~2.0 on average, and is ~2x slower than master
On 8/28/25 21:52, Andres Freund wrote:
> Hi,
>
> On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
>> On 8/28/25 18:16, Andres Freund wrote:
So I think the IPC overhead with "worker" can be quite significant,
especially for cases with distance=1. I don't think it's a major issue
Hi,
On 2025-08-29 01:00:58 +0200, Tomas Vondra wrote:
> I'm not sure how to determine what concurrency it "wants". All I know is
> that for "warm" runs [1], the basic index prefetch patch uses distance
> ~2.0 on average, and is ~2x slower than master. And with the patches the
> distance is ~270, a
On Thu, Aug 28, 2025 at 7:01 PM Tomas Vondra wrote:
> I'm not sure how to determine what concurrency it "wants". All I know is
> that for "warm" runs [1], the basic index prefetch patch uses distance
> ~2.0 on average, and is ~2x slower than master. And with the patches the
> distance is ~270, and
On 8/28/25 23:50, Thomas Munro wrote:
> On Fri, Aug 29, 2025 at 7:52 AM Andres Freund wrote:
>> On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
>>> From the 2x regression (compared to master) it might seem like that, but
>>> even with the increased distance it's still slower than master (by 2
On Fri, Aug 29, 2025 at 7:52 AM Andres Freund wrote:
> On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
> > From the 2x regression (compared to master) it might seem like that, but
> > even with the increased distance it's still slower than master (by 25%). So
> > maybe the "error" is to use AIO
Hi,
On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote:
> On 8/28/25 18:16, Andres Freund wrote:
> >> So I think the IPC overhead with "worker" can be quite significant,
> >> especially for cases with distance=1. I don't think it's a major issue
> >> for PG18, because seq/bitmap scans are unlikely t
On 8/28/25 18:16, Andres Freund wrote:
> Hi,
>
> On 2025-08-28 14:45:24 +0200, Tomas Vondra wrote:
>> On 8/26/25 17:06, Tomas Vondra wrote:
>> I kept thinking about this, and in the end I decided to try to measure
>> this IPC overhead. The backend/ioworker communicate by sending signals,
>> so I w
Hi,
On 2025-08-28 14:45:24 +0200, Tomas Vondra wrote:
> On 8/26/25 17:06, Tomas Vondra wrote:
> I kept thinking about this, and in the end I decided to try to measure
> this IPC overhead. The backend/ioworker communicate by sending signals,
> so I wrote a simple C program that does "signal echo" w
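A minimal sketch of what such a "signal echo" round-trip measurement could look like (assumed layout, not Tomas's actual program): the child echoes every SIGUSR1 straight back, and the parent reports the average round trip, which approximates the per-wakeup IPC cost paid by io_method=worker.

/*
 * Assumed sketch of a signal "echo" latency test.  Both processes keep
 * SIGUSR1 blocked and collect it with sigwait(), so no handlers are needed.
 */
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ROUNDS 100000

int
main(void)
{
    sigset_t    set;
    pid_t       parent, child;
    struct timespec start, end;
    double      secs;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, NULL);

    parent = getpid();
    child = fork();
    if (child < 0)
        return 1;

    if (child == 0)
    {
        /* child: wait for a signal, immediately echo it back */
        for (;;)
        {
            int         sig;

            sigwait(&set, &sig);
            kill(parent, SIGUSR1);
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ROUNDS; i++)
    {
        int         sig;

        kill(child, SIGUSR1);   /* "submit" */
        sigwait(&set, &sig);    /* wait for the echo, like waiting for a worker */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    secs = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.2f us per signal round trip\n", secs * 1e6 / ROUNDS);

    kill(child, SIGKILL);
    return 0;
}

Whatever the absolute number turns out to be, that cost is paid for every wakeup, which is why it matters most when the prefetch distance stays at 1.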
Hi,
On 2025-08-26 17:06:11 +0200, Tomas Vondra wrote:
> On 8/26/25 01:48, Andres Freund wrote:
> > Hi,
> >
> > On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
> >> Thanks. Based on the testing so far, the patch seems to be a substantial
> >> improvement. What's needed to make this prototype com
On 8/26/25 17:06, Tomas Vondra wrote:
>
>
> On 8/26/25 01:48, Andres Freund wrote:
>> Hi,
>>
>> On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
>>>
>>> ...
>>>
>>> I'm not sure what's causing this, but almost all regressions my script
>>> is finding look like this - always io_method=worker, wi
On 8/26/25 01:48, Andres Freund wrote:
> Hi,
>
> On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
>> Thanks. Based on the testing so far, the patch seems to be a substantial
>> improvement. What's needed to make this prototype committable?
>
> Mainly some testing infrastructure that can trigg
On 8/26/25 03:08, Peter Geoghegan wrote:
> On Mon Aug 25, 2025 at 10:18 AM EDT, Tomas Vondra wrote:
>> The attached patch is a PoC implementing this. The core idea is that if
>> we measure "miss probability" for a chunk of requests, we can use that
>> to estimate the distance needed to generate e_i
On Mon Aug 25, 2025 at 10:18 AM EDT, Tomas Vondra wrote:
> The attached patch is a PoC implementing this. The core idea is that if
> we measure "miss probability" for a chunk of requests, we can use that
> to estimate the distance needed to generate e_i_c IOs.
I noticed an assertion failure when t
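Spelling the quoted estimate out with made-up numbers (this is just the relation described above, not code from the PoC patch):

/* Hypothetical numbers, purely to illustrate the estimate. */
#include <stdio.h>

int
main(void)
{
    double      miss_probability = 0.10;    /* measured over a chunk of requests */
    int         e_i_c = 16;                 /* effective_io_concurrency */
    double      distance;

    /* lookahead needed so that about e_i_c of the blocks in flight are misses */
    distance = e_i_c / miss_probability;
    printf("estimated distance: %.0f blocks\n", distance);     /* 160 */
    return 0;
}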
Hi,
On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
> Thanks. Based on the testing so far, the patch seems to be a substantial
> improvement. What's needed to make this prototype committable?
Mainly some testing infrastructure that can trigger this kind of stream. The
logic is too finicky for
On Mon, Aug 25, 2025 at 2:33 PM Tomas Vondra wrote:
> Right. I might have expressed it more clearly, but this is what I meant
> when I said priorbatch is not causing this.
Cool.
> As for priorbatch, I'd still like to know where the overhead comes
> from. I mean, what's the expensive part of
On 8/25/25 19:57, Peter Geoghegan wrote:
> On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra wrote:
>> Almost all regressions (at least the top ones) now look like this, i.e.
>> distance collapses to ~2.0, which essentially disables prefetching.
>
> Good to know.
>
>> But I no longer think it's c
On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra wrote:
> Almost all regressions (at least the top ones) now look like this, i.e.
> distance collapses to ~2.0, which essentially disables prefetching.
Good to know.
> But I no longer think it's caused by the "priorbatch" optimization,
> which delays
On 8/25/25 17:43, Thomas Munro wrote:
> On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra wrote:
>> Of course, this can happen even with other hit ratios; there's nothing
>> special about 50%.
>
> Right, that's what this patch was attacking directly, basically only
> giving up when misses are so spars
On 8/25/25 16:18, Tomas Vondra wrote:
> ...
>
> But with more hits, the hit/miss ratio simply determines the "stable"
> distance. Let's say there's 80% hits, so 4 hits to 1 miss. Then the
> stable distance is ~4, because we get a miss, double to 8, and then 4
> hits, so the distance drops back
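A tiny illustration of the cycle described above, using only the double-on-miss / decay-on-hit rule (this is not the actual read_stream.c logic):

/*
 * With one miss followed by four hits, a distance of 4 doubles to 8 on the
 * miss and the four hits walk it back down to 4, so it stays in that band
 * instead of ramping up.
 */
#include <stdio.h>

int
main(void)
{
    int         distance = 4;   /* the stable point from the example above */

    for (int i = 0; i < 20; i++)
    {
        int         miss = (i % 5 == 0);

        if (miss)
            distance *= 2;      /* miss: double the lookahead distance */
        else if (distance > 1)
            distance--;         /* hit: decay by one */

        printf("request %2d (%s): distance = %d\n",
               i, miss ? "miss" : "hit", distance);
    }
    return 0;
}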
On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra wrote:
> Of course, this can happen even with other hit ratios; there's nothing
> special about 50%.
Right, that's what this patch was attacking directly, basically only
giving up when misses are so sparse we can't do anything about it for
an ordered s
On 8/20/25 00:27, Peter Geoghegan wrote:
> On Tue, Aug 19, 2025 at 2:22 PM Peter Geoghegan wrote:
>> That definitely seems like a problem. I think that you're saying that
>> this problem happens because we have extra buffer hits earlier on,
>> which is enough to completely change the ramp-up behav
On 8/15/25 17:09, Andres Freund wrote:
> Hi,
>
> On 2025-08-14 19:36:49 -0400, Andres Freund wrote:
>> On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
>>> On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
> We can optimize that by deferring the StartBufferIO() if we're
> encoun
On Tue, Aug 19, 2025 at 2:22 PM Peter Geoghegan wrote:
> That definitely seems like a problem. I think that you're saying that
> this problem happens because we have extra buffer hits earlier on,
> which is enough to completely change the ramp-up behavior. This seems
> to be all it takes to dramat
On Tue, Aug 19, 2025 at 1:23 PM Tomas Vondra wrote:
> Thanks for investigating this. I think it's the right direction - simple
> OLTP queries should not be paying for building read_stream when there's
> little chance of benefit.
>
> Unfortunately, this seems to be causing regressions, both compare
On 8/17/25 19:30, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote:
>> As far as I know, we only have the following unambiguous performance
>> regressions (that clearly need to be fixed):
>>
>> 1. This issue.
>>
>> 2. There's about a 3% loss of throughput on pgbench
On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote:
> As far as I know, we only have the following unambiguous performance
> regressions (that clearly need to be fixed):
>
> 1. This issue.
>
> 2. There's about a 3% loss of throughput on pgbench SELECT.
Update: I managed to fix the performance
On Fri, Aug 15, 2025 at 3:45 PM Andres Freund wrote:
> > My shared_buffers is 16GB, with pgbench scale 300.
>
> So there's actually no IO, given that a scale 300 is something like 4.7GB? In
> that case my patch could really not make a difference, neither of the changed
> branches would ever be rea
Hi,
On 2025-08-15 15:42:10 -0400, Peter Geoghegan wrote:
> On Fri, Aug 15, 2025 at 3:38 PM Andres Freund wrote:
> > I see absolutely no effect of the patch with shared_buffers=1GB and a
> > read-only scale 200 pgbench at 40 clients. What data sizes, shared buffers
> > etc. were you testing?
>
>
On Fri, Aug 15, 2025 at 3:38 PM Andres Freund wrote:
> I see absolutely no effect of the patch with shared_buffers=1GB and a
> read-only scale 200 pgbench at 40 clients. What data sizes, shared buffers
> etc. were you testing?
Just to be clear: you are testing with both the index prefetching
patc
Hi,
On 2025-08-15 15:31:47 -0400, Peter Geoghegan wrote:
> On Fri, Aug 15, 2025 at 3:28 PM Andres Freund wrote:
> > >I'm not worried about it. Andres' "not waiting for already-in-progress
> > >IO" patch was clearly just a prototype. Just thought it was worth
> > >noting here.
> >
> > Are you conf
Hi,
On August 15, 2025 3:25:50 PM EDT, Peter Geoghegan wrote:
>On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote:
>> As far as I know, we only have the following unambiguous performance
>> regressions (that clearly need to be fixed):
>>
>> 1. This issue.
>>
>> 2. There's about a 3% loss of
On Fri, Aug 15, 2025 at 3:28 PM Andres Freund wrote:
> >I'm not worried about it. Andres' "not waiting for already-in-progress
> >IO" patch was clearly just a prototype. Just thought it was worth
> >noting here.
>
> Are you confident in that? Because the patch should be extremely cheap in
> that
On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote:
> As far as I know, we only have the following unambiguous performance
> regressions (that clearly need to be fixed):
>
> 1. This issue.
>
> 2. There's about a 3% loss of throughput on pgbench SELECT.
I did a quick pgbench SELECT benchmark a
On Fri, Aug 15, 2025 at 1:23 PM Andres Freund wrote:
> Somewhat random note about I/O waits:
>
> Unfortunately the I/O wait time we measure often massively *over* estimate the
> actual I/O time. If I execute the above query with the patch applied, we
> actually barely ever wait for I/O to complete
On Fri, Aug 15, 2025 at 1:09 PM Andres Freund wrote:
> On 2025-08-15 12:29:25 -0400, Peter Geoghegan wrote:
> > FWIW, this development probably completely changes the results of many
> > (all?) of your benchmark queries. My guess is that with Andres' patch,
> > things will be better across the boa
Hi,
On 2025-08-15 12:24:40 -0400, Peter Geoghegan wrote:
> With bufmgr patch
> -
>
> ┌─┐
> │ QUERY PLAN │
> ├─┤
Hi,
Glad to see that the prototype does fix the issue for you.
On 2025-08-15 12:29:25 -0400, Peter Geoghegan wrote:
> FWIW, this development probably completely changes the results of many
> (all?) of your benchmark queries. My guess is that with Andres' patch,
> things will be better across the
On Fri, Aug 15, 2025 at 12:24 PM Peter Geoghegan wrote:
> Good news here: with Andres' bufmgr patch applied, the similar forwards scan
> query does indeed get more than 2x faster. And I don't mean that it gets
> faster on the randomized table -- it actually gets 2x faster with your
> original (al
On Thu Aug 14, 2025 at 7:26 PM EDT, Tomas Vondra wrote:
>> My guess is that once we fix the underlying problem, we'll see
>> improved performance for many different types of queries. Not as big
>> of a benefit as the one that the broken query will get, but still
>> enough to matter.
>>
>
> Hopeful
Hi,
On 2025-08-14 19:36:49 -0400, Andres Freund wrote:
> On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
> > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
> > > > We can optimize that by deferring the StartBufferIO() if we're
> > > > encountering a
> > > > buffer that is undergoing
On Thu, Aug 14, 2025 at 7:26 PM Tomas Vondra wrote:
> Good. I admit I lost track of which the various regressions may affect
> existing plans, and which are specific to the prefetch patch.
As far as I know, we only have the following unambiguous performance
regressions (that clearly need to be fi
On Fri, Aug 15, 2025 at 1:47 PM Thomas Munro wrote:
> (rather than introducing a secondary reference
> counting scheme in the WAL that I think you might be describing?), and
s/WAL/read stream/
On Fri, Aug 15, 2025 at 11:21 AM Tomas Vondra wrote:
> I don't recall all the details, but IIRC my impression was it'd be best
> to do this "caching" entirely in the read_stream.c (so the next_block
> callbacks would probably not need to worry about lastBlock at all),
> enabled when creating the s
Hi,
On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
> > > We can optimize that by deferring the StartBufferIO() if we're
> > > encountering a
> > > buffer that is undergoing IO, at the cost of some complexity. I'm not
> > > sure
> >
On 8/15/25 01:05, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 6:24 PM Tomas Vondra wrote:
>> FWIW I'm not claiming this explains all odd things we're investigating
>> in this thread, it's more a confirmation that the scan direction may
>> matter if it translates to direction at the device lev
On 8/14/25 23:55, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
>> If this same mechanism remembered (say) the last 2 heap blocks it
>> requested, that might be enough to totally fix this particular
>> problem. This isn't a serious proposal, but it'll be simple en
On Thu, Aug 14, 2025 at 6:24 PM Tomas Vondra wrote:
> FWIW I'm not claiming this explains all odd things we're investigating
> in this thread, it's more a confirmation that the scan direction may
> matter if it translates to direction at the device level. I don't think
> it can explain the strange
On 8/14/25 01:19, Andres Freund wrote:
> Hi,
>
> On 2025-08-14 01:11:07 +0200, Tomas Vondra wrote:
>> On 8/13/25 23:57, Peter Geoghegan wrote:
>>> On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote:
It's also not very surprising this happens with backwards scans more.
The I/O is apparen
On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote:
> If this same mechanism remembered (say) the last 2 heap blocks it
> requested, that might be enough to totally fix this particular
> problem. This isn't a serious proposal, but it'll be simple enough to
> implement. Hopefully when I do that
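Purely to make the quoted idea concrete, a hypothetical sketch of remembering the last two requested heap blocks (the helper names are invented, and this is not part of any posted patch):

/*
 * Keep a two-entry history of recently requested heap blocks and skip
 * issuing a new request when the block matches one of them.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

static BlockNumber recent_blocks[2] = {InvalidBlockNumber, InvalidBlockNumber};

static bool
should_request_block(BlockNumber blk)
{
    if (blk == recent_blocks[0] || blk == recent_blocks[1])
        return false;           /* we just asked for this block; reuse it */

    recent_blocks[1] = recent_blocks[0];    /* shift the tiny history */
    recent_blocks[0] = blk;
    return true;
}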
On Thu, Aug 14, 2025 at 4:44 PM Andres Freund wrote:
> Interesting. In the sequential case I see some waits that are not attributed
> in explain, due to the waits happening within WaitIO(), not WaitReadBuffers().
> Which indicates that the read stream is trying to re-read a buffer that
> previousl
Hi,
On 2025-08-14 15:45:26 -0400, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 3:15 PM Peter Geoghegan wrote:
> > Then why does the exact same pair of runs show "I/O Timings: shared
> > read=194.629" for the sequential table backwards scan (with total
> > execution time 1132.360 ms), versus "
On Thu Aug 14, 2025 at 3:41 PM EDT, Andres Freund wrote:
> Hm, that is somewhat curious.
>
> I wonder if there's some wait time that's not being captured by "I/O
> Timings". A first thing to do would be to just run strace --summary-only while
> running the query, and see if there are syscall wait t
On Thu, Aug 14, 2025 at 3:15 PM Peter Geoghegan wrote:
> Then why does the exact same pair of runs show "I/O Timings: shared
> read=194.629" for the sequential table backwards scan (with total
> execution time 1132.360 ms), versus "I/O Timings: shared read=352.88"
> (with total execution time 697.
Hi,
On 2025-08-14 15:15:02 -0400, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote:
> > I think this is just an indicator of being IO bound.
>
> Then why does the exact same pair of runs show "I/O Timings: shared
> read=194.629" for the sequential table backwards scan
Hi,
On 2025-08-14 15:30:16 -0400, Peter Geoghegan wrote:
> On Thu Aug 14, 2025 at 3:15 PM EDT, Peter Geoghegan wrote:
> > On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote:
> >> I think this is just an indicator of being IO bound.
> >
> > Then why does the exact same pair of runs show "I/O Timi
On Thu Aug 14, 2025 at 3:15 PM EDT, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote:
>> I think this is just an indicator of being IO bound.
>
> Then why does the exact same pair of runs show "I/O Timings: shared
> read=194.629" for the sequential table backwards scan
On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote:
> I think this is just an indicator of being IO bound.
Then why does the exact same pair of runs show "I/O Timings: shared
read=194.629" for the sequential table backwards scan (with total
execution time 1132.360 ms), versus "I/O Timings: share
Hi,
On 2025-08-14 14:44:44 -0400, Peter Geoghegan wrote:
> On Thu Aug 14, 2025 at 1:57 PM EDT, Peter Geoghegan wrote:
> > The only interesting thing about the flame graph is just how little
> > difference there seems to be (at least for this particular perf event
> > type).
>
> I captured method_i
On Thu Aug 14, 2025 at 1:57 PM EDT, Peter Geoghegan wrote:
> The only interesting thing about the flame graph is just how little
> difference there seems to be (at least for this particular perf event
> type).
I captured method_io_uring.c DEBUG output from running each query in the
server log, in
On Wed Aug 13, 2025 at 8:59 PM EDT, Tomas Vondra wrote:
> On 8/14/25 01:50, Peter Geoghegan wrote:
>> I first made the order of the table random, except among groups of index
>> tuples
>> that have exactly the same value. Those will still point to the same 1 or 2
>> heap
>> blocks in virtually al
On Wed Aug 13, 2025 at 7:50 PM EDT, Peter Geoghegan wrote:
> pg@regression:5432 [2476413]=# EXPLAIN (ANALYZE ,costs off, timing off)
> SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a desc;
> ┌─┐
> │
On Wed, Aug 13, 2025 at 8:59 PM Tomas Vondra wrote:
> I investigated this from a different angle, by tracing the I/O request
> generated. using perf-trace. And the patterns are massively different.
I tried a similar approach myself, using a variety of tools. That
didn't get me very far.
> So, Q1
On 8/14/25 01:50, Peter Geoghegan wrote:
> On Wed Aug 13, 2025 at 5:19 PM EDT, Tomas Vondra wrote:
>> I did investigate this, and I don't think there's anything broken in
>> read_stream. It happens because ReadStream has a concept of "ungetting"
>> a block, which can happen after hitting some I/
On Thu, Aug 14, 2025 at 9:19 AM Tomas Vondra wrote:
> I did investigate this, and I don't think there's anything broken in
> read_stream. It happens because ReadStream has a concept of "ungetting"
> a block, which can happen after hitting some I/O limits.
>
> In that case we "remember" the last bl
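As a rough mental model of that "unget", here is a generic one-slot pushback sketch (the helper names are invented; the real logic lives in read_stream.c and handles more than this):

/*
 * One-slot pushback: when I/O limits force the stream to stop before using
 * a block it already pulled from its callback, it remembers that block and
 * serves it first on the next call.
 */
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

typedef struct DemoStream
{
    BlockNumber pending_block;                  /* remembered ("unget") block */
    BlockNumber (*next_block_cb) (void *arg);   /* normal source of blocks */
    void       *cb_arg;
} DemoStream;

static BlockNumber
demo_stream_get_block(DemoStream *stream)
{
    if (stream->pending_block != InvalidBlockNumber)
    {
        BlockNumber blk = stream->pending_block;

        stream->pending_block = InvalidBlockNumber;
        return blk;
    }
    return stream->next_block_cb(stream->cb_arg);
}

static void
demo_stream_unget_block(DemoStream *stream, BlockNumber blk)
{
    /* called when an I/O limit forces us to stop before consuming "blk" */
    stream->pending_block = blk;
}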
On Wed, Aug 13, 2025 at 7:51 PM Peter Geoghegan wrote:
> Apparently random I/O is twice as fast as sequential I/O in descending order!
> In
> fact, this test case creates the appearance of random I/O being at least
> slightly faster than sequential I/O for pages read in _ascending_ order!
>
> Obv
On Wed Aug 13, 2025 at 5:19 PM EDT, Tomas Vondra wrote:
> I did investigate this, and I don't think there's anything broken in
> read_stream. It happens because ReadStream has a concept of "ungetting"
> a block, which can happen after hitting some I/O limits.
>
> In that case we "remember" the last
On 8/13/25 23:36, Peter Geoghegan wrote:
> On Wed, Aug 13, 2025 at 1:01 PM Tomas Vondra wrote:
>> This seems rather bizarre, considering the two tables are exactly the
>> same, except that in t2 the first column is negative, and the rows are
>> fixed-length. Even heap_page_items says the tables ar
Hi,
On 2025-08-14 01:11:07 +0200, Tomas Vondra wrote:
> On 8/13/25 23:57, Peter Geoghegan wrote:
> > On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote:
> >> It's also not very surprising this happens with backwards scans more.
> >> The I/O is apparently much slower (due to missing OS prefetch),
On 8/13/25 23:57, Peter Geoghegan wrote:
> On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote:
>> It's also not very surprising this happens with backwards scans more.
>> The I/O is apparently much slower (due to missing OS prefetch), so we're
>> much more likely to hit the I/O limits (max_ios
Hi,
On 2025-08-14 00:23:49 +0200, Tomas Vondra wrote:
> On 8/13/25 23:37, Andres Freund wrote:
> > On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote:
> >> On 8/13/25 16:44, Andres Freund wrote:
> >>> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
> In fact, I believe this is about io_method
On 8/13/25 23:37, Andres Freund wrote:
> Hi,
>
> On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote:
>> On 8/13/25 16:44, Andres Freund wrote:
>>> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
In fact, I believe this is about io_method. I initially didn't see the
difference you desc
On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote:
> It's also not very surprising this happens with backwards scans more.
> The I/O is apparently much slower (due to missing OS prefetch), so we're
> much more likely to hit the I/O limits (max_ios and various other limits
> in read_stream_start_p
Hi,
On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote:
> On 8/13/25 16:44, Andres Freund wrote:
> > On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
> >> In fact, I believe this is about io_method. I initially didn't see the
> >> difference you described, and then I realized I set io_method=sync t
On Wed, Aug 13, 2025 at 1:01 PM Tomas Vondra wrote:
> This seems rather bizarre, considering the two tables are exactly the
> same, except that in t2 the first column is negative, and the rows are
> fixed-length. Even heap_page_items says the tables are exactly the same.
>
> So why would the index
On 8/13/25 18:01, Peter Geoghegan wrote:
> On Wed, Aug 13, 2025 at 11:28 AM Andres Freund wrote:
>>> With "sync" I always get this (after a restart):
>>>
>>>Buffers: shared hit=7435 read=52801
>>>
>>> while with "worker" I get this:
>>>
>>>Buffers: shared hit=4879 read=52801
>>>Buff
On 8/13/25 16:44, Andres Freund wrote:
> Hi,
>
> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
>> In fact, I believe this is about io_method. I initially didn't see the
>> difference you described, and then I realized I set io_method=sync to
>> make it easier to track the block access. And
On 8/13/25 18:36, Peter Geoghegan wrote:
> On Wed, Aug 13, 2025 at 8:15 AM Tomas Vondra wrote:
>> 1) created a second table with an "inverse pattern" that's decreasing:
>>
>> create table t2 (like t) with (fillfactor = 20);
>> insert into t2 select -a, b from t;
>> create index idx2 on t2 (a
On Wed, Aug 13, 2025 at 8:15 AM Tomas Vondra wrote:
> 1) created a second table with an "inverse pattern" that's decreasing:
>
> create table t2 (like t) with (fillfactor = 20);
> insert into t2 select -a, b from t;
> create index idx2 on t2 (a);
> alter index idx2 set (deduplicate_items =
On Wed, Aug 13, 2025 at 11:28 AM Andres Freund wrote:
> > With "sync" I always get this (after a restart):
> >
> >Buffers: shared hit=7435 read=52801
> >
> > while with "worker" I get this:
> >
> >Buffers: shared hit=4879 read=52801
> >Buffers: shared hit=5151 read=52801
> >Buffers
Hi,
On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
> In fact, I believe this is about io_method. I initially didn't see the
> difference you described, and then I realized I set io_method=sync to
> make it easier to track the block access. And if I change io_method to
> worker, I get different
On 8/13/25 01:33, Peter Geoghegan wrote:
> On Tue, Aug 12, 2025 at 7:10 PM Tomas Vondra wrote:
>> Actually, this might be a consequence of how backwards scans work (at
>> least in btree). I logged the block in index_scan_stream_read_next, and
>> this is what I see in the forward scan (at the begin
Hi,
On Tue, 12 Aug 2025 at 22:30, Thomas Munro wrote:
>
> On Tue, Aug 12, 2025 at 11:22 PM Nazir Bilal Yavuz wrote:
> > Unfortunately this doesn't work. We need to handle backwards I/O
> > combining in the StartReadBuffersImpl() function too as buffer indexes
> > won't have correct blocknums. Al
On Tue, Aug 12, 2025 at 7:10 PM Tomas Vondra wrote:
> Actually, this might be a consequence of how backwards scans work (at
> least in btree). I logged the block in index_scan_stream_read_next, and
> this is what I see in the forward scan (at the beginning):
Just to be clear: you did disable dedu
On 8/12/25 23:52, Tomas Vondra wrote:
>
> On 8/12/25 23:22, Peter Geoghegan wrote:
>> ...
>>
>> It looks like the patch does significantly better with the forwards scan,
>> compared to the backwards scan (though both are improved by a lot). But
>> that's
>> not the main thing about these resu
Hi,
On 2025-08-12 17:22:20 -0400, Peter Geoghegan wrote:
> Doesn't look like Linux will do this, if what my local testing shows is
> anything
> to go on.
Yes, matches my experiments outside of postgres too.
> I'm a bit surprised by this (I also thought that OS readahead on linux
> was quite so
On 8/12/25 23:22, Peter Geoghegan wrote:
> ...
>
> It looks like the patch does significantly better with the forwards scan,
> compared to the backwards scan (though both are improved by a lot). But
> that's
> not the main thing about these results that I find interesting.
>
> The really odd t
On Tue Aug 12, 2025 at 1:06 AM EDT, Thomas Munro wrote:
> I'd be interested to hear more about reverse scans. Bilal was
> speculating about backwards I/O combining in read_stream.c a while
> back, but we didn't have anything interesting to use it yet. You'll
> probably see a flood of uncombined 8
Hi,
On 2025-08-12 18:53:13 +0200, Tomas Vondra wrote:
> I'm running some tests looking for these weird changes, not just with
> the patches, but on master too. And I don't think b4212231 changed the
> situation very much.
>
> FWIW this issue is not caused by the index prefetching patches, I can
>
On Tue, Aug 12, 2025 at 1:51 PM Tomas Vondra wrote:
> One more detail I just noticed - the DESC scan apparently needs more
> buffers (~87k vs. 57k). That probably shouldn't cause such a massive
> regression, though.
I can reproduce this.
I wondered if the difference might be attributable to the is
On Tue, Aug 12, 2025 at 11:22 PM Nazir Bilal Yavuz wrote:
> Unfortunately this doesn't work. We need to handle backwards I/O
> combining in the StartReadBuffersImpl() function too as buffer indexes
> won't have correct blocknums. Also, I think buffer forwarding of split
> backwards I/O should be h
On 8/12/25 18:53, Tomas Vondra wrote:
> ...
>
> EXPLAIN (ANALYZE, COSTS OFF)
> SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a ASC;
>
> QUERY PLAN
>
> Index Scan using idx on t
>
On 8/12/25 13:22, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Tue, 12 Aug 2025 at 08:07, Thomas Munro wrote:
>>
>> On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan wrote:
>>> On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra wrote:
I can do some tests with forward vs. backwards scans. Of course, the
Hi,
On Tue, 12 Aug 2025 at 08:07, Thomas Munro wrote:
>
> On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan wrote:
> > On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra wrote:
> > > I can do some tests with forward vs. backwards scans. Of course, the
> > > trouble with finding these weird cases is th
On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan wrote:
> On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra wrote:
> > I can do some tests with forward vs. backwards scans. Of course, the
> > trouble with finding these weird cases is that they may be fairly rare.
> > So hitting them is a matter of luc