from:"Tomas Vondra"

Re: Adding basic NUMA awareness

2025-09-20 Thread Tomas Vondra

On 9/11/25 10:32, Tomas Vondra wrote: > ... > > For example, we may get confused about the memory page size. The "size" > happens before allocation, and at that point we don't know if we succeed > in getting enough huge pages. When "init" happens, we alrea

Re: Making type Datum be 8 bytes everywhere

2025-09-20 Thread Tomas Vondra

-fno-sanitize-recover=all -latomic" \ LDFLAGS="-fsanitize=alignment -latomic" regards -- Tomas Vondra

Re: Parallel heap vacuum

2025-09-17 Thread Tomas Vondra

On 9/18/25 01:22, Andres Freund wrote: > Hi, > > On 2025-09-17 13:25:11 +0200, Tomas Vondra wrote: >> I believe the reason why parallelism is disabled in autovacuum is that >> we want autovacuum to be a background process, with minimal disruption >> to user workload. I

Re: Parallel heap vacuum

2025-09-17 Thread Tomas Vondra

On 9/18/25 01:18, Masahiko Sawada wrote: > On Wed, Sep 17, 2025 at 4:25 AM Tomas Vondra wrote: >> >> On 9/8/25 17:40, Melanie Plageman wrote: >>> On Wed, Aug 27, 2025 at 2:30 PM Masahiko Sawada >>> wrote: >>>> >>>> On Tue, Aug 26, 20

Re: Parallel heap vacuum

2025-09-17 Thread Tomas Vondra

On 9/17/25 18:32, Robert Haas wrote: > On Wed, Sep 17, 2025 at 12:23 PM Tomas Vondra wrote: >> Look at the BRIN code, for example. Most of the parallel stuff happens >> in _brin_begin_parallel. Maybe more of it could be generalized a bit >> more (some of the shmem setup?). B

Re: Parallel heap vacuum

2025-09-17 Thread Tomas Vondra

On 9/17/25 18:01, Robert Haas wrote: > On Wed, Sep 17, 2025 at 7:25 AM Tomas Vondra wrote: >> I took a quick look at the patch this week. I don't have a very strong >> opinion on the changes to table AM API, and I somewhat agree with this >> impression. It's n

Re: index prefetching

2025-09-15 Thread Tomas Vondra

On 9/15/25 17:12, Peter Geoghegan wrote: > On Mon, Sep 15, 2025 at 9:00 AM Tomas Vondra wrote: >> Yeah, this heuristics seems very effective in eliminating the regression >> (at least judging by the test results I've seen so far). Two or three >> question bother me abo

Re: Making type Datum be 8 bytes everywhere

2025-09-11 Thread Tomas Vondra

On 9/10/25 22:35, Tom Lane wrote: > Tomas Vondra writes: >> While testing a different patch, I tried running with address sanitizer >> on rpi5, running the 32-bit OS (which AFAIK is 64-bit kernel and 32-bit >> user space). With that, stats_ext regression

Re: index prefetching

2025-09-04 Thread Tomas Vondra

choose to ignore them and the regressions. The approach tends to find "adversary" cases, hit corner cases (not necessarily as rare as assumed), etc. But the issues we ran into so far seem perfectly valid (or at least useful to think about). regards -- Tomas Vondra

Re: Should io_method=worker remain the default?

2025-09-03 Thread Tomas Vondra

case it makes sense, because the reads are random enough to prevent I/O combining. But for a sequential workload I'd expect I/O combining to help. Could it be that it ends up evicting buffers randomly, which (I guess) might interfere with the combining? What's shared_buffers set to? Have you watched how large the I/O requests are? iostat, iosnoop or strace would tell you. regards -- Tomas Vondra

Re: Changing the state of data checksums in a running cluster

2025-09-01 Thread Tomas Vondra

On 8/29/25 16:38, Tomas Vondra wrote: > On 8/29/25 16:26, Tomas Vondra wrote: >> ... >> >> I've seen these failures after changing checksums in both directions, >> both after enabling and disabling. But I've only ever saw this after >> immediate

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-08-29 Thread Tomas Vondra

On 8/29/25 21:03, Peter Geoghegan wrote: > On Fri, Aug 29, 2025 at 9:10 AM Tomas Vondra wrote: >> Peter, any thoughts on this. Do you think it's reasonable / feasible to >> push the fix? > > I don't feel comfortable pushing that fix today. > Understood. >

Re: Changing the state of data checksums in a running cluster

2025-08-29 Thread Tomas Vondra

On 8/29/25 16:26, Tomas Vondra wrote: > ... > > I've seen these failures after changing checksums in both directions, > both after enabling and disabling. But I've only ever saw this after > immediate shutdown, never after fast shutdown. (It's interesting the > pg

Re: Changing the state of data checksums in a running cluster

2025-08-29 Thread Tomas Vondra

On 8/27/25 14:42, Tomas Vondra wrote: > On 8/27/25 14:39, Tomas Vondra wrote: >> ... >> >> And this happened on Friday: >> >> commit c13070a27b63d9ce4850d88a63bf889a6fde26f0 >> Author: Alexander Korotkov >> Date: Fri Aug 22 18:44:39 2025 +0300 &

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-08-29 Thread Tomas Vondra

of index access > methods' handler_function output to const static, from dynamic in > memctx. > IIRC both approaches address the issue. I'd go with Peter's patch for 18. The other patch is much more invasive / bigger, and we're right before RC1 freeze. Maybe it's a good idea, but I'd say it's for 19. Peter, any thoughts on this? Do you think it's reasonable / feasible to push the fix? regards -- Tomas Vondra

Re: index prefetching

2025-08-28 Thread Tomas Vondra

On 8/29/25 01:57, Peter Geoghegan wrote: > On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote: >> Use this branch: >> >> https://github.com/tvondra/postgres/commits/index-prefetch-master/ >> >> and then Thomas' patch that increases the prefetch distanc

Re: index prefetching

2025-08-28 Thread Tomas Vondra

On 8/29/25 01:27, Andres Freund wrote: > Hi, > > On 2025-08-29 01:00:58 +0200, Tomas Vondra wrote: >> I'm not sure how to determine what concurrency it "wants". All I know is >> that for "warm" runs [1], the basic index prefetch patch uses dista

Re: index prefetching

2025-08-28 Thread Tomas Vondra

On 8/28/25 21:52, Andres Freund wrote: > Hi, > > On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote: >> On 8/28/25 18:16, Andres Freund wrote: >>>> So I think the IPC overhead with "worker" can be quite significant, >>>> especially for cases

Re: index prefetching

2025-08-28 Thread Tomas Vondra

On 8/28/25 23:50, Thomas Munro wrote: > On Fri, Aug 29, 2025 at 7:52 AM Andres Freund wrote: >> On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote: >>> From the 2x regression (compared to master) it might seem like that, but >>> even with the increased distance it

Re: index prefetching

2025-08-28 Thread Tomas Vondra

On 8/28/25 18:16, Andres Freund wrote: > Hi, > > On 2025-08-28 14:45:24 +0200, Tomas Vondra wrote: >> On 8/26/25 17:06, Tomas Vondra wrote: >> I kept thinking about this, and in the end I decided to try to measure >> this IPC overhead. The backend/ioworker communicate

Re: Changing the state of data checksums in a running cluster

2025-08-28 Thread Tomas Vondra

even without primary shutdown. But the standby "fast" shutdown is always there. But this also shows a limitation of the TAP test - it never triggers the shutdowns while flipping the checksums (in flip_data_checksums). I think that's something worth testing. regards -- Toma

Re: index prefetching

2025-08-28 Thread Tomas Vondra

On 8/26/25 17:06, Tomas Vondra wrote: > > > On 8/26/25 01:48, Andres Freund wrote: >> Hi, >> >> On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote: >>> >>> ... >>> >>> I'm not sure what's causing this, but almost all regr

Re: Changing the state of data checksums in a running cluster

2025-08-27 Thread Tomas Vondra

On 8/27/25 14:39, Tomas Vondra wrote: > ... > > And this happened on Friday: > > commit c13070a27b63d9ce4850d88a63bf889a6fde26f0 > Author: Alexander Korotkov > Date: Fri Aug 22 18:44:39 2025 +0300 > > Revert "Get rid of WALBufMappingLo

Re: Changing the state of data checksums in a running cluster

2025-08-27 Thread Tomas Vondra

On 8/27/25 13:00, Daniel Gustafsson wrote: >> On 27 Aug 2025, at 11:39, Tomas Vondra wrote: > >> Just to be clear - I don't see any pg_checksums failures either. I only >> see failures in the standby log, and I don't think the script checks >> that (it prob

Re: Changing the state of data checksums in a running cluster

2025-08-27 Thread Tomas Vondra

On 8/27/25 10:30, Daniel Gustafsson wrote: >> On 26 Aug 2025, at 01:06, Tomas Vondra wrote: > >> I think this TAP looks very nice, but there's a couple issues with it. >> See the attached patch fixing those. > > Thanks, I have incorporated (most of) your pa

Re: index prefetching

2025-08-26 Thread Tomas Vondra

On 8/26/25 01:48, Andres Freund wrote: > Hi, > > On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote: >> Thanks. Based on the testing so far, the patch seems to be a substantial >> improvement. What's needed to make this prototype committable? > > Mainly some

Re: index prefetching

2025-08-26 Thread Tomas Vondra

On 8/26/25 03:08, Peter Geoghegan wrote: > On Mon Aug 25, 2025 at 10:18 AM EDT, Tomas Vondra wrote: >> The attached patch is a PoC implementing this. The core idea is that if >> we measure "miss probability" for a chunk of requests, we can use that >> to estimate

Re: Changing the state of data checksums in a running cluster

2025-08-25 Thread Tomas Vondra

On 8/25/25 20:32, Daniel Gustafsson wrote: >> On 20 Aug 2025, at 16:37, Tomas Vondra wrote: > >> This happens quite regularly, it's not hard to hit. But I've only seen >> it to happen on a FSM, and only right after immediate shutdown. I don't >> think

Re: index prefetching

2025-08-25 Thread Tomas Vondra

On 8/25/25 19:57, Peter Geoghegan wrote: > On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra wrote: >> Almost all regressions (at least the top ones) now look like this, i.e. >> distance collapses to ~2.0, which essentially disables prefetching. > > Good to know. > >&

Re: index prefetching

2025-08-25 Thread Tomas Vondra

On 8/25/25 17:43, Thomas Munro wrote: > On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra wrote: >> Of course, this can happen even with other hit ratios, there's nothing >> special about 50%. > > Right, that's what this patch was attacking directly, basically only

Re: index prefetching

2025-08-25 Thread Tomas Vondra

On 8/25/25 16:18, Tomas Vondra wrote: > ... > > But with more hits, the hit/miss ratio simply determines the "stable" > distance. Let's say there's 80% hits, so 4 hits to 1 miss. Then the > stable distance is ~4, because we get a miss, double to 8, and then 4

Re: index prefetching

2025-08-25 Thread Tomas Vondra

) So it's more a case of "mitigating a regression" (finding regressions like this is the purpose of my script). Still, I believe the questions about the distance heuristics are valid. (Another interesting detail is that the regression happens only with io_method=worker, not with io_urin

Re: index prefetching

2025-08-25 Thread Tomas Vondra

437)) Index Searches: 1 Prefetch Distance: 2.032 Prefetch Count: 868165 Prefetch Stalls: 2140228 Prefetch Skips: 6039906 Prefetch Resets: 0 Stream Ungets: 0 Stream Forwarded: 4 Prefetch Histogram: [2,4) => 855753, [4,8) => 12412 Buffers: shar

Re: Changing the state of data checksums in a running cluster

2025-08-20 Thread Tomas Vondra

e that as "off", i.e. error out. regards -- Tomas Vondra

Re: Changing the state of data checksums in a running cluster

2025-08-20 Thread Tomas Vondra

tgresql.org/message-id/f528413c-477a-4ec3-a0df-e22a80ffb...@vondra.me -- Tomas Vondra

Re: index prefetching

2025-08-19 Thread Tomas Vondra

imal to only initialize read_stream after reading the next batch. For some indexes a batch can have hundreds of items, and that certainly could benefit from prefetching. I suppose it should be possible to initialize the read_stream half-way through a batch, right? Or is there a reason why that can't work? regards [1] https://github.com/tvondra/postgres/tree/index-prefetch-master/query-stress-test -- Tomas Vondra

Re: Enable data checksums by default

2025-08-19 Thread Tomas Vondra

On 7/29/25 20:24, Tomas Vondra wrote: > Hi! > > So, what should we do with the PG18 open item? We (the RMT team) would > like to know if we shall keep the checksums enabled by default, and if > there's something that still needs to be done for PG18. > > We don't h

Re: VM corruption on standby

2025-08-19 Thread Tomas Vondra

2dc0e0d. regards -- Tomas Vondra

Re: index prefetching

2025-08-14 Thread Tomas Vondra

On 8/15/25 01:05, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 6:24 PM Tomas Vondra wrote: >> FWIW I'm not claiming this explains all odd things we're investigating >> in this thread, it's more a confirmation that the scan direction may >> matter if it t

Re: index prefetching

2025-08-14 Thread Tomas Vondra

cate. It might > make sense to at least place that much of the burden on the > callback/client side. > I don't recall all the details, but IIRC my impression was it'd be best to do this "caching" entirely in the read_stream.c (so the next_block callbacks would probably not need to worry about lastBlock at all), enabled when creating the stream. And then there would be something like read_stream_release_buffer() that'd do the right thing to release the buffer when it's not needed. regards -- Tomas Vondra

Re: index prefetching

2025-08-14 Thread Tomas Vondra

On 8/14/25 01:19, Andres Freund wrote: > Hi, > > On 2025-08-14 01:11:07 +0200, Tomas Vondra wrote: >> On 8/13/25 23:57, Peter Geoghegan wrote: >>> On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote: >>>> It's also not very surprising this happens w

Re: index prefetching

2025-08-13 Thread Tomas Vondra

On 8/14/25 01:50, Peter Geoghegan wrote: > On Wed Aug 13, 2025 at 5:19 PM EDT, Tomas Vondra wrote: >> I did investigate this, and I don't think there's anything broken in >> read_stream. It happens because ReadStream has a concept of "ungetting" >> a blo

Re: index prefetching

2025-08-13 Thread Tomas Vondra

On 8/13/25 23:36, Peter Geoghegan wrote: > On Wed, Aug 13, 2025 at 1:01 PM Tomas Vondra wrote: >> This seems rather bizarre, considering the two tables are exactly the >> same, except that in t2 the first column is negative, and the rows are >> fixed-length. Even heap_page_

Re: index prefetching

2025-08-13 Thread Tomas Vondra

On 8/13/25 23:57, Peter Geoghegan wrote: > On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra wrote: >> It's also not very surprising this happens with backwards scans more. >> The I/O is apparently much slower (due to missing OS prefetch), so we're >> much more likely

Re: index prefetching

2025-08-13 Thread Tomas Vondra

On 8/13/25 23:37, Andres Freund wrote: > Hi, > > On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote: >> On 8/13/25 16:44, Andres Freund wrote: >>> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote: >>>> In fact, I believe this is about io_method. I initiall

Re: index prefetching

2025-08-13 Thread Tomas Vondra

again. It may seem as if read_stream_get_block() produced the same block twice, but it's really just the block from the last round. All duplicates produced by read_stream_look_ahead were caused by this. I suspected it's a bug in lastBlock optimization, but that's not the case, it happens entirely within read_stream. And it's expected. It's also not very surprising this happens with backwards scans more. The I/O is apparently much slower (due to missing OS prefetch), so we're much more likely to hit the I/O limits (max_ios and various other limits in read_stream_start_pending_read). regards -- Tomas Vondra

Re: index prefetching

2025-08-13 Thread Tomas Vondra

On 8/13/25 16:44, Andres Freund wrote: > Hi, > > On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote: >> In fact, I believe this is about io_method. I initially didn't see the >> difference you described, and then I realized I set io_method=sync to >> make it easie

Re: index prefetching

2025-08-13 Thread Tomas Vondra

On 8/13/25 18:36, Peter Geoghegan wrote: > On Wed, Aug 13, 2025 at 8:15 AM Tomas Vondra wrote: >> 1) created a second table with an "inverse pattern" that's decreasing: >> >> create table t2 (like t) with (fillfactor = 20); >> insert into t2 select

Re: Adding basic NUMA awareness

2025-08-13 Thread Tomas Vondra

On 8/13/25 17:16, Andres Freund wrote: > Hi, > > On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote: >> The patch does a much simpler thing - treat the weight as a "budget", >> i.e. number of buffers to allocate before proceeding to the "next" >> part

Re: index prefetching

2025-08-13 Thread Tomas Vondra

On 8/13/25 01:33, Peter Geoghegan wrote: > On Tue, Aug 12, 2025 at 7:10 PM Tomas Vondra wrote: >> Actually, this might be a consequence of how backwards scans work (at >> least in btree). I logged the block in index_scan_stream_read_next, and >> this is what I see in the

Re: index prefetching

2025-08-12 Thread Tomas Vondra

On 8/12/25 23:52, Tomas Vondra wrote: > > On 8/12/25 23:22, Peter Geoghegan wrote: >> ... >> >> It looks like the patch does significantly better with the forwards scan, >> compared to the backwards scan (though both are improved by a lot). But >> that

Re: index prefetching

2025-08-12 Thread Tomas Vondra

ise, since it looks > like > OS readahead remains a big factor with direct I/O. Did I just miss something > obvious? > I don't think you missed anything. It does seem the assumption relies on the OS handling the underlying I/O patterns equally, and unfortunately that does not seem to be the case. Maybe we could "invert" the data set, i.e. make it "descending" instead of "ascending"? That would make the heap access direction "forward" again ... regards -- Tomas Vondra

Re: index prefetching

2025-08-12 Thread Tomas Vondra

On 8/12/25 18:53, Tomas Vondra wrote: > ... > > EXPLAIN (ANALYZE, COSTS OFF) > SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a ASC; > > QUERY PLAN > > I

Re: index prefetching

2025-08-12 Thread Tomas Vondra

On 8/12/25 13:22, Nazir Bilal Yavuz wrote: > Hi, > > On Tue, 12 Aug 2025 at 08:07, Thomas Munro wrote: >> >> On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan wrote: >>> On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra wrote: >>>> I can do some tests with f

Re: Adding basic NUMA awareness

2025-08-12 Thread Tomas Vondra

On 8/12/25 16:24, Andres Freund wrote: > Hi, > > On 2025-08-12 13:04:07 +0200, Tomas Vondra wrote: >> Right. I don't think the current patch would crash - I can't test it, >> but I don't see why it would crash. In the worst case it'd end up with >&

Re: Adding basic NUMA awareness

2025-08-12 Thread Tomas Vondra

On 8/9/25 02:25, Andres Freund wrote: > Hi, > > On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote: >> 2) I'm a bit unsure what "NUMA nodes" actually means. The patch mostly >> assumes each core / piece of RAM is assigned to a particular NUMA node. > >

Re: index prefetching

2025-08-11 Thread Tomas Vondra

On 8/11/25 22:14, Peter Geoghegan wrote: > On Mon, Aug 11, 2025 at 10:16 AM Tomas Vondra wrote: >> Perhaps. For me benchmarks are a way to learn about stuff and better >> understand the pros/cons of approaches. It's possible some of the >> changes will impact the chara

Re: index prefetching

2025-08-11 Thread Tomas Vondra

On 8/9/25 01:47, Andres Freund wrote: > Hi, > > On 2025-08-06 16:12:53 +0200, Tomas Vondra wrote: >> That's quite possible. What concerns me about using tables like pgbench >> accounts table is reproducibility - initially it's correlated, and then >> it g

Re: Adding basic NUMA awareness

2025-08-07 Thread Tomas Vondra

On 8/7/25 11:24, Tomas Vondra wrote: > Hi! > > Here's a slightly improved version of the patch series. > Ah, I made a mistake when generating the patches. The 0001 and 0002 patches are not part of the NUMA stuff, it's just something related to benchmarking (addressing unr

Re: index prefetching

2025-08-06 Thread Tomas Vondra

On 8/5/25 23:35, Peter Geoghegan wrote: > On Tue, Aug 5, 2025 at 4:56 PM Tomas Vondra wrote: >> Probably. It was hard to predict which values will be interesting, maybe >> we can pick some subset now. I'll start by just doing larger steps, I >> think. Maybe increase by

Re: Bug in brin_minmax_multi_distance_numeric()

2025-08-06 Thread Tomas Vondra

On 8/5/25 22:17, Tom Lane wrote: > Tomas Vondra writes: >> On 8/5/25 20:11, Tom Lane wrote: >>> Yes, I think it ought to be committed/backpatched separately. >>> I was expecting Tomas to do that, but I can if he's busy ... > >> Sorry, I didn't realiz

Re: index prefetching

2025-08-05 Thread Tomas Vondra

On 8/5/25 19:19, Peter Geoghegan wrote: > On Tue, Aug 5, 2025 at 10:52 AM Tomas Vondra wrote: >> I ran some more tests, comparing the two patches, using data sets >> generated in a way to have a more gradual transition between correlated >> and random cases. > >

Re: Bug in brin_minmax_multi_distance_numeric()

2025-08-05 Thread Tomas Vondra

as expecting Tomas to do that, but I can if he's busy ... > Sorry, I didn't realize that - it seemed you're handling this. I can take care of this in the next couple days, if still needed. regards -- Tomas Vondra

Re: Bug in brin_minmax_multi_distance_numeric()

2025-08-01 Thread Tomas Vondra

On 7/31/25 20:35, Tom Lane wrote: > Tomas Vondra writes: >> On 7/31/25 19:33, Tom Lane wrote: >>> ... It is certainly broken on >>> 32-bit machines where the Datum result of numeric_float8 will >>> be a pointer, so that we will convert the numeric pointer

Re: Bug in brin_minmax_multi_distance_numeric()

2025-07-31 Thread Tomas Vondra

ng incorrect results". The distance functions determine in what order we merge points into ranges, and if the distances are bogus then we can build a summary that is less efficient. regards -- Tomas Vondra

Re: Enable data checksums by default

2025-07-31 Thread Tomas Vondra

On 7/31/25 15:39, Greg Burd wrote: > > >> On Jul 30, 2025, at 8:09 AM, Daniel Gustafsson wrote: >> >>> On 30 Jul 2025, at 11:58, Laurenz Albe wrote: >>> >>> On Tue, 2025-07-29 at 20:24 +0200, Tomas Vondra wrote: >>>> So, what shou

Re: Fix tab completion in v18 for ALTER DATABASE/USER/ROLE ... RESET

2025-07-31 Thread Tomas Vondra

"CONSTRAINTS", >> >> "TRANSACTION", > > Instead of adding another !TailMatches() call, why not just change > "DATABASE" to "DATABASE|ROLE|USER"? It seemed to me separate calls would be easier to understand, but I see combine it like this in many other places, so done that way ... Pushed. Thanks for the fixes! regards -- Tomas Vondra

Re: Adding basic NUMA awareness

2025-07-30 Thread Tomas Vondra

On 7/30/25 10:29, Jakub Wartak wrote: > On Mon, Jul 28, 2025 at 4:22 PM Tomas Vondra wrote: > > Hi Tomas, > > just a quick look here: > >> 2) The PGPROC part introduces a similar registry, [..] >> >> There's also a view pg_buffercache_pgproc. The pg_bu

Re: Reduce "Var IS [NOT] NULL" quals during constant folding

2025-07-29 Thread Tomas Vondra

rds [1] https://www.postgresql.org/message-id/602561.1744314879%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/1514756.1747925490%40sss.pgh.pa.us -- Tomas Vondra

Re: Enable data checksums by default

2025-07-29 Thread Tomas Vondra

41ff1 [2] https://www.postgresql.org/message-id/brdaw5wke274lubirrl4v2k4qdacylvgwwqztifn7m27pkth3s%40rh7wie47pfcp [3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e6eed40e44419e3268d01fe0d2daec08a7df68f7 -- Tomas Vondra

Re: Fix tab completion in v18 for ALTER DATABASE/USER/ROLE ... RESET

2025-07-29 Thread Tomas Vondra

e role, it was offering all matching variables anyway. I believe that's because of the block at line ~5022. The "database" case was already excluded, so I made 0002 to do that for ROLE too. I plan to push the attached fixes soon ... regards -- Tomas Vondra From fa26b62298d7a4221d9bc

Re: should we have a fast-path planning for OLTP starjoins?

2025-07-28 Thread Tomas Vondra

On 2/4/25 22:55, Tom Lane wrote: > Tomas Vondra writes: >>> The interesting thing about this is we pretty much have all the >>> infrastructure for detecting such FK-related join conditions >>> already. Possibly the join order forcing could be done with >&

Re: PoC: adding CustomJoin, separate from CustomScan

2025-07-25 Thread Tomas Vondra

are simply part of the regular join search. We generate all the various paths for a joinrel, and then give the set_join_pathlist_hook hook a chance to add some more. AFAIK it doesn't affect the join order search, or anything like that. At least not directly. regards -- Tomas Vondra

Re: Adding basic NUMA awareness

2025-07-25 Thread Tomas Vondra

On 7/25/25 12:27, Jakub Wartak wrote: > On Thu, Jul 17, 2025 at 11:15 PM Tomas Vondra wrote: >> >> On 7/4/25 20:12, Tomas Vondra wrote: >>> On 7/4/25 13:05, Jakub Wartak wrote: >>>> ... >>>> >>>> 8. v1-0005 2x + /* if (numa_procs_inte

Re: index prefetching

2025-07-24 Thread Tomas Vondra

On 7/24/25 16:40, Peter Geoghegan wrote: > On Thu, Jul 24, 2025 at 7:19 AM Tomas Vondra wrote: >> I got a bit bored yesterday, so I gave this a try and whipped up a patch >> that adds two pgstattuple functions that I think could be useful for >> analyzing index metrics that m

Re: PoC: adding CustomJoin, separate from CustomScan

2025-07-24 Thread Tomas Vondra

On 7/24/25 15:57, Robert Haas wrote: > On Thu, Jul 24, 2025 at 9:04 AM Tomas Vondra wrote: >> With this patch, my custom join can simply do >> >> econtext->ecxt_outertuple = outer; >> econtext->ecxt_innertuple = inner; >> >> return ExecP

PoC: adding CustomJoin, separate from CustomScan

2025-07-24 Thread Tomas Vondra

ow. Note: I mentioned some extensions implementing SmoothScan/G-join. I plan to publish those once I polish that a bit more. It's more research than something ready to use right now. regards [1] https://scholar.harvard.edu/files/stratos/files/smooth_vldbj.pdf [2] https://dl.g

Re: index prefetching

2025-07-24 Thread Tomas Vondra

On 7/23/25 02:37, Tomas Vondra wrote: > ... > >>> Thanks. I wonder how difficult would it be to add something like this to >>> pgstattuple. I mean, it shouldn't be difficult to look at leaf pages and >>> count distinct blocks, right? Seems quite useful.

Re: index prefetching

2025-07-23 Thread Tomas Vondra

On 7/23/25 17:09, Andres Freund wrote: > Hi, > > On 2025-07-23 14:50:15 +0200, Tomas Vondra wrote: >> On 7/23/25 02:59, Andres Freund wrote: >>> Hi, >>> >>> On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote: >>>> But I don't see why woul

Re: index prefetching

2025-07-23 Thread Tomas Vondra

On 7/23/25 02:59, Andres Freund wrote: > Hi, > > On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote: >> But I don't see why would this have any effect on the prefetch distance, >> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve >> tha

Re: index prefetching

2025-07-23 Thread Tomas Vondra

On 7/23/25 03:31, Peter Geoghegan wrote: > On Tue, Jul 22, 2025 at 8:37 PM Tomas Vondra wrote: >>> I happen to think that that's a very unrealistic assumption. Most >>> standard benchmarks have indexes that almost all look fairly similar >>> to pgbench_accou

Re: index prefetching

2025-07-22 Thread Tomas Vondra

On 7/23/25 02:59, Andres Freund wrote: > Hi, > > On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote: >> But I don't see why would this have any effect on the prefetch distance, >> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve >> tha

Re: index prefetching

2025-07-22 Thread Tomas Vondra

processing a page takes much more time. Because it reads the page, and passes it to other operators in the query plan, some of which may do CPU stuff, some will trigger some synchronous I/O, etc. Which means T1 grows, and the "minimal" queue depth decreases. Which part of this is not quite right? -- Tomas Vondra

Re: index prefetching

2025-07-22 Thread Tomas Vondra

ically just yet. > I think I mostly picked a value high enough to make it unlikely to hit it in realistic cases, while also not using too much memory, and 64 seemed like a good value. But I don't see why this would have any effect on the prefetch distance, queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve that. I'd have expected exactly the opposite behavior. Could be a bug, of course. But it'd be helpful to see the dataset/query. regards -- Tomas Vondra

Re: index prefetching

2025-07-22 Thread Tomas Vondra

On 7/22/25 23:35, Peter Geoghegan wrote: > On Tue, Jul 22, 2025 at 4:50 PM Tomas Vondra wrote: >>> Obviously, whatever advantage that the "complex" patch has is bound to >>> be limited to cases where index characteristics are naturally the >>>

Re: index prefetching

2025-07-22 Thread Tomas Vondra

On 7/22/25 19:35, Peter Geoghegan wrote: > On Tue, Jul 22, 2025 at 9:06 AM Tomas Vondra wrote: >> Real workloads are likely to have multiple misses in a row, which indeed >> ramps up the distance quickly. So maybe it's not that bad. Could we >> track a longer history of

Re: index prefetching

2025-07-22 Thread Tomas Vondra

d look-ahead distance better in cases like that. Needs more > exploration... thoughts/ideas welcome... Thanks! I'll rerun the tests with these patches once the current round of tests (with the simple distance restore after a reset) completes. -- Tomas Vondra

Re: index prefetching

2025-07-19 Thread Tomas Vondra

On 7/19/25 06:03, Thomas Munro wrote: > On Sat, Jul 19, 2025 at 6:31 AM Tomas Vondra wrote: >> Perhaps the ReadStream should do something like this? Of course, the >> simple patch resets the stream very often, likely much more often than >> anything else in the code. But woul

Re: Adding basic NUMA awareness

2025-07-18 Thread Tomas Vondra

On 7/18/25 18:46, Andres Freund wrote: > Hi, > > On 2025-07-17 23:11:16 +0200, Tomas Vondra wrote: >> Here's a v2 of the patch series, with a couple changes: > > Not a deep look at the code, just a quick reply. > > >> * I changed the freelist partitio

Re: index prefetching

2025-07-18 Thread Tomas Vondra

ps the ReadStream should do something like this? Of course, the simple patch resets the stream very often, likely much more often than anything else in the code. But wouldn't it be beneficial for streams reset because of a rescan? Possibly needs to be optional. regards -- Tomas Vondra From

Re: index prefetching

2025-07-18 Thread Tomas Vondra

_stream_reset(). (Will share results from a couple experiments in a separate message later.) This is the context of the benchmarks I've been sharing - me trying to understand the practical implications/limits of the simple approach. Not an attempt to somehow prove it's better, or anything like that. I'm not opposed to continuing work on the "complex" approach, but as I said, I'm sure I can't pull that off on my own. With your help, I think the chance of success would be considerably higher. Does this clarify how I think about the complex patch? regards [1] https://www.postgresql.org/message-id/32c15a30-6e25-4f6d-9191-76a19482c556%40vondra.me -- Tomas Vondra

Re: Adding basic NUMA awareness

2025-07-17 Thread Tomas Vondra

On 7/4/25 20:12, Tomas Vondra wrote: > On 7/4/25 13:05, Jakub Wartak wrote: >> ... >> >> 8. v1-0005 2x + /* if (numa_procs_interleave) */ >> >>Ha! it's a TRAP! I've uncommented it because I wanted to try it out >> without it (just

Re: index prefetching

2025-07-16 Thread Tomas Vondra

On 7/16/25 19:56, Tomas Vondra wrote: > On 7/16/25 18:39, Peter Geoghegan wrote: >> On Wed, Jul 16, 2025 at 11:29 AM Peter Geoghegan wrote: >>> For example, with "linear_10 / eic=16 / sync", it looks like "complex" >>> has about half the latency o

Re: index prefetching

2025-07-16 Thread Tomas Vondra

On 7/16/25 20:18, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra wrote: >> On 7/16/25 16:45, Peter Geoghegan wrote: >>> I get that index characteristics could be the limiting factor, >>> especially in a world where we're not yet eagerly

Re: index prefetching

2025-07-16 Thread Tomas Vondra

ions. If you copy the first couple lines, you'll get scans.db, with nice column names and all that.. The selectivity is calculated as (rows / total_rows) where rows is the rowcount returned by the query, and total_rows is reltuples. I also had charts with "page selectivity", but that often got a bunch of 100% points squashed on the right edge, so I stopped generating those. regards -- Tomas Vondra

Re: index prefetching

2025-07-16 Thread Tomas Vondra

On 7/16/25 17:29, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra wrote: >> For "uniform" data set, both prefetch patches do much better than master >> (for low selectivities it's clearer in the log-scale chart). The >> "complex"

Re: index prefetching

2025-07-16 Thread Tomas Vondra

On 7/16/25 16:45, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 10:37 AM Tomas Vondra wrote: >> What sounds weird? That the read_stream works like a stream of blocks, >> or that it can't do "pause" and we use "reset" as a workaround? > > The fact

Re: index prefetching

2025-07-16 Thread Tomas Vondra

On 7/16/25 16:29, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 10:20 AM Tomas Vondra wrote: >> The read stream can only return blocks generated by the "next" callback. >> When we return the block for the last item on a leaf page, we can only >> return "Inva

Re: index prefetching

2025-07-16 Thread Tomas Vondra

On 7/16/25 16:07, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 9:58 AM Tomas Vondra wrote: >>> The "simple" patch has _bt_readpage reset the read stream. That >>> doesn't make any sense to me. Though it does explain why the "complex" >>>

Re: index prefetching

2025-07-16 Thread Tomas Vondra

On 7/16/25 15:36, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra wrote: >> But the thing I don't really understand it the "cyclic" dataset (for >> example). And the "simple" patch performs really badly here. This data >> set


1 - 100 of 1913 matches
