Re: Amcheck verification of GiST and GIN

2025-06-09 Thread Tomas Vondra
On 6/9/25 00:14, Tomas Vondra wrote: > ... > > I propose to split it like this, into three parts, each addressing a > particular type of mistake: > > 1) gin_check_posting_tree_parent_keys_consistency > > 2) gin_check_parent_keys_consistency

Re: amcheck support for BRIN indexes

2025-06-08 Thread Tomas Vondra
hink these tests are > portable. While writing tests some minor issues were found and fixed. > Also ci compiler warnings were fixed. > Thanks. I've added myself as a reviewer, so that I don't forget about this for the next CF. regards -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-06-08 Thread Tomas Vondra
On 5/29/25 13:53, Arseniy Mukhin wrote: > On Mon, May 26, 2025 at 7:28 PM Arseniy Mukhin > wrote: >> On Mon, May 26, 2025 at 1:27 PM Tomas Vondra wrote: >>> Also, I've noticed that the TAP test passes even with some (most) of the >>> verify_gin.c changes rever

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

2025-06-04 Thread Tomas Vondra
On 6/4/25 19:59, Jim Nasby wrote: > > > On Fri, May 23, 2025 at 4:29 PM Tomas Vondra wrote: > > Also, Alvaro seemed to think TAM is the way to go, and in order to keep > the OLTP performance he suggested to use both heap and VCI

Re: strange perf regression with data checksums

2025-06-04 Thread Tomas Vondra
son, and treated it as "normal". But with the default changes, it'll be easier to spot once they upgrade to PG18. So better to get this in now, otherwise we may have to wait until PG19, because of ABI (the patch adds a field into BTScanPosData, but maybe it'd be possible to add it into padding, not sure). regards -- Tomas Vondra

Re: [PING] fallocate() causes btrfs to never compress postgresql files

2025-05-31 Thread Tomas Vondra
ow which ones to set, a lot of the knowledge is somewhat outdated I think. Wouldn't it be better for btrfs to just start returning EOPNOTSUPP (maybe with a mount option), in which case we already do the right thing automatically? Sure, it means the admin needs to be aware of this in both cases. regards -- Tomas Vondra

Re: [PING] fallocate() causes btrfs to never compress postgresql files

2025-05-28 Thread Tomas Vondra
efully will > not affect postgres (see CAVEATS in man 3 posix_fallocate). > Well, if btrfs starts returning EOPNOTSUPP, and glibc switches to the userspace fallback, we wouldn't notice. But it's up to btrfs to decide whether they want to support fallocate. We still need our fallback anyway, because of other OSes. regards -- Tomas Vondra

Re: Non-reproducible AIO failure

2025-05-27 Thread Tomas Vondra
u run these tests in parallel. Can you share the patch/script? thanks -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-05-26 Thread Tomas Vondra
e the TAP test to trigger this too? To show the current code (in master) misses this? Grigory, Andrey, Heikki, any opinions on the tweaks? regards -- Tomas Vondra From 973de3eaeeca7ff2946a5b0f92f481d70ba5b78d Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 26 May 2025 12:10:37 +0200 Subje

Re: Hash table scans outside transactions

2025-05-25 Thread Tomas Vondra
that break the seqscan? FWIW I think with the use case from the beginning of this thread: 1. Add/update/remove entries in hash table 2. Scan the existing entries and perform one transaction per entry 3. Close scan Why not simply build a linked list after step (1)? regards -- Tomas Vondra

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

2025-05-23 Thread Tomas Vondra
r.c:115:28: error: assignment to ‘ExecutorStart_hook_type’ {aka ‘void (*)(QueryDesc *, int)’} from incompatible pointer type ‘_Bool (*)(QueryDesc *, int)’ [-Wincompatible-pointer-types]
  115 | ExecutorStart_hook = vci_executor_start_routine;
      |^
executor/vci_executor.c: In function ‘vci_executor_start_routine’:
executor/vci_executor.c:161:28: error: void value not ignored as it ought to be
  161 | plan_valid = executor_start_prev(queryDesc, eflags);
      |^
executor/vci_executor.c:163:28: error: void value not ignored as it ought to be
  163 | plan_valid = standard_ExecutorStart(queryDesc, eflags);
      |^
make: *** [../../src/Makefile.global:973: executor/vci_executor.o] Error 1
The extension is not added to contrib/Makefile, so "make world" does not trigger this failure.
regards -- Tomas Vondra

Re: Enable data checksums by default

2025-05-23 Thread Tomas Vondra
xisting tooling? I mean, there's pretty much just one thing the user can do to make it work, and that's disabling checksums. Sure, they might also enable checksums on the old cluster, but that makes the upgrade much longer, and presumably they use pg_upgrade to upgrade quickly. That being said, I don't feel very strongly about this, so if the consensus is to just error-out, so be it. regards -- Tomas Vondra

Re: Enable data checksums by default

2025-05-23 Thread Tomas Vondra
Isn't the whole point of that change to keep the current workflow working? Also, I'm not sure if "no feedback about this" is reliable. I have no clue if people did any significant testing. Maybe people did a lot of testing and the current state is fine. But it's more likely there was little testing, in which case "no feedback" says nothing. FWIW I would be +0.5 to just let pg_upgrade disable checksums. regards -- Tomas Vondra

Re: generic plans and "initial" pruning

2025-05-22 Thread Tomas Vondra
OK with that in principle, assuming the benefits outweigh the risk of making backpatching harder. The patches don't seem exceptionally large / invasive, but I don't know how often we modify these parts. regards -- Tomas Vondra

Re: plan shape work

2025-05-20 Thread Tomas Vondra
"why was the index not used", and the possible answers include "dominated by cost by another path" or "does not match the index keys" etc. I wonder if this work might be useful for something like that. regards -- Tomas Vondra

Re: Please update the pgconf.dev Unconference notes

2025-05-20 Thread Tomas Vondra
ved too quickly in different directions for me to catch all the details, so the notes have gaps etc. If others can improve that / clarify, that'd be great. regards -- Tomas Vondra

Re: generic plans and "initial" pruning

2025-05-20 Thread Tomas Vondra
ts > seem to be reality. The second attached file is a test case that > triggers > > ... FYI I added this as a PG18 open item: https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items regards -- Tomas Vondra

Re: wrong query results on bf leafhopper

2025-05-20 Thread Tomas Vondra
good to kick this one out the pool if there's hardware issues. > There are tools like "stress" and "stressant", etc. Works on my rpi5, but depends on the packager. I'd probably just look at dmesg first. In my experience hardware issues are often pretty visible there - reports of failed I/O requests, thermal issues on the CPU, that kind of stuff. regards -- Tomas Vondra

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 22:29, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 4:17 PM Tomas Vondra wrote: >> Same effect as v1 for IOS, with regular index scans I see this: >> >> 64 clients: 0.7M tps >> 96 clients: 1.5M tps >> >> So very similar improvement as for IO

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 20:44, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 2:19 PM Peter Geoghegan wrote: >> On Mon, May 19, 2025 at 2:01 PM Tomas Vondra wrote: >>> The regular index scans, however, still have this issue, although it's not >>> as visible as for IOS. >

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
mentioned maybe we could add an atomic variable tracking the page LSN, so that we don't have to obtain the header lock. I didn't have time to try yet. regards -- Tomas Vondra

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
dr, buf_state); AFAICS the lock is needed simply to read a consistent value from the page header, but maybe we could have an atomic variable with a copy of the LSN in the buffer descriptor? regards -- Tomas Vondra | --91.21%--btgettuple
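For illustration only, the "atomic copy of the LSN" idea could be sketched with C11 atomics as below. This is a minimal standalone sketch, not PostgreSQL code: `DemoBufferDesc`, `demo_init`, `demo_publish_lsn` and `demo_read_lsn` are hypothetical names, and PostgreSQL has its own atomics layer rather than `<stdatomic.h>`. It only shows that a reader can fetch an untorn 64-bit value without taking a lock; keeping the copy in sync with the page header is the hard part and is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer descriptor carrying a copy of the LSN. */
typedef struct DemoBufferDesc
{
    _Atomic uint64_t lsn_copy;
} DemoBufferDesc;

/* Initialize the descriptor with an LSN of 0. */
static void
demo_init(DemoBufferDesc *desc)
{
    assert(desc != NULL);
    atomic_init(&desc->lsn_copy, 0);
}

/* Writer side: publish a new LSN after the page header was updated. */
static void
demo_publish_lsn(DemoBufferDesc *desc, uint64_t lsn)
{
    assert(desc != NULL);
    atomic_store_explicit(&desc->lsn_copy, lsn, memory_order_release);
}

/* Reader side: fetch the LSN copy without acquiring any lock. */
static uint64_t
demo_read_lsn(DemoBufferDesc *desc)
{
    assert(desc != NULL);
    return atomic_load_explicit(&desc->lsn_copy, memory_order_acquire);
}
```

Whether a lock-free read like this would actually remove the contention seen in the profile is exactly the open question in the message above.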

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-11 Thread Tomas Vondra
On 5/11/25 18:07, Peter Geoghegan wrote: > On Sat, May 10, 2025 at 10:59 AM Tomas Vondra wrote: >> But doesn't it also highlight how fragile this memory allocation is? The >> skip scan patch didn't do anything wrong - it just added a couple >> fields, using a lit

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-10 Thread Tomas Vondra
ibc libraries). Still, it's a long-standing behavior, and I doubt it's likely to change. But considering glibc is what most systems use, maybe we should add some protections? I recall there were proposals to add an optional mallopt() call to set M_TOP_PAD when running on glibc. Maybe we should revive that. I also had a patch to add a "memory pool", which fixed this as a side effect. regards -- Tomas Vondra results.pdf Description: Adobe PDF document
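As a rough illustration of the mallopt() idea mentioned above, a call guarded for glibc might look like the sketch below. The helper name `set_malloc_top_pad` is hypothetical and the padding value is up to the caller; this is not the proposed PostgreSQL patch, only a minimal example of the glibc API involved.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef __GLIBC__
#include <malloc.h>
#endif

/*
 * Hypothetical helper: ask glibc to keep "bytes" of extra padding at the
 * top of the heap, so small allocations do not repeatedly grow and shrink
 * it. Returns 1 if the setting was applied, 0 otherwise (including on
 * platforms that are not glibc, where this is a no-op).
 */
static int
set_malloc_top_pad(int bytes)
{
    assert(bytes >= 0);
#ifdef __GLIBC__
    /* mallopt() returns 1 on success and 0 on error */
    return mallopt(M_TOP_PAD, bytes);
#else
    (void) bytes;
    return 0;
#endif
}
```

A caller would invoke this once at process start, for example `set_malloc_top_pad(4 * 1024 * 1024)`. The same knob can also be set from the environment via glibc's `MALLOC_TOP_PAD_` variable.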

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
end_memory_contexts after preparing and executing the sample > query, or through pg_get_process_memory_contexts() from another > backend? > I haven't noticed any elevated memory usage in top, but the queries are very short, so I'm not sure how reliable that is. But if adding 4MB is enough to make this go away, I doubt I'd notice a difference. regards -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 18:36, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 12:28 PM Tomas Vondra wrote: >> Not sure if it matters, but this uses index-only scans, and the pages >> are all-visible, so maybe it's not much more expensive. > > You're still going to have to s

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
tine to nbtree was. It does not remove skip scan itself (that > should still work with queries that are actually eligible to use skip > scan, albeit slightly less efficiently with some opclasses). > Tried, doesn't seem to affect the results at all. -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 17:55, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 10:57 AM Tomas Vondra wrote: >> I see the regression even with variants that actually match some rows. >> For example if I do this: > >> so that the query matches 100 rows, I get the same behavior. >

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
n with variants that actually match some rows. For example if I do this: update pgbench_accounts set bid = aid; vacuum full; and change the query to search for "bid = 1", I get exactly the same behavior. Even with update pgbench_accounts set bid = aid / 100; vacuum full; so that the query matches 100 rows, I get the same behavior. -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 16:17, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 8:58 AM Tomas Vondra wrote: >> I'm also not sure about the root cause, but while investigating it one >> of the experiments I tried was tweaking the glibc malloc by setting >> >> export

Re: Amcheck verification of GiST and GIN

2025-05-09 Thread Tomas Vondra
* There was a discrepancy between parent and child > * tuples. We need to verify it is not a result of > * concurrent call of gistplacetopage(). So, lock parent > * and try to find downlink for current page. It may be > * missing due to concurrent page split, this is OK. > */ > pfree(stack->parenttup); > stack->parenttup = gin_refind_parent(rel, stack->parentblk, > stack->blkno, strategy); > > I think we can remove gin_refind_parent() and do ereport right away here. > The same logic as with 3). AFAIK it's impossible to have a child item > with a key that is higher than the cached parent key. > Parent key bounds what keys we can insert into the child page, so it > seems there is no way how they can appear there. > These look like good points. I've added it to open items so that we don't forget about this, I won't have time to look at this until after pgconf.dev. thanks -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
116037110 7193 - prepared1 33646 3655 4 25379 1137511342 32 37319 1409713911 There's almost no difference between bc35adee8d7 and 92fe23d93aa. regards -- Tomas Vondra

Re: strange perf regression with data checksums

2025-05-09 Thread Tomas Vondra
trick. > Good question. I haven't checked that explicitly, but it's a tiny data set (15MB) and I observed this even on long benchmarks with tens of millions of queries. So the hint bits should have been set. Also, I should have mentioned the query does an index-only scan, and the pin/unpin calls are on index pages, not on the heap. regards -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
C_TOP_PAD_ would not help like this. But I haven't looked at the code, and I wouldn't have guessed the query to have anything to do with skip scan ... regards -- Tomas Vondra

strange perf regression with data checksums

2025-05-09 Thread Tomas Vondra
e expensive under concurrency (the clients simply have to compete when updating the same counter, and with enough clients there'll be more conflicts and retries). Kinda unfortunate, and maybe we should do something about it, not sure. But why would it depend on checksums at all? This read-only test should be entirely in-memory, so how come it's affected? regards -- Tomas Vondra

Re: Improve hash join's handling of tuples with null join keys

2025-05-05 Thread Tomas Vondra
gconf.dev. I'd be surprised if this was a regression, the hash table lookups are not exactly free. And even if it was a minor regression, it'd affect only cases with many NULL keys, but it improves robustness. BTW do you consider this to be a bugfix for PG18? Or would it have to wait for PG19 at this point? regards -- Tomas Vondra

Re: Parallel CREATE INDEX for GIN indexes

2025-05-02 Thread Tomas Vondra
On 4/30/25 14:39, Tomas Vondra wrote: > > On 4/18/25 03:03, Vinod Sridharan wrote: >> ... >> > > The patch seems fine to me - I repeated the tests with mailing list > archives, with MemoryContextStats() in _gin_parallel_merge, and it > reliably minimizes the memory

Re: Parallel CREATE INDEX for GIN indexes

2025-04-30 Thread Tomas Vondra
fine. I was also worried if this might have performance impact, but it actually seems to make it a little bit faster. I'll get this pushed. thanks -- Tomas Vondra

Re: pgsql: Add function to get memory context stats for processes

2025-04-26 Thread Tomas Vondra
ssGetMemoryContextInterrupt() do the same thing? In any case, if DSA happens to not be the right way to transfer this, what should we use instead? The only thing I can think of is some sort of pre-allocated chunk of shared memory. regards -- Tomas Vondra

Re: Get rid of integer divide in FAST_PATH_REL_GROUP() macro

2025-04-26 Thread Tomas Vondra
o be verifying something that the loop > condition was checking already. I thought it was better to check that > we end up with a power-of-two. > > Please see the attached patch. > Thanks. Those changes seem fine to me too. Do you intend to push these, or do you want me to do it? regards -- Tomas Vondra
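To make the power-of-two point concrete, here is a generic sketch of why the check matters when replacing an integer divide. The function names are hypothetical and this is not the actual FAST_PATH_REL_GROUP() macro; it only shows the standard identity that, for a power-of-two group count, a modulo can be replaced by a bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a power of two, i.e. non-zero with exactly one bit set. */
static bool
is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Map a value to one of ngroups groups without a divide. This is only
 * equivalent to "value % ngroups" when ngroups is a power of two, which
 * is why the count is asserted here.
 */
static uint32_t
group_for(uint64_t value, uint32_t ngroups)
{
    assert(is_power_of_two(ngroups));
    return (uint32_t) (value & (uint64_t) (ngroups - 1));
}
```

If the group count were ever not a power of two, the mask would silently map values to the wrong groups, so asserting the property once where the count is computed is cheap insurance.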

Re: AIO v2.5

2025-04-22 Thread Tomas Vondra
cause of the RMT, but I'm also willing to do some of the tests, if needed - but it'd be good to get some guidance. regards -- Tomas Vondra

Re: Enable data checksums by default

2025-04-22 Thread Tomas Vondra
ecksums by default, but now I realize the thread talks about "upgrade experience" which seems fairly wide. So, what kind of data we expect to gather in order to evaluate this? Who's expected to collect it and evaluate this? regards -- Tomas Vondra

Re: index prefetching

2025-04-22 Thread Tomas Vondra
On 4/22/25 18:26, Peter Geoghegan wrote: > On Tue, Apr 22, 2025 at 6:46 AM Tomas Vondra wrote: >> here's an improved (rebased + updated) version of the patch series, with >> some significant fixes and changes. The patch adds infrastructure and >> modifies btree index

Re: Parallel CREATE INDEX for GIN indexes

2025-04-21 Thread Tomas Vondra
approaches > to > resolve this too). > Thanks for the report. I didn't have time to look at this in detail yet, but the fix looks roughly correct. I've added this to the list of open items for PG18. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-10 Thread Tomas Vondra
bigint, perhaps? Attached is v28, with the commit messages updated, added about allocation of the memory, etc. I'll let the CI run the tests on it, and then will push, unless someone has more comments. regards -- Tomas Vondra From 9a222c77de2ee4a0b32d97c3d8bab2bb33f066de Mon Sep 17 00:0

Re: Add os_page_num to pg_buffercache

2025-04-10 Thread Tomas Vondra
> - It's currently doing the changes in pg_buffercache v1.6 but will need to > create v1.7 for 19 (if the above stands true) > This seems like a good idea in principle, but at this point it has to wait for PG19. Please add it to the July commitfest. regards -- Tomas Vondra

Re: long-standing data loss bug in initial sync of logical replication

2025-04-10 Thread Tomas Vondra
> >> >> Seeing no responses for a long time, I am planning to push the fix >> till 14 tomorrow unless there are some opinions on the fix for 13. We >> can continue to discuss the scope of the fix for 13. >> > > Pushed till 14. > Thanks everyone who persevered and kept working on fixing this! Highly appreciated. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
On 4/9/25 17:51, Andres Freund wrote: > Hi, > > On 2025-04-09 17:28:31 +0200, Tomas Vondra wrote: >> On 4/9/25 17:14, Andres Freund wrote: >>> I'd mention that the includes of postgres.h/fmgr.h is what caused missing >>> build-time dependencies and via tha

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
On 4/9/25 17:14, Andres Freund wrote: > Hi, > > On 2025-04-09 16:33:14 +0200, Tomas Vondra wrote: >> From e1f093d091610d70fba72b2848f25ff44899ea8e Mon Sep 17 00:00:00 2001 >> From: Tomas Vondra >> Date: Tue, 8 Apr 2025 23:31:29 +0200 >> Subject: [PATCH 1/2] Clea

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
On 4/9/25 01:29, Andres Freund wrote: > Hi, > > On 2025-04-09 01:10:09 +0200, Tomas Vondra wrote: >> On 4/8/25 15:06, Andres Freund wrote: >>> Hi, >>> >>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: >>>> On Mon, 7 Apr 2025 at 23:00, To

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
Updated patches with proper commit messages etc. -- Tomas Vondra From e1f093d091610d70fba72b2848f25ff44899ea8e Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 8 Apr 2025 23:31:29 +0200 Subject: [PATCH 1/2] Cleanup of pg_numa.c This moves/renames some of the functions defined in

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
On 4/9/25 14:07, Tomas Vondra wrote: > ... > > OK, here are two patches, where 0001 adds the missingdeps check to the > Debian meson build. It just adds that to the build script. > > 0002 leaves the NUMA stuff in src/port (i.e. it's no longer moved to > src/backen

Re: Draft for basic NUMA observability

2025-04-08 Thread Tomas Vondra
On 4/8/25 15:06, Andres Freund wrote: > Hi, > > On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: >> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote: >>> I'll let the CI run the tests on it, and >>> then will push, unless someone has more comments. >

Re: Draft for basic NUMA observability

2025-04-08 Thread Tomas Vondra
On 4/8/25 15:06, Andres Freund wrote: > Hi, > > On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: >> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote: >>> I'll let the CI run the tests on it, and >>> then will push, unless someone has more comments. >

Re: Draft for basic NUMA observability

2025-04-08 Thread Tomas Vondra
On 4/8/25 16:59, Andres Freund wrote: > Hi, > > On 2025-04-08 09:35:37 -0400, Andres Freund wrote: >> On April 8, 2025 9:21:57 AM EDT, Tomas Vondra wrote: >>> On 4/8/25 15:06, Andres Freund wrote: >>>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wro

Re: Draft for basic NUMA observability

2025-04-08 Thread Tomas Vondra
> The attached small patch fixes the manual. > Thank you for noticing this and for the fix! Pushed. This also reminded me we agreed to change page_num to bigint, which I forgot to change before commit. So I adjusted that too, separately. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
On 4/7/25 17:51, Andres Freund wrote: > Hi, > > On 2025-04-06 13:56:54 +0200, Tomas Vondra wrote: >> On 4/6/25 01:00, Andres Freund wrote: >>> On 2025-04-05 18:29:22 -0400, Andres Freund wrote: >>>> I think one thing that the docs should mention is that callin

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
On 4/7/25 23:50, Jakub Wartak wrote: > On Mon, Apr 7, 2025 at 11:27 PM Tomas Vondra wrote: >> >> Hi, >> >> I've pushed all three parts of v29, with some additional corrections >> (picked lower OIDs, bumped catversion, fixed commit messages). > > H

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
Hi, I've pushed all three parts of v29, with some additional corrections (picked lower OIDs, bumped catversion, fixed commit messages). On 4/7/25 23:01, Jakub Wartak wrote: > On Mon, Apr 7, 2025 at 9:51 PM Tomas Vondra wrote: > >>> So it looks like that the new way to it

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
On 4/7/25 20:11, Bertrand Drouvot wrote: > Hi, > > On Mon, Apr 07, 2025 at 12:42:21PM -0400, Andres Freund wrote: >> Hi, >> >> On 2025-04-07 18:36:24 +0200, Tomas Vondra wrote: >> >> I was thinking of checking if the BufferDesc indicates BM_VALID or >

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
ent patches are good enough >> for PG18, with the current behavior, and then maybe improve that in >> PG19. > > I think as long as the docs mention this with or it's ok for > now. > OK, I'll add a warning explaining this. regards -- Tomas Vondra

Re: Improve monitoring of shared memory allocations

2025-04-07 Thread Tomas Vondra
ssion tests can't tell us much, considering it didn't fail once with the reverted patch :-( I did check the coverage in: https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html and sure enough, dir_realloc() is not executed once. And there's a couple more p

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
in os_page_status. I intend to push 0001 and 0002 shortly, and 0003 after a bit more review and testing, unless I hear objections. regards -- Tomas Vondra From fcc4fc2ada33cbbc962d561ddeea6966f0d55492 Mon Sep 17 00:00:00 2001 From: Jakub Wartak Date: Wed, 2 Apr 2025 12:29:22 +0200 Subject: [P

Re: Draft for basic NUMA observability

2025-04-06 Thread Tomas Vondra
>> pages. >>> + * It's a bit misleading to call that "aligned", no? */ >>> + >>> + /* Get number of OS aligned pages */ >>> + shm_ent_page_count >>> + = TYPEALIGN(os_page_size, ent->allocated_size) / >>> os_page_size; >>> + >>> + /* >>> + * If we get ever 0xff back from kernel inquiry, then we >>> probably have >>> + * bug in our buffers to OS page mapping code here. >>> + */ >>> + memset(pages_status, 0xff, sizeof(int) * shm_ent_page_count); >> >> There's obviously no guarantee that shm_ent_page_count is a multiple of >> os_page_size. I think it'd be interesting to show in the view when one shmem >> allocation shares a page with the prior allocation - that can contribute a >> bit >> to contention. What about showing a start_os_page_id and end_os_page_id or >> something? That could be a feature for later though. > > I was thinking about it, but it could be done when analyzing this > together with data from pg_shmem_allocations(?) My worry is timing :( > Anyway, we could extend this view in future revisions. > I'd leave this out for now. It's not difficult, but let's focus on the other issues. >>> +SELECT NOT(pg_numa_available()) AS skip_test \gset >>> +\if :skip_test >>> +\quit >>> +\endif >>> +-- switch to superuser >>> +\c - >>> +SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa; >>> + ok >>> + >>> + t >>> +(1 row) >> >> Could it be worthwhile to run the test if !pg_numa_available(), to test that >> we do the right thing in that case? We need an alternative output anyway, so >> that might be fine? > > Added. the meson test passes, but I'm sending it as fast as possible > to avoid a clash with Tomas. > Please keep working on this. I may have a bit of time in the evening, but in the worst case I'll merge it into your patch. regards -- Tomas Vondra
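The page-count arithmetic quoted above, and the suggested start/end page idea, can be illustrated with a small standalone sketch. All names here (`align_up`, `pages_for_size`, `PageRange`, `pages_touched`) are hypothetical, and `align_up` only mimics what a TYPEALIGN-style round-up does; the point is that rounding the size alone can undercount when an allocation does not start on a page boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to a multiple of align (align must be a power of two). */
static size_t
align_up(size_t len, size_t align)
{
    return (len + (align - 1)) & ~(align - 1);
}

/* Pages needed for "size" bytes if the allocation starts on a page boundary. */
static size_t
pages_for_size(size_t size, size_t page_size)
{
    return align_up(size, page_size) / page_size;
}

/* First and last OS page (zero-based) touched by an allocation. */
typedef struct PageRange
{
    size_t first;
    size_t last;
} PageRange;

static PageRange
pages_touched(size_t start_offset, size_t size, size_t page_size)
{
    PageRange r;

    assert(size > 0 && page_size > 0);
    r.first = start_offset / page_size;
    r.last = (start_offset + size - 1) / page_size;
    return r;
}
```

For example, with 4096-byte pages an 8192-byte allocation starting at offset 4000 rounds to two pages by size, but actually touches pages 0 through 2, sharing its first page with whatever precedes it.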

Re: Draft for basic NUMA observability

2025-04-06 Thread Tomas Vondra
he current backend, so I'd bet people would not be happy with NULL, and would proceed to force the allocation in some other way (say, a large query of some sort). Which obviously causes a lot of other problems. I can imagine having a flag that makes the allocation optional, but there's no convenient way to pass that to a view, and I think most people want the allocation anyway. Especially for monitoring purposes, which usually happens in a new connection, so the backend has little opportunity to allocate the pages "naturally." regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-06 Thread Tomas Vondra
at right now, but at the very least we ought to > document it. > +1 to documenting this > > On 2025-04-05 16:33:28 +0200, Tomas Vondra wrote: >> The libnuma library is not available on 32-bit builds (there's no shared >> object for i386), so we disable it in that

Re: Improve monitoring of shared memory allocations

2025-04-05 Thread Tomas Vondra
fields. Seems a bit weird, but we always did that - the patch does not really change that. I'll now mark this as committed. I haven't done anything about the alignment. My conclusion from the discussion was we don't quite need to do that, but if we do I think it's a matter for a separate patch - perhaps something like the 0003. Thanks for the patch, reviews, etc. -- Tomas Vondra

Re: Snapshot related assert failure on skink

2025-04-05 Thread Tomas Vondra
On 3/24/25 16:25, Heikki Linnakangas wrote: > On 24/03/2025 16:56, Tomas Vondra wrote: >> >> >> On 3/23/25 17:43, Heikki Linnakangas wrote: >>> On 21/03/2025 17:16, Andres Freund wrote: >>>> Am I right in understanding that the only scenario (w

Re: Draft for basic NUMA observability

2025-04-05 Thread Tomas Vondra
On 4/5/25 15:23, Tomas Vondra wrote: > On 4/5/25 11:37, Bertrand Drouvot wrote: >> Hi, >> >> On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote: >>> OK, >>> >>> here's v25 after going through the patches once more, fixing the issues

Re: Draft for basic NUMA observability

2025-04-05 Thread Tomas Vondra
On 4/5/25 11:37, Bertrand Drouvot wrote: > Hi, > > On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote: >> OK, >> >> here's v25 after going through the patches once more, fixing the issues >> mentioned by Bertrand, etc. > > Thanks! >

Re: Proposal: Adding compression of temporary files

2025-04-04 Thread Tomas Vondra
code gets multiple loops in while (wpos < file->nbytes) { ... } because bytestowrite will be the value from the last loop? I haven't tried, but I guess writing wide tuples (more than 8k) might fail. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-04 Thread Tomas Vondra
in the function comment, but I'm also not quite sure I understand what "output shared memory" is ... regards -- Tomas Vondra From 381c5077592e38dbcbbf6acc4f1e86a767a92957 Mon Sep 17 00:00:00 2001 From: Jakub Wartak Date: Wed, 2 Apr 2025 12:29:22 +0200 Subject: [PATCH v25 1/5]

Re: index prefetching

2025-04-04 Thread Tomas Vondra
Yes, I agree. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-04 Thread Tomas Vondra
On 4/4/25 08:50, Bertrand Drouvot wrote: > Hi, > > On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote: >> On 4/3/25 15:12, Jakub Wartak wrote: >>> On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra wrote: >>> >>>> ... >>>> >>

Re: Draft for basic NUMA observability

2025-04-04 Thread Tomas Vondra
On 4/4/25 09:35, Jakub Wartak wrote: > On Fri, Apr 4, 2025 at 8:50 AM Bertrand Drouvot > wrote: >> >> Hi, >> >> On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote: >>> On 4/3/25 15:12, Jakub Wartak wrote: >>>>

Re: Draft for basic NUMA observability

2025-04-03 Thread Tomas Vondra
On 4/3/25 15:12, Jakub Wartak wrote: > On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra wrote: > >> ... >> >> So unless someone can demonstrate a use case where this would matter, >> I'd not worry about it too much. > > OK, fine for me - just 3 cols for p

Re: Draft for basic NUMA observability

2025-04-03 Thread Tomas Vondra
On 4/3/25 10:23, Bertrand Drouvot wrote: > Hi, > > On Thu, Apr 03, 2025 at 09:01:43AM +0200, Jakub Wartak wrote: >> On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra wrote: >> >> Hi Tomas, >> >>> OK, so you agree the commit messages are complete / correct

Re: Draft for basic NUMA observability

2025-04-03 Thread Tomas Vondra
On 4/3/25 09:01, Jakub Wartak wrote: > On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra wrote: > > Hi Tomas, > >> OK, so you agree the commit messages are complete / correct? > > Yes. > >> OK. FWIW if you disagree with some of my proposed changes, feel free to

Re: BTScanOpaqueData size slows down tests

2025-04-02 Thread Tomas Vondra
On 4/2/25 17:45, Peter Geoghegan wrote: > On Wed, Apr 2, 2025 at 11:36 AM Tom Lane wrote: >> Ouch! I had no idea it had gotten that big. Yeah, we ought to >> do something about that. > > Tomas Vondra talked about this recently, in the context of his work on > prefe

Re: Parallel CREATE INDEX for GIN indexes

2025-04-02 Thread Tomas Vondra
On 4/2/25 18:43, Andres Freund wrote: > Hi, > > On 2025-03-04 20:50:43 +0100, Tomas Vondra wrote: >> I pushed the two smaller parts today. >> >> Here's the remaining two parts, to keep cfbot happy. I don't expect to >> get these into PG18, though. >

Re: Draft for basic NUMA observability

2025-04-02 Thread Tomas Vondra
On 4/2/25 16:46, Jakub Wartak wrote: > On Tue, Apr 1, 2025 at 10:17 PM Tomas Vondra wrote: >> >> Hi, >> >> I've spent a bit of time reviewing this. In general I haven't found >> anything I'd call a bug, but here's a couple comments for v1

Re: Draft for basic NUMA observability

2025-04-01 Thread Tomas Vondra
inters like this etc.). 11) This could use UINT64_FORMAT, instead of a cast: elog(DEBUG1, "NUMA: os_page_count=%lu os_page_size=%zu pages_per_blk=%.2f", (unsigned long) os_page_count, os_page_size, pages_per_blk); regards -- Tomas Vondra From 46a7801b1985a81bb8bc35fcfb2cbb74e6ea5
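As a standalone analogue of review comment 11, the sketch below formats a 64-bit count without casting it to unsigned long. UINT64_FORMAT is a PostgreSQL macro and is not available outside the tree, so this example assumes the standard `PRIu64` macro from `<inttypes.h>`, which serves a similar purpose in plain C; the function name `format_numa_debug` is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the debug message into buf. Using a 64-bit format macro avoids
 * the cast to unsigned long, which is only 32 bits wide on some platforms
 * and would truncate large counts there.
 */
static int
format_numa_debug(char *buf, size_t buflen, uint64_t os_page_count,
                  size_t os_page_size, double pages_per_blk)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "NUMA: os_page_count=%" PRIu64
                    " os_page_size=%zu pages_per_blk=%.2f",
                    os_page_count, os_page_size, pages_per_blk);
}
```

Inside PostgreSQL the equivalent change would keep elog() and simply swap the format specifier, dropping the cast.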

Re: Improve monitoring of shared memory allocations

2025-03-31 Thread Tomas Vondra
"number of elements", but it's a simple flag. So I renamed it to "prealloc", which seems clearer to me. I also tweaked (reordered/reformatted) the conditions a bit. For the other patch, I realized we can simply MemSet() the whole chunk, instead of resetting the individual parts

Re: Amcheck verification of GiST and GIN

2025-03-30 Thread Tomas Vondra
On 3/30/25 06:04, Tom Lane wrote: > Tomas Vondra writes: >> I've pushed all the parts of this patch series, except for the stress >> test - which I think was not meant for commit. >> buildfarm seems happy so far, except for a minor indentation issue >> (forg

Re: Amcheck verification of GiST and GIN

2025-03-29 Thread Tomas Vondra
On 3/28/25 20:51, Kirill Reshke wrote: > On Fri, 28 Mar 2025 at 21:26, Tomas Vondra wrote: >> >> Here's a polished version of the patches. If you have any >> comments/objections, please speak now. >> -- >> Tomas Vondra > > Hi, no objections, lgtm

Re: Amcheck verification of GiST and GIN

2025-03-28 Thread Tomas Vondra
callee functions. So now it's - amcheck consistency check context - posting tree check context regards -- Tomas Vondra From 28b392b687f641b09bc79bb3bb3e61505845e6c1 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 28 Mar 2025 16:49:04 +0100 Subject: [PATCH v20250328 1/6] Fix grammar in

Re: Improve monitoring of shared memory allocations

2025-03-28 Thread Tomas Vondra
> */ > > hash_create code is confusing because the nelem_alloc named variable is used > in two different cases, In the above case nelem_alloc refers to the one > returned by choose_nelem_alloc function. > > The other nelem_alloc determines the number of elements in each partition > for a partitioned hash table. This is not what is being referred to in > the above comment. > > The bit "For more explanation see comments within this function" is not > great, if only because there are not many comments within the function, > so there's no "more explanation". But if there's something important, it > should be in the main comment, preferably. > > I will improve the comment in the next version. > OK. Do we even need to pass nelem_alloc to hash_get_init_size? It's not really used except for this bit: +if (init_size > nelem_alloc) +element_alloc = false; Can't we determine before calling the function, to make it a bit less confusing? regards -- Tomas Vondra
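
One way to read the question above, sketched under heavy assumptions: the caller decides once whether elements get preallocated and passes only that flag to the size computation. The names sketch_should_prealloc and sketch_init_size are hypothetical, and the size formula is a placeholder rather than what hash_get_init_size actually computes:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the decision the quoted hunk makes inside the function,
 * moved to the caller. Mirrors "if (init_size > nelem_alloc)
 * element_alloc = false;" from the quoted patch. */
bool
sketch_should_prealloc(long init_size, long nelem_alloc)
{
    return init_size <= nelem_alloc;
}

/* Hypothetical, simplified size computation that takes only the flag.
 * header_size and elem_size are placeholders for the real accounting. */
size_t
sketch_init_size(size_t header_size, size_t elem_size,
                 long init_size, bool prealloc)
{
    size_t size = header_size;

    /* Element space is counted only when it is allocated up front. */
    if (prealloc)
        size += (size_t) init_size * elem_size;

    return size;
}
```

With this split the size function no longer needs to know about nelem_alloc at all, which appears to be the simplification the review is asking about.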

Re: Improve monitoring of shared memory allocations

2025-03-27 Thread Tomas Vondra
On 3/27/25 13:56, Tomas Vondra wrote: > ... > > OK, I don't have any other comments for 0001 and 0002. I'll do some > more review and polishing on those, and will get them committed soon. > Actually ... while polishing 0001 and 0002, I noticed a couple more details t

Re: Amcheck verification of GiST and GIN

2025-03-27 Thread Tomas Vondra
On 3/27/25 16:30, Mark Dilger wrote: > > > On Fri, Feb 21, 2025 at 6:29 AM Tomas Vondra <mailto:to...@vondra.me>> wrote: > > Hi, > > I see this patch didn't move since December :-( I still think these > improvements would be useful, it

Re: Improve monitoring of shared memory allocations

2025-03-27 Thread Tomas Vondra
his. > Do you have any suggestions in mind? > > Please find attached updated patches after merging all your review > comments except > a few discussed above. > OK, I don't have any other comments for 0001 and 0002. I'll do some more review and polishing on those, and will get them committed soon. I don't plan to push 0003, unless someone can actually explain and demonstrate the benefits of the proposed padding. regards -- Tomas Vondra

Re: Advanced Patch Feedback Session / pgconf.dev 2025

2025-03-25 Thread Tomas Vondra
ards Tomas On 3/13/25 17:37, Tomas Vondra wrote: > Hi all, > > pgconf.dev 2025 will host "Advanced Patch Feedback Session", with the > same format as in 2024 [1]: > > Participants will work in small groups with a Postgres committer to > analyze a past contribut

Re: Snapshot related assert failure on skink

2025-03-24 Thread Tomas Vondra
's be tidy and fix both latestCompletedXid and > xactCompletionCount. > Thanks for looking into this and pushing the fix. Would it make sense to add a comment documenting this reasoning about not handling aborts? Otherwise someone will get to rediscover this in the future ... regards -- Tomas Vondra

Re: Improve monitoring of shared memory allocations

2025-03-24 Thread Tomas Vondra
ntion can we get there? I don't get it. Also, why is the patch adding padding after statusFlags (the last array allocated in InitProcGlobal) and not between allProcs and xids? regards -- Tomas Vondra From f527909dda02b4c7231db53a0fe6cecbaec55ca4 Mon Sep 17 00:00:00 2001 From: Rahila Sye

Re: Snapshot related assert failure on skink

2025-03-21 Thread Tomas Vondra
On 3/19/25 13:27, Tomas Vondra wrote: > On 3/19/25 08:17, Heikki Linnakangas wrote: >> On 19/03/2025 04:22, Tomas Vondra wrote: >>> I kept stress-testing this, and while the frequency massively increased >>> on PG18, I managed to reproduce this all the way back to

Re: Snapshot related assert failure on skink

2025-03-19 Thread Tomas Vondra
On 3/19/25 08:17, Heikki Linnakangas wrote: > On 19/03/2025 04:22, Tomas Vondra wrote: >> I kept stress-testing this, and while the frequency massively increased >> on PG18, I managed to reproduce this all the way back to PG14. I see >> ~100x more corefiles on PG18. >>

Re: Snapshot related assert failure on skink

2025-03-18 Thread Tomas Vondra
't have the same issue. None of them seems to advance the XID to 209508. regards -- Tomas Vondra

Re: Snapshot related assert failure on skink

2025-03-17 Thread Tomas Vondra
On 3/17/25 13:18, Thomas Munro wrote: > On Tue, Mar 18, 2025 at 12:59 AM Tomas Vondra wrote: >> On 3/17/25 12:36, Tomas Vondra wrote: >>> I'm still fiddling with the script, trying to increase the probability >>> of the (apparent) race condition. On one machine

Re: Snapshot related assert failure on skink

2025-03-17 Thread Tomas Vondra
On 3/17/25 12:36, Tomas Vondra wrote: > ... > > I'm still fiddling with the script, trying to increase the probability > of the (apparent) race condition. On one machine (old Xeon) I can hit it > very easily/reliably, while on a different machine (new Ryzen) it's ver

Re: Snapshot related assert failure on skink

2025-03-17 Thread Tomas Vondra
g to increase the probability of the (apparent) race condition. On one machine (old Xeon) I can hit it very easily/reliably, while on a different machine (new Ryzen) it's very rare. I don't know if that's due to difference in speed of the CPU, or fewer cores, ... I guess it changes the timing just enough. I've also tried running the stress test on PG17, and I'm yet to see a single failure there. Not even on the xeon machine, that hits it reliably on 18. So this seems to be a PG18-only issue. If needed, I can try adding more logging, or test a patch. regards -- Tomas Vondra

Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));

2025-03-15 Thread Tomas Vondra
a while to hit it - on my laptop it takes an hour or so, but I guess it's more about the random sleeps in the script. I've only ever seen this on the standby, never on the primary. regards -- Tomas Vondra Program terminated with signal SIGABRT, Aborted. #0 __pthread_kill_implementa
