On 6/9/25 00:14, Tomas Vondra wrote:
> ...
>
> I propose to split it like this, into three parts, each addressing a
> particular type of mistake:
>
> 1) gin_check_posting_tree_parent_keys_consistency
>
> 2) gin_check_parent_keys_consistency
hink these tests are
> portable. While writing tests some minor issues were found and fixed.
> Also ci compiler warnings were fixed.
>
Thanks. I've added myself as a reviewer, so that I don't forget about
this for the next CF.
regards
--
Tomas Vondra
On 5/29/25 13:53, Arseniy Mukhin wrote:
> On Mon, May 26, 2025 at 7:28 PM Arseniy Mukhin
> wrote:
>> On Mon, May 26, 2025 at 1:27 PM Tomas Vondra wrote:
>>> Also, I've noticed that the TAP test passes even with some (most) of the
>>> verify_gin.c changes rever
On 6/4/25 19:59, Jim Nasby wrote:
>
>
> On Fri, May 23, 2025 at 4:29 PM Tomas Vondra wrote:
>
> Also, Alvaro seemed to think TAM is the way to go, and in order to keep
> the OLTP performance he suggested to use both heap and VCI
son, and
treated it as "normal". But with the default changes, it'll be easier to
spot once they upgrade to PG18.
So better to get this in now, otherwise we may have to wait until PG19,
because of ABI (the patch adds a field into BTScanPosData, but maybe
it'd be possible to add it into padding, not sure).
regards
--
Tomas Vondra
ow which ones to set, a lot
of the knowledge is somewhat outdated I think.
Wouldn't it be better for btrfs to just start returning EOPNOTSUPP
(maybe with a mount option), in which case we already do the right thing
automatically? Sure, it means the admin needs to be aware of
this in both cases.
regards
--
Tomas Vondra
efully will
> not affect postgres (see CAVEATS in man 3 posix_fallocate).
>
Well, if btrfs starts returning EOPNOTSUPP, and glibc switches to the
userspace fallback, we wouldn't notice. But that's up to the btrfs
developers to decide if they want to support fallocate. We still need our fallback
anyway, because of other OSes.
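The fallback pattern is "try posix_fallocate(), and if the filesystem says it can't do that, write zeros ourselves". Below is a minimal plain-POSIX sketch of it. The function name zero_extend() and the 8kB zero block are assumptions for illustration; this is not the actual PostgreSQL code. Note that posix_fallocate() returns the error number rather than setting errno. On glibc the call may already emulate the allocation in userspace, and in that case the fallback branch is never reached.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Zeros used by the write-based fallback (8kB is an arbitrary choice). */
static const char zero_block[8192];

/*
 * Hypothetical helper: make [offset, offset + len) of the file exist and be
 * zeroed. Try posix_fallocate() first; if the filesystem reports it can't
 * do that (EINVAL or EOPNOTSUPP), fall back to writing zeros with pwrite().
 * Returns 0 on success, or an errno value on failure.
 */
static int
zero_extend(int fd, off_t offset, off_t len)
{
    /* posix_fallocate() returns the error number, it does not set errno */
    int         rc = posix_fallocate(fd, offset, len);

    if (rc == 0)
        return 0;
    if (rc != EINVAL && rc != EOPNOTSUPP)
        return rc;

    while (len > 0)
    {
        size_t      chunk = (len > (off_t) sizeof(zero_block)) ?
            sizeof(zero_block) : (size_t) len;
        ssize_t     written = pwrite(fd, zero_block, chunk, offset);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            return errno;
        }
        if (written == 0)
            return EIO;         /* avoid looping forever on a zero-length write */

        offset += written;
        len -= written;
    }
    return 0;
}
```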
regards
--
Tomas Vondra
u run these tests in parallel. Can you share the
patch/script?
thanks
--
Tomas Vondra
e the TAP test to trigger this
too? To show the current code (in master) misses this?
Grigory, Andrey, Heikki, any opinions on the tweaks?
regards
--
Tomas Vondra
From 973de3eaeeca7ff2946a5b0f92f481d70ba5b78d Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Mon, 26 May 2025 12:10:37 +0200
Subje
that break the seqscan?
FWIW I think with the use case from the beginning of this thread:
1. Add/update/remove entries in hash table
2. Scan the existing entries and perform one transaction per entry
3. Close scan
Why not simply build a linked list after step (1)?
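Here is a minimal sketch of that suggestion, with a toy array standing in for the hash table. After step (1), copy the keys of all live entries into a plain linked list and finish the scan. Then do the per-entry work of step (2) by walking the list. The Entry/KeyNode types and both functions are hypothetical. The list holds keys rather than entry pointers, so step (2) can remove entries without leaving the list pointing at freed memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hash entry; in_use marks occupied slots of the toy table. */
typedef struct Entry
{
    int         key;
    int         in_use;
} Entry;

/* One node of the work list built after step (1). */
typedef struct KeyNode
{
    int         key;
    struct KeyNode *next;
} KeyNode;

/*
 * Walk the table once (standing in for a complete hash_seq_search() pass)
 * and copy the key of every live entry into a linked list. The scan is
 * finished before any per-entry work starts.
 */
static KeyNode *
collect_keys(const Entry *table, size_t nslots)
{
    KeyNode    *head = NULL;

    for (size_t i = 0; i < nslots; i++)
    {
        KeyNode    *node;

        if (!table[i].in_use)
            continue;

        node = malloc(sizeof(KeyNode));
        if (node == NULL)
            abort();
        node->key = table[i].key;
        node->next = head;      /* prepend, so the list is in reverse order */
        head = node;
    }
    return head;
}

/*
 * Step (2): one unit of work (e.g. one transaction) per key. The work may
 * look up, modify or remove the entry, because no table scan is open.
 * Frees the list and returns the number of processed keys.
 */
static int
process_keys(KeyNode *head, long *key_sum)
{
    int         processed = 0;

    while (head != NULL)
    {
        KeyNode    *next = head->next;

        *key_sum += head->key;  /* placeholder for the real per-entry work */
        processed++;
        free(head);
        head = next;
    }
    return processed;
}
```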
regards
--
Tomas Vondra
r.c:115:28: error: assignment to ‘ExecutorStart_hook_type’ {aka ‘void (*)(QueryDesc *, int)’} from incompatible pointer type ‘_Bool (*)(QueryDesc *, int)’ [-Wincompatible-pointer-types]
115 | ExecutorStart_hook = vci_executor_start_routine;
|^
executor/vci_executor.c: In function ‘vci_executor_start_routine’:
executor/vci_executor.c:161:28: error: void value not ignored as it ought to be
161 | plan_valid = executor_start_prev(queryDesc, eflags);
|^
executor/vci_executor.c:163:28: error: void value not ignored as it ought to be
163 | plan_valid = standard_ExecutorStart(queryDesc, eflags);
|^
make: *** [../../src/Makefile.global:973: executor/vci_executor.o] Error 1
The extension is not added to contrib/Makefile, so "make world" does not
trigger this failure.
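The following standalone sketch shows a hook written against the void-returning signature the compiler reports. QueryDesc, the hook variable and standard_ExecutorStart() are stubbed here, so the sketch only shows the shape: there is no result to assign to plan_valid, and the hook chains to the previous hook when one exists. It assumes the extension does not actually depend on the boolean result. install_executor_start_hook() is a hypothetical name; extensions usually install hooks from _PG_init().

```c
#include <stddef.h>

/* Stand-ins for the PostgreSQL declarations, matching the error message. */
typedef struct QueryDesc
{
    int         started;        /* toy field so the effect is observable */
} QueryDesc;

typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);

static ExecutorStart_hook_type ExecutorStart_hook = NULL;

static void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    (void) eflags;
    queryDesc->started = 1;
}

/* previous hook, saved when installing ours */
static ExecutorStart_hook_type executor_start_prev = NULL;

/* Returns void, matching ExecutorStart_hook_type: nothing to assign to plan_valid. */
static void
vci_executor_start_routine(QueryDesc *queryDesc, int eflags)
{
    if (executor_start_prev)
        executor_start_prev(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);

    /* extension-specific work after the executor has started goes here */
}

/* hypothetical stand-in for the hook installation in _PG_init() */
static void
install_executor_start_hook(void)
{
    executor_start_prev = ExecutorStart_hook;
    ExecutorStart_hook = vci_executor_start_routine;
}
```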
regards
--
Tomas Vondra
xisting tooling? I mean,
there's pretty much just one thing the user can do to make it work, and
that's disabling checksums. Sure, they might also enable checksums on
the old cluster, but that makes the upgrade much longer, and presumably
they use pg_upgrade to upgrade quickly.
That being said, I don't feel very strongly about this, so if the
consensus is to just error-out, so be it.
regards
--
Tomas Vondra
Isn't the whole point of that
change to keep the current workflow working?
Also, I'm not sure if "no feedback about this" is reliable. I have no
clue if people did any significant testing. Maybe people did a lot of
testing and the current state is fine. But it's more likely there was
little testing, in which case "no feedback" says nothing.
FWIW I would be +0.5 to just let pg_upgrade disable checksums.
regards
--
Tomas Vondra
OK with that in principle, assuming the benefits outweigh the risk
of making backpatching harder. The patches don't seem exceptionally
large / invasive, but I don't know how often we modify these parts.
regards
--
Tomas Vondra
"why was the index not used", and the
possible answers include "dominated by cost by another path" or "does
not match the index keys" etc.
I wonder if this work might be useful for something like that.
regards
--
Tomas Vondra
ved too quickly in different
directions for me to catch all the details, so the notes have gaps etc.
If others can improve that / clarify, that'd be great.
regards
--
Tomas Vondra
ts
> seem to be reality. The second attached file is a test case that
> triggers
>
> ...
FYI I added this as a PG18 open item:
https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
regards
--
Tomas Vondra
good to kick this one out the pool if there's hardware issues.
>
There are tools like "stress" and "stressant", etc. They work on my
rpi5, but availability depends on the packager.
I'd probably just look at dmesg first. In my experience hardware issues
are often pretty visible there - reports of failed I/O requests, thermal
issues on the CPU, that kind of stuff.
regards
--
Tomas Vondra
On 5/19/25 22:29, Peter Geoghegan wrote:
> On Mon, May 19, 2025 at 4:17 PM Tomas Vondra wrote:
>> Same effect as v1 for IOS, with regular index scans I see this:
>>
>> 64 clients: 0.7M tps
>> 96 clients: 1.5M tps
>>
>> So very similar improvement as for IO
On 5/19/25 20:44, Peter Geoghegan wrote:
> On Mon, May 19, 2025 at 2:19 PM Peter Geoghegan wrote:
>> On Mon, May 19, 2025 at 2:01 PM Tomas Vondra wrote:
>>> The regular index scans, however, still have this issue, although it's not
>>> as visible as for IOS.
>
mentioned maybe we could add an atomic variable
tracking the page LSN, so that we don't have to obtain the header lock.
I didn't have time to try yet.
regards
--
Tomas Vondra
dr, buf_state);
AFAICS the lock is needed simply to read a consistent value from the
page header, but maybe we could have an atomic variable with a copy of
the LSN in the buffer descriptor?
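A minimal sketch of that idea, written with C11 atomics (PostgreSQL itself would use its pg_atomic_uint64 API). BufferDescSketch, the spinlock and all function names are hypothetical. Writers keep an atomic copy of the LSN in sync while they hold the lock. Readers then fetch it with a single 64-bit atomic load, which cannot return a torn value, and do not need the header lock.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical, heavily simplified buffer descriptor. */
typedef struct BufferDescSketch
{
    atomic_flag hdr_lock;       /* stands in for the buffer header lock */
    XLogRecPtr  page_lsn;       /* stands in for the LSN in the page header */
    _Atomic uint64_t lsn_copy;  /* proposed: lock-free copy of page_lsn */
} BufferDescSketch;

static void
desc_lock(BufferDescSketch *buf)
{
    while (atomic_flag_test_and_set_explicit(&buf->hdr_lock, memory_order_acquire))
        ;                       /* spin */
}

static void
desc_unlock(BufferDescSketch *buf)
{
    atomic_flag_clear_explicit(&buf->hdr_lock, memory_order_release);
}

static void
desc_init(BufferDescSketch *buf)
{
    atomic_flag_clear(&buf->hdr_lock);
    buf->page_lsn = 0;
    atomic_init(&buf->lsn_copy, 0);
}

/* Writers update both the "page" value and the atomic copy under the lock. */
static void
desc_set_lsn(BufferDescSketch *buf, XLogRecPtr lsn)
{
    desc_lock(buf);
    buf->page_lsn = lsn;
    atomic_store_explicit(&buf->lsn_copy, lsn, memory_order_release);
    desc_unlock(buf);
}

/* Current approach: take the header lock just to read a consistent value. */
static XLogRecPtr
desc_get_lsn_locked(BufferDescSketch *buf)
{
    XLogRecPtr  lsn;

    desc_lock(buf);
    lsn = buf->page_lsn;
    desc_unlock(buf);
    return lsn;
}

/* Proposed approach: one atomic load, no lock, no torn reads. */
static XLogRecPtr
desc_get_lsn_atomic(BufferDescSketch *buf)
{
    return atomic_load_explicit(&buf->lsn_copy, memory_order_acquire);
}
```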
regards
--
Tomas Vondra
|
--91.21%--btgettuple
On 5/11/25 18:07, Peter Geoghegan wrote:
> On Sat, May 10, 2025 at 10:59 AM Tomas Vondra wrote:
>> But doesn't it also highlight how fragile this memory allocation is? The
>> skip scan patch didn't do anything wrong - it just added a couple
>> fields, using a lit
ibc libraries). Still,
it's a long-standing behavior, and I doubt it's likely to change. But
considering glibc is what most systems use, maybe we should add some
protections?
I recall there were proposals to add optional mallopt() call to set the
M_TOP_PAD when running on glibc. Maybe we should revive that. I also had
a patch to add a "memory pool", which fixed this as a side effect.
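Here is roughly what an optional mallopt() call could look like. set_malloc_top_pad() is a hypothetical name, and the padding value (e.g. 4MB) is only an example. M_TOP_PAD is specific to glibc, which is why the call sits behind an __GLIBC__ guard. The MALLOC_TOP_PAD_ environment variable controls the same parameter without any code change.

```c
#include <stdlib.h>             /* on glibc this also defines __GLIBC__ */
#if defined(__GLIBC__)
#include <malloc.h>
#endif

/*
 * Hypothetical startup helper: ask glibc malloc to add 'bytes' of padding
 * whenever it grows the heap via sbrk, and to keep that much free space at
 * the top of the heap when trimming, so repeated small allocate/free cycles
 * don't keep growing and shrinking the heap.
 * Returns mallopt()'s result (1 = success, 0 = error), or -1 when not built
 * against glibc.
 */
static int
set_malloc_top_pad(int bytes)
{
#if defined(__GLIBC__)
    return mallopt(M_TOP_PAD, bytes);
#else
    (void) bytes;
    return -1;
#endif
}
```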
regards
--
Tomas Vondra
end_memory_contexts after preparing and executing the sample
> query, or through pg_get_process_memory_contexts() from another
> backend?
>
I haven't noticed any elevated memory usage in top, but the queries are
very short, so I'm not sure how reliable that is. But if adding 4MB is
enough to make this go away, I doubt I'd notice a difference.
regards
--
Tomas Vondra
On 5/9/25 18:36, Peter Geoghegan wrote:
> On Fri, May 9, 2025 at 12:28 PM Tomas Vondra wrote:
>> Not sure if it matters, but this uses index-only scans, and the pages
>> are all-visible, so maybe it's not much more expensive.
>
> You're still going to have to s
tine to nbtree was. It does not remove skip scan itself (that
> should still work with queries that are actually eligible to use skip
> scan, albeit slightly less efficiently with some opclasses).
>
Tried, doesn't seem to affect the results at all.
--
Tomas Vondra
On 5/9/25 17:55, Peter Geoghegan wrote:
> On Fri, May 9, 2025 at 10:57 AM Tomas Vondra wrote:
>> I see the regression even with variants that actually match some rows.
>> For example if I do this:
>
>> so that the query matches 100 rows, I get the same behavior.
>
n with variants that actually match some rows.
For example if I do this:
update pgbench_accounts set bid = aid;
vacuum full;
and change the query to search for "bid = 1", I get exactly the same
behavior. Even with
update pgbench_accounts set bid = aid / 100;
vacuum full;
so that the query matches 100 rows, I get the same behavior.
--
Tomas Vondra
On 5/9/25 16:17, Peter Geoghegan wrote:
> On Fri, May 9, 2025 at 8:58 AM Tomas Vondra wrote:
>> I'm also not sure about the root cause, but while investigating it one
>> of the experiments I tried was tweaking the glibc malloc by setting
>>
>> export
* There was a discrepancy between parent and child
> * tuples. We need to verify it is not a result of
> * concurrent call of gistplacetopage(). So, lock parent
> * and try to find downlink for current page. It may be
> * missing due to concurrent page split, this is OK.
> */
> pfree(stack->parenttup);
> stack->parenttup = gin_refind_parent(rel, stack->parentblk,
> stack->blkno, strategy);
>
> I think we can remove gin_refind_parent() and do ereport right away here.
> The same logic as with 3). AFAIK it's impossible to have a child item
> with a key that is higher than the cached parent key.
> Parent key bounds what keys we can insert into the child page, so it
> seems there is no way how they can appear there.
>
These look like good points. I've added it to open items so that we
don't forget about this; I won't have time to look at this until after
pgconf.dev.
thanks
--
Tomas Vondra
116037110 7193
-
prepared1 33646 3655
4 25379 1137511342
32 37319 1409713911
There's almost no difference between bc35adee8d7 and 92fe23d93aa.
regards
--
Tomas Vondra
trick.
>
Good question. I haven't checked that explicitly, but it's a tiny data
set (15MB) and I observed this even on long benchmarks with tens of
millions of queries. So the hint bits should have been set.
Also, I should have mentioned the query does an index-only scan, and the
pin/unpin calls are on index pages, not on the heap.
regards
--
Tomas Vondra
C_TOP_PAD_ would not help like this. But I haven't
looked at the code, and I wouldn't have guessed the query to have
anything to do with skip scan ...
regards
--
Tomas Vondra
e expensive under concurrency (the clients
simply have to compete when updating the same counter, and with enough
clients there'll be more conflicts and retries). Kinda unfortunate, and
maybe we should do something about it, not sure.
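The following sketch shows why contention by itself makes such a counter more expensive. It assumes the counter is updated with a compare-and-swap retry loop. cas_increment() is a hypothetical name, and the sketch uses C11 atomics rather than PostgreSQL's pg_atomic_* functions. Every concurrent update that lands between the load and the CAS forces another iteration.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical shared counter bumped with a compare-and-swap loop. Each
 * failed CAS means the value changed between our load and our store (some
 * other client got there first), so we retry with the fresh value. The
 * more clients update the same word, the more retries and cache-line
 * transfers there are. The _weak variant may also fail spuriously.
 */
static uint32_t
cas_increment(_Atomic uint32_t *counter, unsigned long *retries)
{
    uint32_t    old = atomic_load_explicit(counter, memory_order_relaxed);

    /* on failure, 'old' is overwritten with the counter's current value */
    while (!atomic_compare_exchange_weak_explicit(counter, &old, old + 1,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
        (*retries)++;

    return old + 1;
}
```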
But why would it depend on checksums at all? This read-only test should
be entirely in-memory, so how come it's affected?
regards
--
Tomas Vondra
gconf.dev.
I'd be surprised if this was a regression; the hash table lookups are
not exactly free. And even if it was a minor regression, it'd affect
only cases with many NULL keys, but it improves robustness.
BTW do you consider this to be a bugfix for PG18? Or would it have to
wait for PG19 at this point?
regards
--
Tomas Vondra
On 4/30/25 14:39, Tomas Vondra wrote:
>
> On 4/18/25 03:03, Vinod Sridharan wrote:
>> ...
>>
>
> The patch seems fine to me - I repeated the tests with mailing list
> archives, with MemoryContextStats() in _gin_parallel_merge, and it
> reliably minimizes the memory
fine.
I was also worried if this might have performance impact, but it
actually seems to make it a little bit faster.
I'll get this pushed.
thanks
--
Tomas Vondra
ssGetMemoryContextInterrupt() do the
same thing?
In any case, if DSA happens to not be the right way to transfer this,
what should we use instead? The only thing I can think of is some sort
of pre-allocated chunk of shared memory.
regards
--
Tomas Vondra
o be verifying something that the loop
> condition was checking already. I thought it was better to check that
> we end up with a power-of-two.
>
> Please see the attached patch.
>
Thanks. Those changes seem fine to me too.
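Here is a small sketch of the kind of check meant here, with hypothetical helper names. The value is computed with a loop, and the code then asserts the property callers rely on (a power of two) instead of re-checking the loop condition. The x & (x - 1) test clears the lowest set bit, so it yields zero exactly for powers of two; zero itself is excluded explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* x & (x - 1) clears the lowest set bit: zero only for powers of two (and 0). */
static bool
is_power_of_two(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: smallest power of two >= x; assumes x <= SIZE_MAX / 2 + 1. */
static size_t
next_power_of_two(size_t x)
{
    size_t      result = 1;

    while (result < x)
        result <<= 1;

    /* verify the property callers rely on, not the loop condition again */
    assert(is_power_of_two(result));
    return result;
}
```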
Do you intend to push these, or do you want me to do it?
regards
--
Tomas Vondra
cause of the RMT, but I'm also willing to do some of
the tests, if needed - but it'd be good to get some guidance.
regards
--
Tomas Vondra
ecksums by default, but now I realize the thread talks about "upgrade
experience" which seems fairly wide.
So, what kind of data do we expect to gather in order to evaluate this?
Who's expected to collect it and evaluate this?
regards
--
Tomas Vondra
On 4/22/25 18:26, Peter Geoghegan wrote:
> On Tue, Apr 22, 2025 at 6:46 AM Tomas Vondra wrote:
>> here's an improved (rebased + updated) version of the patch series, with
>> some significant fixes and changes. The patch adds infrastructure and
>> modifies btree index
approaches
> to
> resolve this too).
>
Thanks for the report. I didn't have time to look at this in detail yet,
but the fix looks roughly correct. I've added this to the list of open
items for PG18.
regards
--
Tomas Vondra
bigint, perhaps?
Attached is v28, with the commit messages updated, a comment added about
allocation of the memory, etc. I'll let the CI run the tests on it, and
then will push, unless someone has more comments.
regards
--
Tomas Vondra
From 9a222c77de2ee4a0b32d97c3d8bab2bb33f066de Mon Sep 17 00:0
> - It's currently doing the changes in pg_buffercache v1.6 but will need to
> create v1.7 for 19 (if the above stands true)
>
This seems like a good idea in principle, but at this point it has to
wait for PG19. Please add it to the July commitfest.
regards
--
Tomas Vondra
>
>>
>> Seeing no responses for a long time, I am planning to push the fix
>> till 14 tomorrow unless there are some opinions on the fix for 13. We
>> can continue to discuss the scope of the fix for 13.
>>
>
> Pushed till 14.
>
Thanks everyone who persevered and kept working on fixing this! Highly
appreciated.
regards
--
Tomas Vondra
On 4/9/25 17:51, Andres Freund wrote:
> Hi,
>
> On 2025-04-09 17:28:31 +0200, Tomas Vondra wrote:
>> On 4/9/25 17:14, Andres Freund wrote:
>>> I'd mention that the includes of postgres.h/fmgr.h is what caused missing
>>> build-time dependencies and via tha
On 4/9/25 17:14, Andres Freund wrote:
> Hi,
>
> On 2025-04-09 16:33:14 +0200, Tomas Vondra wrote:
>> From e1f093d091610d70fba72b2848f25ff44899ea8e Mon Sep 17 00:00:00 2001
>> From: Tomas Vondra
>> Date: Tue, 8 Apr 2025 23:31:29 +0200
>> Subject: [PATCH 1/2] Clea
On 4/9/25 01:29, Andres Freund wrote:
> Hi,
>
> On 2025-04-09 01:10:09 +0200, Tomas Vondra wrote:
>> On 4/8/25 15:06, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote:
>>>> On Mon, 7 Apr 2025 at 23:00, To
Updated patches with proper commit messages etc.
--
Tomas Vondra
From e1f093d091610d70fba72b2848f25ff44899ea8e Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Tue, 8 Apr 2025 23:31:29 +0200
Subject: [PATCH 1/2] Cleanup of pg_numa.c
This moves/renames some of the functions defined in
On 4/9/25 14:07, Tomas Vondra wrote:
> ...
>
> OK, here are two patches, where 0001 adds the missingdeps check to the
> Debian meson build. It just adds that to the build script.
>
> 0002 leaves the NUMA stuff in src/port (i.e. it's no longer moved to
> src/backen
On 4/8/25 15:06, Andres Freund wrote:
> Hi,
>
> On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote:
>> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote:
>>> I'll let the CI run the tests on it, and
>>> then will push, unless someone has more comments.
>
On 4/8/25 16:59, Andres Freund wrote:
> Hi,
>
> On 2025-04-08 09:35:37 -0400, Andres Freund wrote:
>> On April 8, 2025 9:21:57 AM EDT, Tomas Vondra wrote:
>>> On 4/8/25 15:06, Andres Freund wrote:
>>>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wro
> The attached small patch fixes the manual.
>
Thank you for noticing this and for the fix! Pushed.
This also reminded me we agreed to change page_num to bigint, which I
forgot to change before commit. So I adjusted that too, separately.
regards
--
Tomas Vondra
On 4/7/25 17:51, Andres Freund wrote:
> Hi,
>
> On 2025-04-06 13:56:54 +0200, Tomas Vondra wrote:
>> On 4/6/25 01:00, Andres Freund wrote:
>>> On 2025-04-05 18:29:22 -0400, Andres Freund wrote:
>>>> I think one thing that the docs should mention is that callin
On 4/7/25 23:50, Jakub Wartak wrote:
> On Mon, Apr 7, 2025 at 11:27 PM Tomas Vondra wrote:
>>
>> Hi,
>>
>> I've pushed all three parts of v29, with some additional corrections
>> (picked lower OIDs, bumped catversion, fixed commit messages).
>
> H
Hi,
I've pushed all three parts of v29, with some additional corrections
(picked lower OIDs, bumped catversion, fixed commit messages).
On 4/7/25 23:01, Jakub Wartak wrote:
> On Mon, Apr 7, 2025 at 9:51 PM Tomas Vondra wrote:
>
>>> So it looks like that the new way to it
On 4/7/25 20:11, Bertrand Drouvot wrote:
> Hi,
>
> On Mon, Apr 07, 2025 at 12:42:21PM -0400, Andres Freund wrote:
>> Hi,
>>
>> On 2025-04-07 18:36:24 +0200, Tomas Vondra wrote:
>>
>> I was thinking of checking if the BufferDesc indicates BM_VALID or
ent patches are good enough
>> for PG18, with the current behavior, and then maybe improve that in
>> PG19.
>
> I think as long as the docs mention this with or it's ok for
> now.
>
OK, I'll add a warning explaining this.
regards
--
Tomas Vondra
ssion tests can't tell us much,
considering it didn't fail once with the reverted patch :-(
I did check the coverage in:
https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html
and sure enough, dir_realloc() is not executed once. And there's a
couple more p
in os_page_status.
I intend to push 0001 and 0002 shortly, and 0003 after a bit more review
and testing, unless I hear objections.
regards
--
Tomas Vondra
From fcc4fc2ada33cbbc962d561ddeea6966f0d55492 Mon Sep 17 00:00:00 2001
From: Jakub Wartak
Date: Wed, 2 Apr 2025 12:29:22 +0200
Subject: [P
>>> pages.
>>> + * It's a bit misleading to call that "aligned", no? */
>>> +
>>> + /* Get number of OS aligned pages */
>>> + shm_ent_page_count
>>> + = TYPEALIGN(os_page_size, ent->allocated_size) /
>>> os_page_size;
>>> +
>>> + /*
>>> + * If we ever get 0xff back from the kernel inquiry, then we probably
>>> + * have a bug in our buffers to OS page mapping code here.
>>> + */
>>> + memset(pages_status, 0xff, sizeof(int) * shm_ent_page_count);
>>
>> There's obviously no guarantee that shm_ent_page_count is a multiple of
>> os_page_size. I think it'd be interesting to show in the view when one shmem
>> allocation shares a page with the prior allocation - that can contribute a
>> bit
>> to contention. What about showing a start_os_page_id and end_os_page_id or
>> something? That could be a feature for later though.
>
> I was thinking about it, but it could be done when analyzing this
> together with data from pg_shmem_allocations(?) My worry is timing :(
> Anyway, we could extend this view in future revisions.
>
I'd leave this out for now. It's not difficult, but let's focus on the
other issues.
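For later reference, here is a sketch of how start/end OS page ids could be derived from the start address. It also shows why a page count based only on the size can undercount. An allocation that is one page long but starts mid-page touches two pages, and the page it starts on may be shared with the previous allocation. OsPageRange and both helpers are hypothetical. The TYPEALIGN macro mirrors the one used in the quoted patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors PostgreSQL's TYPEALIGN(): round LEN up to a multiple of ALIGNVAL (a power of two). */
#define TYPEALIGN(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

/* Hypothetical result type: first and last OS page touched by an allocation. */
typedef struct OsPageRange
{
    uintptr_t   start_os_page_id;
    uintptr_t   end_os_page_id;
} OsPageRange;

/* Page count as in the quoted patch: only the size is rounded up. */
static size_t
aligned_page_count(size_t allocated_size, size_t os_page_size)
{
    return TYPEALIGN(os_page_size, allocated_size) / os_page_size;
}

/* Pages actually spanned by [start, start + size); assumes size > 0. */
static OsPageRange
allocation_page_range(uintptr_t start, size_t size, size_t os_page_size)
{
    OsPageRange r;

    r.start_os_page_id = start / os_page_size;
    r.end_os_page_id = (start + size - 1) / os_page_size;
    return r;
}
```

Two consecutive allocations share a page when the end_os_page_id of the first equals the start_os_page_id of the second.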
>>> +SELECT NOT(pg_numa_available()) AS skip_test \gset
>>> +\if :skip_test
>>> +\quit
>>> +\endif
>>> +-- switch to superuser
>>> +\c -
>>> +SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
>>> + ok
>>> +
>>> + t
>>> +(1 row)
>>
>> Could it be worthwhile to run the test if !pg_numa_available(), to test that
>> we do the right thing in that case? We need an alternative output anyway, so
>> that might be fine?
>
> Added. the meson test passes, but I'm sending it as fast as possible
> to avoid a clash with Tomas.
>
Please keep working on this. I may have a bit of time in the evening,
but in the worst case I'll merge it into your patch.
regards
--
Tomas Vondra
he current backend, so I'd bet people
would not be happy with NULL, and would proceed to force the allocation
in some other way (say, a large query of some sort). Which obviously
causes a lot of other problems.
I can imagine having a flag that makes the allocation optional, but
there's no convenient way to pass that to a view, and I think most
people want the allocation anyway.
This matters especially for monitoring, which usually happens in a new
connection, so the backend has little opportunity to allocate the pages
"naturally."
regards
--
Tomas Vondra
at right now, but at the very least we ought to
> document it.
>
+1 to documenting this
>
> On 2025-04-05 16:33:28 +0200, Tomas Vondra wrote:
>> The libnuma library is not available on 32-bit builds (there's no shared
>> object for i386), so we disable it in that
fields. Seems a bit weird, but we always did that - the patch
does not really change that.
I'll now mark this as committed. I haven't done anything about the alignment. My
conclusion from the discussion was we don't quite need to do that, but
if we do I think it's a matter for a separate patch - perhaps something
like the 0003.
Thanks for the patch, reviews, etc.
--
Tomas Vondra
On 3/24/25 16:25, Heikki Linnakangas wrote:
> On 24/03/2025 16:56, Tomas Vondra wrote:
>>
>>
>> On 3/23/25 17:43, Heikki Linnakangas wrote:
>>> On 21/03/2025 17:16, Andres Freund wrote:
>>>> Am I right in understanding that the only scenario (w
On 4/5/25 15:23, Tomas Vondra wrote:
> On 4/5/25 11:37, Bertrand Drouvot wrote:
>> Hi,
>>
>> On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote:
>>> OK,
>>>
>>> here's v25 after going through the patches once more, fixing the issues
On 4/5/25 11:37, Bertrand Drouvot wrote:
> Hi,
>
> On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote:
>> OK,
>>
>> here's v25 after going through the patches once more, fixing the issues
>> mentioned by Bertrand, etc.
>
> Thanks!
>
code gets multiple loops in
while (wpos < file->nbytes)
{
...
}
because bytestowrite will be the value from the last loop? I haven't
tried, but I guess writing wide tuples (more than 8k) might fail.
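Here is a generic sketch of the concern. It is not the actual code under review; dump_buffer() and the 8kB chunk limit are assumptions. When a payload needs several iterations, bytestowrite ends up holding only the size of the last chunk. Anything after the loop therefore has to rely on wpos, the running total.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 8192         /* assumption: at most one 8kB block per iteration */

/*
 * Hypothetical chunked copy standing in for the write loop: each iteration
 * writes at most CHUNK_SIZE bytes, so a wide (>8kB) payload needs several.
 */
static size_t
dump_buffer(char *dst, const char *src, size_t nbytes)
{
    size_t      wpos = 0;
    size_t      bytestowrite = 0;

    while (wpos < nbytes)
    {
        bytestowrite = nbytes - wpos;
        if (bytestowrite > CHUNK_SIZE)
            bytestowrite = CHUNK_SIZE;

        memcpy(dst + wpos, src + wpos, bytestowrite);
        wpos += bytestowrite;
    }

    /*
     * Here bytestowrite only holds the last chunk (e.g. 3616 bytes for a
     * 20000-byte payload), so any accounting must use wpos instead.
     */
    return wpos;
}
```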
regards
--
Tomas Vondra
in the function comment, but I'm
also not quite sure I understand what "output shared memory" is ...
regards
--
Tomas Vondra
From 381c5077592e38dbcbbf6acc4f1e86a767a92957 Mon Sep 17 00:00:00 2001
From: Jakub Wartak
Date: Wed, 2 Apr 2025 12:29:22 +0200
Subject: [PATCH v25 1/5]
Yes, I agree.
regards
--
Tomas Vondra
On 4/4/25 08:50, Bertrand Drouvot wrote:
> Hi,
>
> On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote:
>> On 4/3/25 15:12, Jakub Wartak wrote:
>>> On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra wrote:
>>>
>>>> ...
>>>>
On 4/4/25 09:35, Jakub Wartak wrote:
> On Fri, Apr 4, 2025 at 8:50 AM Bertrand Drouvot
> wrote:
>>
>> Hi,
>>
>> On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote:
>>> On 4/3/25 15:12, Jakub Wartak wrote:
>>>>
On 4/3/25 15:12, Jakub Wartak wrote:
> On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra wrote:
>
>> ...
>>
>> So unless someone can demonstrate a use case where this would matter,
>> I'd not worry about it too much.
>
> OK, fine for me - just 3 cols for p
On 4/3/25 10:23, Bertrand Drouvot wrote:
> Hi,
>
> On Thu, Apr 03, 2025 at 09:01:43AM +0200, Jakub Wartak wrote:
>> On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra wrote:
>>
>> Hi Tomas,
>>
>>> OK, so you agree the commit messages are complete / correct
On 4/3/25 09:01, Jakub Wartak wrote:
> On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra wrote:
>
> Hi Tomas,
>
>> OK, so you agree the commit messages are complete / correct?
>
> Yes.
>
>> OK. FWIW if you disagree with some of my proposed changes, feel free to
On 4/2/25 17:45, Peter Geoghegan wrote:
> On Wed, Apr 2, 2025 at 11:36 AM Tom Lane wrote:
>> Ouch! I had no idea it had gotten that big. Yeah, we ought to
>> do something about that.
>
> Tomas Vondra talked about this recently, in the context of his work on
> prefe
On 4/2/25 18:43, Andres Freund wrote:
> Hi,
>
> On 2025-03-04 20:50:43 +0100, Tomas Vondra wrote:
>> I pushed the two smaller parts today.
>>
>> Here's the remaining two parts, to keep cfbot happy. I don't expect to
>> get these into PG18, though.
>
On 4/2/25 16:46, Jakub Wartak wrote:
> On Tue, Apr 1, 2025 at 10:17 PM Tomas Vondra wrote:
>>
>> Hi,
>>
>> I've spent a bit of time reviewing this. In general I haven't found
>> anything I'd call a bug, but here's a couple comments for v1
inters like this etc.).
11) This could use UINT64_FORMAT, instead of a cast:
elog(DEBUG1, "NUMA: os_page_count=%lu os_page_size=%zu pages_per_blk=%.2f",
     (unsigned long) os_page_count, os_page_size, pages_per_blk);
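For comparison, here is the same message built with standard C's PRIu64, which plays the role of PostgreSQL's UINT64_FORMAT. UINT64_FORMAT already includes the leading %, while PRIu64 does not. format_numa_debug() is a hypothetical helper that formats into a buffer instead of calling elog(). It assumes os_page_count is a uint64.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: same message, no cast, using the portable 64-bit format macro. */
static int
format_numa_debug(char *buf, size_t buflen, uint64_t os_page_count,
                  size_t os_page_size, double pages_per_blk)
{
    return snprintf(buf, buflen,
                    "NUMA: os_page_count=%" PRIu64 " os_page_size=%zu pages_per_blk=%.2f",
                    os_page_count, os_page_size, pages_per_blk);
}
```

In the backend, the equivalent would be elog(DEBUG1, "NUMA: os_page_count=" UINT64_FORMAT " os_page_size=%zu pages_per_blk=%.2f", os_page_count, os_page_size, pages_per_blk);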
regards
--
Tomas Vondra
From 46a7801b1985a81bb8bc35fcfb2cbb74e6ea5
"number of elements", but it's a simple flag. So
I renamed it to "prealloc", which seems clearer to me. I also tweaked
(reordered/reformatted) the conditions a bit.
For the other patch, I realized we can simply MemSet() the whole chunk,
instead of resetting the individual parts
On 3/30/25 06:04, Tom Lane wrote:
> Tomas Vondra writes:
>> I've pushed all the parts of this patch series, except for the stress
>> test - which I think was not meant for commit.
>> buildfarm seems happy so far, except for a minor indentation issue
>> (forg
On 3/28/25 20:51, Kirill Reshke wrote:
> On Fri, 28 Mar 2025 at 21:26, Tomas Vondra wrote:
>>
>> Here's a polished version of the patches. If you have any
>> comments/objections, please speak now.
>> --
>> Tomas Vondra
>
> Hi, no objections, lgtm
callee
functions. So now it's
- amcheck consistency check context
- posting tree check context
regards
--
Tomas Vondra
From 28b392b687f641b09bc79bb3bb3e61505845e6c1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Fri, 28 Mar 2025 16:49:04 +0100
Subject: [PATCH v20250328 1/6] Fix grammar in
> */
>
> hash_create code is confusing because the nelem_alloc named variable is used
> in two different cases, In the above case nelem_alloc refers to the one
> returned by choose_nelem_alloc function.
>
> The other nelem_alloc determines the number of elements in each partition
> for a partitioned hash table. This is not what is being referred to in
> the above
> comment.
>
> The bit "For more explanation see comments within this function" is not
> great, if only because there are not many comments within the function,
> so there's no "more explanation". But if there's something important, it
> should be in the main comment, preferably.
>
>
> I will improve the comment in the next version.
>
OK. Do we even need to pass nelem_alloc to hash_get_init_size? It's not
really used except for this bit:
+if (init_size > nelem_alloc)
+element_alloc = false;
Can't we determine that before calling the function, to make it a bit
less confusing?
regards
--
Tomas Vondra
On 3/27/25 13:56, Tomas Vondra wrote:
> ...
>
> OK, I don't have any other comments for 0001 and 0002. I'll do some
> more review and polishing on those, and will get them committed soon.
>
Actually ... while polishing 0001 and 0002, I noticed a couple more
details t
On 3/27/25 16:30, Mark Dilger wrote:
>
>
> On Fri, Feb 21, 2025 at 6:29 AM Tomas Vondra wrote:
>
> Hi,
>
> I see this patch didn't move since December :-( I still think these
> improvements would be useful, it
his.
> Do you have any suggestions in mind?
>
> Please find attached updated patches after merging all your review
> comments except
> a few discussed above.
>
OK, I don't have any other comments for 0001 and 0002. I'll do some
more review and polishing on those, and will get them committed soon.
I don't plan to push 0003, unless someone can actually explain and
demonstrate the benefits of the proposed padding.
regards
--
Tomas Vondra
ards
Tomas
On 3/13/25 17:37, Tomas Vondra wrote:
> Hi all,
>
> pgconf.dev 2025 will host "Advanced Patch Feedback Session", with the
> same format as in 2024 [1]:
>
> Participants will work in small groups with a Postgres committer to
> analyze a past contribut
> Let's be tidy and fix both latestCompletedXid and
> xactCompletionCount.
>
Thanks for looking into this and pushing the fix.
Would it make sense to add a comment documenting this reasoning about
not handling aborts? Otherwise someone will get to rediscover this in
the future ...
regards
--
Tomas Vondra
ntion can we
get there? I don't get it.
Also, why is the patch adding padding after statusFlags (the last array
allocated in InitProcGlobal) and not between allProcs and xids?
regards
--
Tomas Vondra
From f527909dda02b4c7231db53a0fe6cecbaec55ca4 Mon Sep 17 00:00:00 2001
From: Rahila Sye
On 3/19/25 13:27, Tomas Vondra wrote:
> On 3/19/25 08:17, Heikki Linnakangas wrote:
>> On 19/03/2025 04:22, Tomas Vondra wrote:
>>> I kept stress-testing this, and while the frequency massively increased
>>> on PG18, I managed to reproduce this all the way back to
On 3/19/25 08:17, Heikki Linnakangas wrote:
> On 19/03/2025 04:22, Tomas Vondra wrote:
>> I kept stress-testing this, and while the frequency massively increased
>> on PG18, I managed to reproduce this all the way back to PG14. I see
>> ~100x more corefiles on PG18.
>>
't have the same issue. None of them seems to advance the XID to 209508.
regards
--
Tomas Vondra
On 3/17/25 13:18, Thomas Munro wrote:
> On Tue, Mar 18, 2025 at 12:59 AM Tomas Vondra wrote:
>> On 3/17/25 12:36, Tomas Vondra wrote:
>>> I'm still fiddling with the script, trying to increase the probability
>>> of the (apparent) race condition. On one machine
On 3/17/25 12:36, Tomas Vondra wrote:
> ...
>
> I'm still fiddling with the script, trying to increase the probability
> of the (apparent) race condition. On one machine (old Xeon) I can hit it
> very easily/reliably, while on a different machine (new Ryzen) it's ver
g to increase the probability
of the (apparent) race condition. On one machine (old Xeon) I can hit it
very easily/reliably, while on a different machine (new Ryzen) it's very
rare. I don't know if that's due to a difference in CPU speed, or
fewer cores, ... I guess it changes the timing just enough.
I've also tried running the stress test on PG17, and I'm yet to see a
single failure there. Not even on the Xeon machine, which hits it
reliably on 18. So this seems to be a PG18-only issue.
If needed, I can try adding more logging, or test a patch.
regards
--
Tomas Vondra
a while to hit it - on my laptop it takes an hour or so, but I
guess it's more about the random sleeps in the script.
I've only ever seen this on the standby, never on the primary.
regards
--
Tomas Vondra
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementa