0x000102ae89d4 RecordNewMultiXact + 576
So it makes me think that it's some kind of IO concurrency issue.
As expected, the error only persists if the "extend SLRU" branch is active in
RecordNewMultiXact().
Thanks for testing!
Best regards, Andrey Borodin.
ts()".
FWIW we diagnosed that problem in the recent thread "IPC/MultixactCreation on
the Standby server" [0].
The proposed fix is non-trivial. I'm trying to assess how many reports we have
had about this problem. Did you ever observe this problem again?
Thanks!
Best regards, Andrey Boro
> On 27 Jul 2025, at 16:53, Andrey Borodin wrote:
>
> we have to do this "next offset" dance on Primary too.
PFA draft of this.
I also attach a version for PG17, maybe Dmitry could try to reproduce the
problem with this patch. I think the problem should be fixed by the pat
> On 26 Jul 2025, at 22:44, Álvaro Herrera wrote:
>
> On 2025-Jul-25, Andrey Borodin wrote:
>
>> Also I've discovered one more serious problem.
>> If a backend crashes just before WAL-logging multi, any heap tuple
>> that uses this multi will become unreada
> On 21 Jul 2025, at 19:58, Andrey Borodin wrote:
>
> I'm planning to prepare tests and fixes for all supported branches
This is a status update message. I've reproduced problem on REL_13_STABLE and
verified that proposed fix works there.
Also I've discovered one mo
> On 18 Jul 2025, at 18:53, Andrey Borodin wrote:
>
> Please find attached dirty test and a sketch of the fix. It is done against
> PG 16, I wanted to ensure that problem is reproducible before 17.
Here's v7 with improved comments and cross-check for correctness.
Also, Mul
>
> Anyway, he's going to try and implement this.
>
> Andrey, please let me know if I misunderstood the idea.
Please find attached dirty test and a sketch of the fix. It is done against PG
16, I wanted to ensure that problem is reproducible before 17.
Best regards, Andrey Boro
> On 30 Jun 2025, at 15:58, Andrey Borodin wrote:
>
> page_collect_tuples() holds a lock on the buffer while examining tuples
> visibility, having InterruptHoldoffCount > 0. Tuple visibility check might
> need WAL to go on, we have to wait until some next MX be filled in.
see a
reason to restrict wal_compression_threshold by lower value.
PFA rebased version.
Best regards, Andrey Borodin.
v4-0001-Compress-big-WAL-records.patch
Description: Binary data
es, we can repair the VM by assuming heap to
be the source of truth in this case. But we must also emit
ERRCODE_DATA_CORRUPTED XX001 code into the logs. In many cases this will alert
on-call SRE.
To do so I propose to replace elog(WARNING,...) with
ereport(WARNING,(errcode(ERRCODE_DATA_CORRUPTED),..).
Best regards, Andrey Borodin.
be connected to the fact that we log VM changes
independently of data changes that caused VM to change. But I have no real
evidence or understanding what happened.
Best regards, Andrey Borodin.
> On 30 Jun 2025, at 16:34, Andrey Borodin wrote:
>
> Please find attached two new steps for amcheck:
> 1. A function to verify GiST integrity. This patch is in decent shape, simply
> rebased from previous year.
> 2. Support on pg_amcheck's side for this function. Thi
microseconds.
Best regards, Andrey Borodin.
> On 2 Jul 2025, at 18:38, Maxim Orlov wrote:
>
> If you
> know of any real problems, please tell me
If I understood correctly, pages can differ on primary and standby. That might
be problematic for WAL debug tests.
Best regards, Andrey Borodin.
our analysis is correct. We could make common checking function and
argument was not needed.
>
> The attached patch removes this unused typedef to clean up the dead code.
> Thoughts?
Looks good to me. Thanks for fixing this!
Best regards, Andrey Borodin.
. Support on pg_amcheck's side for this function. This patch did not receive
such review attention before. And, perhaps, should be extended to support
existing GIN functions.
I'll put this thread into July commitfest.
Thanks!
Best regards, Andrey Borodin.
[0] https://postgr.es/m/45AC9B0A
> On 28 Jun 2025, at 21:24, Andrey Borodin wrote:
>
> This seems to be fixing issue for me.
ISTM I was wrong: there is a possible recovery conflict with snapshot.
REDO:
frame #2: 0x00010179a0c8 postgres`pg_usleep(microsec=100) at pgsleep.c:50:10
frame #3: 0x000
> On 28 Jun 2025, at 00:37, Andrey Borodin wrote:
>
> Indeed.
After some experiments I could get unstable repro on my machine.
I've added some logging and that's what I've found:
2025-06-28 23:03:40.598 +05 [40887] 006_MultiXact_standby.pl WARNING: Timed
out: n
o this TAP test, we could make it
portable. So I can debug the problem on my machine...
Either way we can proceed with remote debugging via mailing list :)
Thank you!
Best regards, Andrey Borodin.
v3-0001-Make-next-multixact-sleep-timed-with-debug-loggin.patch
Description: Binary data
v3-0
ty is available via hacks, just a little more
work).
But are you going to backpatch all new features of injection points in future?
It's potentially x6 more work.
Thanks!
Best regards, Andrey Borodin.
> On 26 Jun 2025, at 17:59, Andrey Borodin wrote:
>
> hypothesis
Dmitry, can you please retry your reproduction with attached patch?
It must print nextMXact and tmpMXact. If my hypothesis is correct nextMXact
will precede tmpMXact.
Best regards, Andrey Borodin.
v2-0001-
act is not filled often enough from
redo paths. So if you are unlucky enough, corner case 2 reading can deadlock
with startup.
I need to verify it further, but if so, it's an ancient bug that just happens
to be a few orders of magnitude more reproducible on 17 due to performance
improvements. Still hypothetical though.
Best regards, Andrey Borodin.
llback to
behavior of PG 16.
Best regards, Andrey Borodin.
0001-Make-next-multixact-sleep-timed.patch
Description: Binary data
> On 21 Jun 2025, at 21:10, Mihail Nikalayeu wrote:
>
> Rebased
IMO the patch is RfC, I've just updated the status of the CF item.
Thanks.
Best regards, Andrey Borodin.
er explained this code before somewhere in pgsql-hackers.
And the reasoning was something like "if you lack a tuple in unique constraints
- it's almost certainly subsequent constraint violation and data loss". But I'm
not sure.
And I could not find this discussion in archives.
Best regards, Andrey Borodin.
er(state->revmapbuf, BUFFER_LOCK_UNLOCK);
// more usage of state->revmapbuf
+ ReleaseBuffer(state->revmapbuf);
I hope you know what you are doing. BRIN concurrency is not known to me at all.
That's all for first pass through patches. Thanks for working on it!
Best regards, Andrey Borodin.
iST check. In GiST "Area of responsibility" of
internal tuple can be extended in any direction. That's why we need to lock
parent page.
If in GIN internal tuple keyspace is never extended - it's OK to avoid
gin_refind_parent().
But reasoning about GIN concurrency is rather difficult. Unfortunately, we do
not have such checks in B-tree verification without ShareLock. Either way, we
could borrow some ideas from there.
Thank you!
Best regards, Andrey Borodin.
d when IPs
are persisted.
Another possible improvement is to use distinguishable errors in
injection_shmem_startup(), e.g. differentiating between a read error and a
wrong magic number.
But there's no big value in these improvements, so the patch is fine as is too.
Best regards, Andrey Borodin.
> On 21 May 2025, at 15:03, Fujii Masao wrote:
>
>
>
> On 2025/05/21 17:35, Andrey Borodin wrote:
>> Well, we implemented this and made tests that do a lot of failovers. These
>> tests observed data loss in some infrequent cases due to wrong new primary
>>
ed?
Besides these, cool new abilities and a test for a bug, looks good to me.
Best regards, Andrey Borodin.
> On 13 May 2025, at 14:13, Fujii Masao wrote:
>
>
>
> On 2025/05/13 0:47, Andrey Borodin wrote:
>> Moved off from "Small fixes needed by high-availability tools"
>>> On 12 May 2025, at 01:33, Amit Kapila wrote:
>>>
>>> On Fri,
synchronous
>> standby.
>>
>
> Sounds reasonable to me. Let us see what others think about it.
I think this LSN is simply LSN where crash recovery ends...
Best regards, Andrey Borodin.
Moved off from "Small fixes needed by high-availability tools"
> On 12 May 2025, at 01:33, Amit Kapila wrote:
>
> On Fri, May 2, 2025 at 6:30 PM Andrey Borodin wrote:
>>
>> 3. Allow reading LSN written by walreceiver, but not flushed yet
>>
>> Pr
> On 6 May 2025, at 12:00, Matthias van de Meent
> wrote:
>
> On Fri, 2 May 2025 at 15:00, Andrey Borodin wrote:
>>
>> Hi hackers!
>>
>> I want to revive attempts to fix some old edge cases of physical quorum
>> replication.
>>
>> Ple
> On 15 Apr 2025, at 11:47, Andrey Borodin wrote:
>
> I’m going to print their posters on May 8th before my flight to Canada.
Following posters were printed, packaged and prepared to accompany me on a
flight to Montreal.
1. Logging Plan of the Running Query (Atsushi Torikoshi
failover pg_consul uses LSNs from lwaldump.
This approach works well, but is cumbersome.
There are other caveats of replication, but IMO these 3 problems are most
annoying in terms of data durability.
I'd greatly appreciate any thoughts on this.
Best regards, Andrey Borodin.
[0]
htt
be persistent.
Though they will appear on standby, which is, probably, not expected
functionality...
Best regards, Andrey Borodin.
> On 15 Apr 2025, at 11:47, Andrey Borodin wrote:
>
> Thank you for your questions!
As of today, 12 people expressed interest and 6 asked for printing assistance.
I think I can print and bring like ~20 posters. The printing capacity
utilization is 30% :)
So, even if you do no
LWLockAcquire(pgss->lock, LW_SHARED) somewhere (do
you have a backtrace?), which prevents interrupts anyway.
Thanks!
Best regards, Andrey Borodin.
n a less efficient fill loop.
Well, that's just few hundred bytes at most. But I agree that makes sense.
Best regards, Andrey Borodin.
this maybe use MemoryContextAllocZero() instead of subsequent
MemSet()?
But this might unroll loop of unnecessary beautifications like DynaHashAlloc()
calling Assert(MemoryContextIsValid(CurrentDynaHashCxt)) just before
MemoryContextAllocExtended() will repeat same exercise.
Best regards, Andrey Borodin.
there are a lot of cases of MCXT_ALLOC_NO_OOM, perhaps we should check
them all?
Best regards, Andrey Borodin.
stgreSQL and others purely DCS nodes will
reduce waste of resources”?
Best regards, Andrey Borodin.
> On 16 Apr 2025, at 10:39, Kirill Reshke wrote:
>
> You can run bash from extension, what's the point?
You cannot run bash that will stop backend running bash.
Best regards, Andrey Borodin.
> On 16 Apr 2025, at 09:26, Tom Lane wrote:
>
> Andrey Borodin writes:
>> I think it's what Konstantin is proposing. To have our own Raft
>> implementation, without dependencies.
>
> Hmm, OK. I thought that the proposal involved relying on some existing
>
one-CPU hosts for ZooKeeper/etcd.
If you use built-in failover you have to resort to 3 big Postgres machines
because you need 2/3 majority. Of course, you can install a MySQL-style arbiter:
a host that has no real PGDATA and only participates in voting. But this is a
solution to a problem induced by built-in autofailover.
Best regards, Andrey Borodin.
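
The "2/3 majority" arithmetic above can be sketched in a few lines. This is only the generic majority-quorum rule, not code from any proposal in this thread; the function names are hypothetical and chosen for illustration.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters may be lost while a majority can still be formed."""
    return voters - majority(voters)


# Two data nodes alone cannot lose anyone and still elect a primary.
print(majority(2), tolerated_failures(2))  # 2 0

# A third voter (a full node or a data-less arbiter) allows one failure.
print(majority(3), tolerated_failures(3))  # 2 1
```

Under this rule the third voter only needs to vote, which is why an arbiter without PGDATA is enough to reach 2 of 3.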
nce properties. I'd start to design from here, not
from Raft paper.
Best regards, Andrey Borodin.
> On 12 Apr 2025, at 01:26, Jonathan S. Katz wrote:
>
> Please join us in wishing Jacob much success and few reverts!
Congratulations, Jacob!
Best regards, Andrey Borodin.
hink I’ll create a Telegram chat for quick feedback and so that I
can show how posters look on the wall.
>
> On Sun, Apr 13, 2025 at 04:01:56PM +0500, Andrey Borodin wrote:
>> The goal of the poster session is to visually present your patch or project
>> on an A2-sized poster (ANS
ee from your data, rint() is consistent across OSes. Can user
observe any inconsistency caused by rint() behavior in PostgreSQL?
Thanks!
Best regards, Andrey Borodin.
ing
collaboration: attracting coauthors, reviewers, maybe committers, improving the
overall discussion, etc. Let me know if you would like to present your poster
or have a question.
Thank you for your ongoing contributions to PostgreSQL, and I look forward to
hearing from you.
Best rega
> On 3 Apr 2025, at 16:13, Heikki Linnakangas wrote:
>
> Committed
Cool! Thank you!!!
Best regards, Andrey Borodin.
coincide for a false failure.
Thank you!
Best regards, Andrey Borodin.
discussed here [1]
2. '+ERROR: tuple concurrently deleted' in injection_points/isolation seems to
be discussed here [2]
Thanks!
Best regards, Andrey Borodin.
[0]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-21%2019%3A09%3A59
[1] https://www.postg
e posting list and streamify this as well.
>
> It's probably not worth it -- since we process the pending list for
> each page of the index.
My understanding is that pending lists should be small on real workloads.
Thank you!
Best regards, Andrey Borodin.
> On 21 Mar 2025, at 05:54, Melanie Plageman wrote:
>
> On Wed, Mar 19, 2025 at 5:26 AM Andrey Borodin wrote:
>>
>> So, yes, your change to the test seems correct to me. We can do the test
>> with just one injection point.
>
> Attached 0001 is what I pl
safe, due to wait_counts in InjectionPointSharedState.
So, yes, your change to the test seems correct to me. We can do the test with
just one injection point.
Best regards, Andrey Borodin.
essed, not before.
FWIW I do not insist on committing the test, it was mostly necessary to
validate that backtracking still works. I could not check it manually. But
other injection tests, surprisingly, seem to be stable enough across buildfarm.
Best regards, Andrey Borodin.
v9-0001
> On 12 Mar 2025, at 20:02, Evgeny Voropaev wrote:
>
v6 looks good to me. I'll flip the CF entry.
Thanks!
Best regards, Andrey Borodin.
if this function would better
belong to SLRU than common XLog stuff.
Besides this patch seems ready to me.
Thanks!
Best regards, Andrey Borodin.
nditionVariableCancelSleep();
Won’t this sleep wait forever?
I see about 20 other occurrences of similar code, so, perhaps, everything is
fine. But I would greatly appreciate a few pointers on why it works.
Best regards, Andrey Borodin.
when elog() was
near, but now IMO we can have few words about what is going on.
Best regards, Andrey Borodin.
> On 26 Feb 2025, at 00:34, Maksim.Melnikov wrote:
>
> In applied patch I removed spinlock release in if clause.
Looks like an oversight in 9d9b9d4. IMO the fix is correct.
Best regards, Andrey Borodin.
lot of time ironing out various false positives from GIN check.
Kirill, what is your opinion about GIN verification? Does it look complete? (in
the sense that it will not trigger false alarms; certainly it cannot catch all
types of corruption)
Thanks!
Best regards, Andrey Borodin.
ctMemberPage, ZeroSUBTRANSPage) it means
just a call to SimpleLruZeroPage().
I think we can safely replace
+ slotno = (*zerofunc)(pageno, false);
with
+ slotno = SimpleLruZeroPage(pageno);
Thus we will not need zerofunc argument at all.
Thanks!
Best regards, Andrey Borodin.
conn->check_all_addrs[0] == '1')
Let's make it like load balancing is done [4].
Finally, let's think about naming alternatives for "check_all_addrs".
I think that's enough for a first round of the review. If it's not a bug, but a
feature - it's a
uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) *
> NS_PER_US + ns % NS_PER_US);
>
> Need to update comments in uuidv7_internval() such as:
>
>/*
> * Shift the current timestamp by the given interval. To calculate time
> * shift correctly, we convert t
> On 7 Feb 2025, at 02:05, Tom Lane wrote:
>
> Do you have any further comments on this patch?
No, all steps of the patch set look good to me.
Best regards, Andrey Borodin.
p paramarg2.
Best regards, Andrey Borodin.
[0]
https://github.com/wanglinn/xadb/blob/7695b7edcb0a89f3173b648c0da5b953538f2aa9/src/backend/pgxc/pool/execRemote.c#L835
[1]
https://github.com/babelfish-for-postgresql/babelfish_extensions/blob/376cf488804fa02f9b1db5bbfbe74e98627fe96c/contrib/babelfishpg_tsql/src/pl_exec.c#L8030
> On 3 Feb 2025, at 22:36, Tom Lane wrote:
>
> I'm not wedded to that name; do you have a better idea?
I'd propose something like attached. But feel free to ignore my suggestion: I
do not understand context of these structure members.
Best regards, Andrey Borodin.
rena
the rest of the patch set.
(Well, maybe paramarg2 resonates a bit, just from similarity with varchar2)
ecpg tests seem to fail on Windows[0], but looks like it's not related to this
thread.
Best regards, Andrey Borodin.
[0] https://cirrus-ci.com/task/4835794898124800
0:53.085+05 | 5001 years
>> 6001 | 8026-01-31 12:00:53.085+05 | 6001 years
>> 7001 | 9026-01-31 12:00:53.085+05 | 7001 years
>> 8001 | 10026-01-31 12:00:53.085+05 | 8001 years
>> (9 rows)
or maybe something simple like
with u as (select uuidv7() id) select uuid_extract_timestamp(uuidv7('-09-09
12:34:56.789+05' - uuid_extract_timestamp(u.id))) from u;
But it would still be flaky, second call to uuidv7() can overflow a millisecond.
Thanks!
Best regards, Andrey Borodin.
v2-0001-UUDv7-fix-offset-computations-in-dates-after-2262.patch
Description: Binary data
diagnostics in case of
production incidents.
Thanks!
Best regards, Andrey Borodin.
v2-0001-Print-backtrace-on-SIGABRT-SIGBUS-SIGSEGV.patch
Description: Binary data
01-31 12:00:53.085+05 | 5001 years
6001 | 8026-01-31 12:00:53.085+05 | 6001 years
7001 | 9026-01-31 12:00:53.085+05 | 7001 years
8001 | 10026-01-31 12:00:53.085+05 | 8001 years
(9 rows)
Best regards, Andrey Borodin.
0001-UUDv7-fix-offset-computations-in-dates-after-2262.patch
Description: Binary data
shape before feature freeze.
The chances of getting the currently proposed approach into v18 seem slim too...
I'm hesitating to register this patch on the CF. What do you think?
Best regards, Andrey Borodin.
v3-0001-Compress-big-WAL-records.patch
Description: Binary data
xtra
argument of generate_uuidv7()? Or, perhaps, leave things as they stand now?
Thanks!
Best regards, Andrey Borodin.
ery good idea to retry promotion after returning online. The
user will get unexpected splitbrain.
Best regards, Andrey Borodin.
of ExprEvalStep. But void *paramarg and his friend void
*paramarg2 are cryptic. They always have same type and same meaning, but have
very generic names.
I wonder if you plan similar optimizations for array_cat(), array_remove() etc?
+ a := a || a; -- not optimizable
Why is it not optimizable? Because there is no support function, because
array_cat() has no support function, or something else?
Besides this, the patch looks good to me.
Best regards, Andrey Borodin.
ve this optimization? Is it O(number_of_arguments) or for every
assignment execution?
Thanks!
Best regards, Andrey Borodin.
>
There’s a typo in the commit message (ratio instead of rate). Besides this the
patch looks ready for committer.
Best regards, Andrey Borodin.
nique, similar to v9.
Looks good to me.
Nice stats for a cleanup: 34 insertions(+), 48 deletions(-).
Best regards, Andrey Borodin.
hanks!
Also seems like I forgot to bump WAL_FILE_MAGIC…
What do you think about proposed approach?
Best regards, Andrey Borodin.
> On 19 Dec 2024, at 20:48, Yura Sokolov wrote:
>
> Here's version with type change bits16 -> uint16
Thanks! This version looks good to me. I’ll mark the CF entry as RfC.
Best regards, Andrey Borodin.
st entries [0,1], where clog file corruption
was discussed. See Emails section.
Thanks!
Best regards, Andrey Borodin.
[0] https://commitfest.postgresql.org/16/1462/
[1] https://commitfest.postgresql.org/51/4709/
act log file.
Best regards, Andrey Borodin.
nd clog file missing, when database restart, it will
> try to recover. And everything is ok
>
> So I think we may improve the database more reliable in some scenarios, e.g.
> Only clog file corrupted or missing, like S1
I still do not get it. Why would the clog file be missing?
Best regards, Andrey Borodin.
> On 23 Dec 2024, at 14:12, 章晨曦@易景科技 wrote:
>
> we simulate crash and clog file corrupt (delete the clog file)
Clog file cannot disappear as a result of a crash. What makes you think
otherwise?
Best regards, Andrey Borodin.
case of SLRU access.
+ bits16 nbanks;
Perhaps it’s not bits anymore. Also, ought 64K banks to be enough for everybody?
Best regards, Andrey Borodin.
tension might want to provide
> a generator that guarantees monotonicity across backends.
AFAIK extension pg_uuidv7 does not have this protection right now. But Florian
might add it in future.
Best regards, Andrey Borodin.
ld works. AFAIR
there's plenty of other tests to verify that.
Injection points seemed to me exactly the technology that could help us to have
tests for this patch. But at this point it looks like it's reasonable to take
approach 1, as we did before.
Best regards, Andrey Borodin.
'uuidv7()' for now. We can
> rename it later if we find a better name.
I think uuidv7() is kind of consensual.
> I've attached the new version patch that incorporated all comments and
> renamed the functions. Also I avoided using 'if defined(__darwin__) ||
> defined(_MSC_VER)' twice.
Good, I think now it's a bit easier to understand those 2 bits.
Thanks!
Best regards, Andrey Borodin.
o fetch even after a call instruction? These cpus are really neat
things... so, probably, yes, it does.
Best regards, Andrey Borodin.
> On 31 Jan 2024, at 14:27, Japin Li wrote:
>
> LGTM.
>
> If there is no other objections, I'll change it to ready for committer
> next Monday.
I think we have a quorum, so I decided to go ahead and flipped status to RfC.
Thanks!
Best regards, Andrey Borodin.
> On 20 Jan 2024, at 08:31, vignesh C wrote:
>
> On Mon, 9 Jan 2023 at 09:49, Andrey Borodin wrote:
>>
>> On Tue, Jan 3, 2023 at 5:02 AM vignesh C wrote:
>>> does not apply on top of HEAD as in [1], please post a rebased patch:
>>>
>> Thanks!
thing like that actually happens.
If possible, I'd prefer one lock at a time, and maybe sometimes two or three
with some guarantees that this is safe.
So, from my POV, the first solution that you proposed seems much better.
Thanks for working on this!
Best regards, Andrey Borodin.
63c3741922a5
But again, UUIDs are not designed to store timestamps. They are unique, and v7
promotes data locality via time-ordering.
Best regards, Andrey Borodin.
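
A minimal sketch of the time-ordering point, assuming the UUIDv7 layout from the IETF specification (48-bit Unix millisecond timestamp in the leading bytes, then version, variant and random bits). The helper names make_uuidv7 and timestamp_ms are hypothetical; this is not the patch's generator and it has no monotonicity protection within a millisecond.

```python
import os
import uuid


def make_uuidv7(unix_ms: int, rand: bytes) -> uuid.UUID:
    """Assemble a UUIDv7-shaped value from a timestamp and 10 random bytes."""
    if len(rand) != 10:
        raise ValueError("need exactly 10 random bytes")
    raw = bytearray(unix_ms.to_bytes(6, "big") + rand)
    raw[6] = (raw[6] & 0x0F) | 0x70  # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC variant bits = 10
    return uuid.UUID(bytes=bytes(raw))


def timestamp_ms(u: uuid.UUID) -> int:
    """Read back the leading 48 bits as Unix milliseconds."""
    return int.from_bytes(u.bytes[:6], "big")


earlier = make_uuidv7(1645557742000, os.urandom(10))
later = make_uuidv7(1645557742001, os.urandom(10))

# Values created in later milliseconds sort after earlier ones, so new
# index entries land close together regardless of the random tail.
print(earlier < later)  # True
```

The timestamp can be read back with timestamp_ms(), but as the message says, that is a side effect of the layout rather than the purpose of the type.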
> On 19 Jan 2024, at 13:25, Andrey Borodin wrote:
>
> Also, I've added some documentation on all functions.
Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list
Best regards, Andrey Borodin.
v12-0001-Impl
the review.
Well, that was intentional. But now I see it's kind of confusing behaviour.
I've changed it to more expected version.
Also, I've added some documentation on all functions.
Best regards, Andrey Borodin.
v11-0001-Implement-UUID-v7-as-per-IETF-draft.patch
Description: Binary data
> On 18 Jan 2024, at 20:39, Andrey Borodin wrote:
>
> But 16455577420ns after 1582-10-15 00:00:00 UTC was 2022-02-22 19:22:22
> UTC. And that was 2022-02-23 00:22:22 in UTC-05.
'2022-02-22 19:22:22 UTC' is exactly that moment which was encoded into example
UU
. It is expected to be
generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's explained to be 16455577420ns after 1582-10-15 00:00:00 UTC.
But 16455577420ns after 1582-10-15 00:00:00 UTC was 2022-02-22 19:22:22
UTC. And that was 2022-02-23 00:22:22 in UTC-05.
Best regards, Andrey Borodin.
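
The date arithmetic can be cross-checked with a short sketch. It assumes the draft's "2:22:22.00 PM GMT-05:00" means a fixed offset of five hours behind UTC, and it uses only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00"
local = datetime(2022, 2, 22, 14, 22, 22, tzinfo=timezone(timedelta(hours=-5)))
as_utc = local.astimezone(timezone.utc)

# Whole seconds since the Unix epoch.
unix_seconds = (as_utc - UNIX_EPOCH) // timedelta(seconds=1)

# 100-nanosecond intervals since the Gregorian epoch (microseconds * 10).
gregorian_100ns = (as_utc - GREGORIAN_EPOCH) // timedelta(microseconds=1) * 10

print(as_utc.isoformat())  # 2022-02-22T19:22:22+00:00
print(unix_seconds)        # 1645557742
print(gregorian_100ns)     # 138648505420000000
```

This agrees with the follow-up message that 2022-02-22 19:22:22 UTC is the encoded moment.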