On Thu, Feb 26, 2026 at 2:14 AM 陈宗志 <[email protected]> wrote:
>
> Hi Robert,
>
> Thanks for the feedback and suggestions.
>
<snip>
> > So, I haven't looked at the code itself; to be honest, I am a bit too
> > paranoid to dive into generated code that would seem to carry some
> > likely level of legal risk around potential reuse of GPL/proprietary
> > code it might be based on (either in its original training, inference,
> > or context used for generation. Yeah, I know innodb isn't written in
> > C, but still). That said, I did have some feedback and questions on
> > the proposal itself, and some suggestions for how to move things
> > forward.
> >
> >> 0. The convention here is to send the patches using:
> >>    git format-patch -v<VERSION> HEAD~<numberOfpatches>
> >>    for easier review. The 0003 probably should be out of scope. Anyway I've
> >>    attached all of those so maybe somebody else is going to take a
> >> look at them too,
> >>    they look very mature. Is this code used in production already 
> >> anywhere? (and
> >>    BTW the numbers are quite impressive)
> >>
> > While Jakub is right that the convention is to send patches, that
> > convention is based on a manual development model, not an agentic
> > development model. While there is no official project policy on this,
> > IMHO the thing we really need from you is not the code output, but the
> > prompts that were used to generate the code. There are plenty of folks
> > who have access to claude that could then use those prompts to
> > "recreate with enough proximity" the work you had claude do, and that
> > process would also allow for additional verification and reduction of
> > any legal concerns or concerns about investing further human
> > time/energy. (No offense, but as you are not a regular contributor,
> > you could analogize this to when third parties do large code dumps and
> > say "here's a contribution, it's up to you to figure out how to use
> > it". Ideally we want other folks to be able to pick up the project and
> > continue with it, even if it means recreating it, and that works best
> > if we have the underlying prompts).
> > The claude code configuration file is a good start, but certainly not
> > enough. Probably the ideal here would be full session logs, although a
> > developer-diary would probably also suffice. I'm kind of guessing here
> > because I don't know the scope of the prompts involved or how you were
> > interacting with Claude in order to get where you are now, but those
> > seem like the more obvious tools for work of this size whose intention
> > is to be open.
>
> Regarding the AI-generated code, the raw output from Claude was far
> from perfect. I have manually reviewed and modified the code
> extensively to get it to this state.
>
> Our plan is to first deploy and test this thoroughly on our own
> product, Alibaba Cloud RDS for PostgreSQL. Once we are confident that
> it is stable and issue-free, we intend to submit a formalized patch
> to the community. I am very much looking forward to discussing and
> reviewing the actual code with you all when the time comes.
>
> As for sharing the prompts or session logs, I personally feel they
> might not be as valuable as the final code itself. The generation
> process involved a lot of iterative, back-and-forth communication;
> the AI only knew how to make the right modifications after continuous
> human guidance, correction, and architectural decisions.
>

Yeah, this is an issue we don't seem to have very good answers to at
the moment. Part of me thinks the right path is to require completely
open transcripts of this back and forth, like you would see in a
discussion between developers on the mailing list. OTOH, lots of
patches have gone through "pre-development" work before hitting the
mailing lists, not to mention that agents can operate in ways so
ridiculously verbose that the idea of having these kinds of logs for
every developer doesn't sound like it would scale, if it even remained
useful.

In any case, as you stated you were a former innodb developer, you
would clearly understand concerns about potential IP muddiness. To
that end, I decided to spin up my own agent to examine your patch
against the innodb implementation and provide an analysis contrasting
the two. While no one should mistake that for anything official, the
initial read-through was comforting.

> > It would be helpful if you could provide a little more information on
> > the system you are running these benchmarks on, specifically for me
> > the underlying OS/Filesystem/hardware, and I'd even be interested in
> > the build flags. I'd also be interested to know if you did any kind of
> > crash safety testing... while it is great to have improved
> > performance, presumably that isn't actually the primary point of these
> > subsystems. It'd also be worth knowing if you tested this on any
> > systems with replication (physical or logical) since we'd need to
> > understand those potential downstream effects. I'm tempted to say you
> > should have an AI generate some pgbench scripts. Granted it's early and
> > fine if you haven't done any of this, but I imagine we'll need to look at
> > it eventually.
>
> I have addressed the feedback and conducted comprehensive benchmarks
> comparing the three io_torn_pages_protection modes. Here are the
> detailed performance results and the system setup information you
> requested.
>
> Benchmark Setup:
> - Hardware: x86_64, Linux 5.10, NVMe SSD
> - PostgreSQL: 19devel (with DWB patch applied)
> - Tool: pgbench (TPC-B), 64 clients, 8 threads, 60 seconds per run
> - Common config: shared_buffers = 1GB, wal_level = replica
> - Three modes tested:
>   * io_torn_pages_protection = full_pages (traditional FPW)
>   * io_torn_pages_protection = double_writes (DWB size = 128MB)
>   * io_torn_pages_protection = off (no protection, baseline)
>
> Each test was run sequentially on the same machine to avoid I/O
> contention.
>
<snip>
> Analysis:
>
> The key factor is checkpoint frequency. FPW must write a full 8KB page
> image to WAL for every page's first modification after each checkpoint.
> When checkpoints are frequent:
>
<snip>
> When does DWB matter most?
> - Large active datasets that exceed shared_buffers
> - Frequent checkpoints (small max_wal_size or short checkpoint_timeout)
> - Write-heavy workloads
> - Replication scenarios where WAL volume directly impacts network
>
> In production environments where max_wal_size is often set
> conservatively (e.g., 1GB) and datasets are much larger than
> shared_buffers, DWB should provide significant and consistent benefits
> over FPW. As for the crash safety testing you mentioned, it is on our
> roadmap as we continue to refine the patch for our internal RDS
> deployment.
>

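FWIW, for anyone wanting to recreate a comparable run, a sequential
driver over the three modes might look something like the sketch
below. The GUC and mode names come from your description and the
pgbench options mirror your setup; the database name, scale factor,
and restart mechanics are my assumptions, and the actual server
commands are left as comments so the sketch stands on its own:

```shell
# Sketch of a sequential benchmark driver for the three
# io_torn_pages_protection modes. GUC/mode names are taken from the
# patch description; paths, database name, scale factor, and restart
# mechanics are assumptions. The real commands are commented out; the
# loop records and prints the plan so the sketch is self-contained.
plan=""
for mode in full_pages double_writes off; do
    # psql -c "ALTER SYSTEM SET io_torn_pages_protection = '$mode';"
    # pg_ctl restart -D "$PGDATA"
    # pgbench -i -s 1000 bench
    # pgbench -M prepared -c 64 -j 8 -T 60 bench > "tpcb_${mode}.txt"
    plan="$plan $mode"
    echo "planned run: io_torn_pages_protection = $mode"
done
```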
I suspect that some folks would argue the problem is as much users
with poorly configured servers (primarily an undersized max_wal_size
and too-frequent checkpointing) as it is the need for an entirely
different page write implementation, but there are certainly some
workloads this helps even when those things are tuned accordingly. It
makes me wonder, and it's a bit of a crazy idea, but have you thought
about the possibility of making this user-settable per transaction...
recreating magic similar to synchronous_commit?
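To make the analogy concrete: the first block below is how
synchronous_commit can be relaxed per transaction today, and the
second is what the analogous knob might look like. The
io_torn_pages_protection variant is purely hypothetical and does not
exist in the patch as posted:

```sql
-- Real, existing behavior: relax durability for one transaction.
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... low-value writes ...
COMMIT;

-- Purely hypothetical analogue of the idea above:
BEGIN;
SET LOCAL io_torn_pages_protection = off;  -- not in the patch as posted
-- ... writes where torn-page protection would be deliberately waived ...
COMMIT;
```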

Robert Treat
https://xzilla.net

