Hi,

When running write-heavy transactional workloads, I've often observed that benchmarks need to run for quite a while before they reach steady-state performance. The most significant reason is that initially WAL files don't get recycled and instead have to be freshly initialized. That's 16MB of writes that need to finish synchronously before a small write transaction can even start to be written out...
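To make that cost concrete, here is a minimal C sketch of what freshly initializing a segment involves if it is zero-filled: create the file, write the whole segment, and fsync it before anything can use it. This is only an illustration, not PostgreSQL's code. init_segment_file() and SKETCH_BLOCK_SIZE are made-up names, and both the zero-filling and the 8kB write size are assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLOCK_SIZE 8192	/* assumption: write granularity of this sketch */

/*
 * Hypothetical helper: create "path", fill it with seg_size zero bytes and
 * fsync it. Returns 0 on success, -1 on failure (including "already exists").
 */
int
init_segment_file(const char *path, size_t seg_size)
{
	char		zeros[SKETCH_BLOCK_SIZE];
	size_t		written = 0;
	int			fd;

	assert(seg_size > 0);
	memset(zeros, 0, sizeof(zeros));

	/* O_EXCL: only ever initialize a segment that does not exist yet */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	/* All of these writes happen before the segment is usable. */
	while (written < seg_size)
	{
		size_t		chunk = seg_size - written;
		ssize_t		rc;

		if (chunk > sizeof(zeros))
			chunk = sizeof(zeros);
		rc = write(fd, zeros, chunk);
		if (rc <= 0)
		{
			close(fd);
			unlink(path);
			return -1;
		}
		written += (size_t) rc;
	}

	/* The synchronous part: wait until the data has reached storage. */
	if (fsync(fd) != 0)
	{
		close(fd);
		unlink(path);
		return -1;
	}
	return close(fd);
}
```

A transaction that hits a segment boundary without a recycled file available has to wait for all of the above, which is why doing it ahead of time looks attractive.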
I think there are two useful things we could do:

1) Add pg_wal_preallocate(uint64 bytes), which ensures that (bytes + segment_size - 1) / segment_size WAL segments exist from the current point in the WAL. Perhaps with the number of bytes defaulting to min_wal_size if not explicitly specified?

2) Have checkpointer (we want walwriter to run with low latency to flush out async commits etc.) occasionally check whether WAL files need to be pre-allocated.

Checkpointer already tracks the amount of WAL that's expected to be generated until the end of the checkpoint, so it seems like a pretty good candidate for this. To keep checkpointer pre-allocating even when it is otherwise idle, we could signal it whenever a record has crossed a segment boundary.

With a plain pgbench run I see a 2.5x reduction in throughput in the periods where we initialize WAL files.

Greetings,

Andres Freund
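
P.S. A rough sketch of the arithmetic in 1), in case it helps the discussion. segments_to_preallocate() is a hypothetical name, not an existing function. It computes the same rounded-up quotient as (bytes + segment_size - 1) / segment_size, but is written so that the addition cannot overflow for very large byte counts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of whole WAL segments needed to cover "bytes"
 * of WAL, i.e. bytes / segment_size rounded up.
 */
uint64_t
segments_to_preallocate(uint64_t bytes, uint64_t segment_size)
{
	assert(segment_size > 0);

	/* Whole segments, plus one more if there is a partial segment left. */
	return bytes / segment_size + (bytes % segment_size != 0 ? 1 : 0);
}
```

With 16MB segments, asking for 80MB works out to 5 segments, and asking for a single byte still pre-allocates one full segment.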