Hi Bharath, Idea to atomically allocate WAL file by creating tmp file and renaming it is nice. I have one question though: How is partially temp file created will be cleaned if the VM crashes or out of disk space cases? Does it endup creating multiple files for every VM crash/disk space during process of pg_receivewal?
Thoughts? Thanks, Mahendrakar. On Mon, 25 Jul 2022 at 16:42, Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Mon, Apr 25, 2022 at 5:17 PM Bharath Rupireddy > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > On Mon, Apr 25, 2022 at 6:38 AM Michael Paquier <mich...@paquier.xyz> > wrote: > > > > > > On Fri, Apr 22, 2022 at 07:17:37PM +0530, Bharath Rupireddy wrote: > > > > Right. We find enough disk space and go to write and suddenly the > > > > write operations fail for some reason or the VM crashes because of a > > > > reason other than disk space. I think the foolproof solution is to > > > > figure out the available disk space before prepadding or compressing > > > > and also use the > > > > write-first-to-temp-file-and-then-rename-it-to-original-file as > > > > proposed in the earlier patches in this thread. > > > > > > Yes, what would count here is only the amount of free space in a > > > partition. The total amount of space available becomes handy once you > > > begin introducing things like percentage-based quota policies for the > > > disk when archiving. The free amount of space could be used to define > > > a policy based on the maximum number of bytes you need to leave > > > around, as well, but this is not perfect science as this depends of > > > what FSes decide to do underneath. There are a couple of designs > > > possible here. When I had to deal with my upthread case I have chosen > > > one as I had no need to worry only about Linux, it does not mean that > > > this is the best choice that would fit with the long-term community > > > picture. This comes down to how much pg_receivewal should handle > > > automatically, and how it should handle it. > > > > Thanks. I'm not sure why we are just thinking of crashes due to > > out-of-disk space. Figuring out free disk space before writing a huge > > file (say a WAL file) is a problem in itself to the core postgres as > > well, not just pg_receivewal. > > > > I think we are off-track a bit here. Let me illustrate what's the > > whole problem is and the idea: > > > > If the node/VM on which pg_receivewal runs, goes down/crashes or fails > > during write operation while padding the target WAL file (the .partial > > file) with zeros, the unfilled target WAL file ((let me call this file > > a partially padded .partial file) will be left over and subsequent > > reads/writes to that it will fail with "write-ahead log file \"%s\" > > has %zd bytes, should be 0 or %d" error which requires manual > > intervention to remove it. In a service, this manual intervention is > > what we would like to avoid. Let's not much bother right now for > > compressed file writes (for now at least) as they don't have a > > prepadding phase. > > > > The proposed solution is to make the prepadding atomic - prepad the > > XXXX.partial file as XXXX.partial.tmp name and after the prepadding > > rename (durably if sync option is chosen for pg_receivewal) to > > XXXX.partial. Before prepadding XXXX.partial.tmp, delete the > > XXXX.partial.tmp if it exists. > > > > The above problem isn't unique to pg_receivewal alone, pg_basebackup > > too uses CreateWalDirectoryMethod and dir_open_for_write via > > ReceiveXlogStream. > > > > IMHO, pg_receivewal checking for available disk space before writing > > any file should better be discussed separately? > > Here's the v3 patch after rebasing. > > I just would like to reiterate the issue the patch is trying to solve: > At times (when no space is left on the device or when the process is > taken down forcibly (VM/container crash)), there can be leftover > uninitialized .partial files (note that padding i.e. filling 16MB WAL > files with all zeros is done in non-compression cases) due to which > pg_receivewal fails to come up after the crash. To address this, the > proposed patch makes the padding 16MB WAL files atomic (first write a > .temp file, pad it and durably rename it to the original .partial > file, ready to be filled with received WAL records). This approach is > similar to what core postgres achieves atomicity while creating new > WAL file (see how XLogFileInitInternal() creates xlogtemp.%d file > first and then how InstallXLogFileSegment() durably renames it to > original WAL file). > > Thoughts? > > Regards, > Bharath Rupireddy. >