On Thu, Aug 20, 2020 at 5:49 AM osumi.takami...@fujitsu.com <osumi.takami...@fujitsu.com> wrote: > > Hello. > > > During the crash recovery, those points are helpful to recognize and > > > detach such blocks in order to solve a situation that the loaded data is > > > partially > > synced to the disk and the rest isn't. > > > > How do online backup and archive recovery work ? > > > > Suppose that the user executes pg_basebackup during COPY UNLOGGED running, > > the physical backup might have the portion of tuples loaded by COPY UNLOGGED > > but these data are not recovered. It might not be a problem because the > > operation > > is performed without WAL records. But what if an insertion happens after > > COPY > > UNLOGGED but before pg_stop_backup()? I think that a new tuple could be > > inserted at the end of the table, following the data loaded by COPY > > UNLOGGED. > > With your approach described above, the newly inserted tuple will be > > recovered > > during archive recovery, but it either will be removed if we replay the > > insertion > > WAL then truncate the table or won’t be inserted due to missing block if we > > truncate the table then replay the insertion WAL, resulting in losing the > > tuple > > although the user got successful of insertion. > I consider that from the point in time when COPY UNLOGGED is executed, > any subsequent operations to the data which comes from UNLOGGED operation > also cannot be recovered even if those issued WAL. > > This is basically inevitable because subsequent operations > after COPY UNLOGGED depend on blocks of loaded data without WAL, > which means we cannot replay exact operations. > > Therefore, all I can do is to guarantee that > when one recovery process ends, the target table returns to the state > immediately before the COPY UNLOGGED is executed. > This could be achieved by issuing and notifying the server of an invalidation > WAL, > an indicator to stop WAL application toward one specific table after this new > type of WAL. >
I don't think we can achieve what you want by one special invalidation WAL. Consider a case where an update has happened on the page which exists before 'Copy Unlogged' operation and while writing that page to disk, the system crashed and the page is half-written. Without the special WAL mechanism you are proposing to introduce, during recovery, we can replay the full-page-image from WAL of such a page and then perform the required update, so after recovery, the page won't be torn anymore. Basically, the idea is that to protect from such torn-writes (half-written pages), we have a concept called full-page writes which protects the data from such writes after recovery. Before writing to any page after a checkpoint, we write its full-page-image in WAL which helps us in recovering from such situations but with your proposed mechanism it won't work. Another concern I have with this idea is that you want to keep writing WAL for such a relation but don't want to replay in recovery which sounds like a good idea. The idea to keep part of the table as logged and other as unlogged sounds scary to me. Now, IIUC, you are trying to come up with these ideas because to use Alter Table .. Set Unlogged, one has to rewrite the entire table and if such a table is large, it will be a very time-consuming operation. You might want to explore whether we can avoid rewriting the table for such an operation but I don't think that is easy either. The two problems I could see immediately are (a) we have to change BM_PERMANENT marking of exiting buffers of such a relation which again can be a time-consuming operation especially for a large value of shread_buffers, (b) create an _init fork of such a relation in-sync with the commit. You might want to read the archives to see why at the first place we have decided to re-write the table for SET UNLOGGED operation, see email [1]. [1] - https://www.postgresql.org/message-id/CAFcNs%2Bpeg3VPG2%3Dv6Lu3vfCDP8mt7cs6-RMMXxjxWNLREgSRVQ%40mail.gmail.com -- With Regards, Amit Kapila.