On Thu, Aug 26, 2021 at 01:24:46PM -0400, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote: > > > Yes, we're talking about either incremental (or perhaps differential) > > > backup where only the files which are actually different would be backed > > > up. Just like with PG, I can't provide any complete guarantees that > > > we'd be able to actually make this possible after a major version with > > > pgBackRest with this change, but it definitely isn't possible *without* > > > this change. I can't see any reason why we wouldn't be able to do a > > > checksum-based incremental backup though (which would be *much* faster > > > than a regular backup) once this change is made and have that be a > > > reliable and trustworthy backup. I'd want to think about it more and > > > discuss it with David in some detail before saying if we could maybe > > > perform a timestamp-based incremental backup (without checksum'ing the > > > files, as we do in normal situations), but that would really just be a > > > bonus. > > > > Well, it would be nice to know exactly how it would help pgBackRest if > > that is one of the reasons we are adding this feature. > > pgBackRest keeps a manifest for every file in the PG data directory that > is backed up and we identify that file by the filename. Further, we > calculate a checksum for every file. If the filenames didn't change > then we'd be able to compare the file in the new cluster against the > file and checksum in the manifest in order to be able to perform the > incremental/differential backup. We don't store the inodes in the > manifest though, and we don't have any concept of looking at multiple > data directories at the same time or anything like that (which would > also mean that the old data directory would have to be kept around for > that to even work, which seems like a good bit of additional > complication and risk that someone might start up the old cluster by > accident..). > > That's how it'd be very helpful to pgBackRest for the filenames to be > preserved across pg_upgrade's.
OK, that is clear. > > > > > > As far as TDE, I haven't seen any concrete plan for that, so why add > > > > > > this code for that reason? > > > > > > > > > > That this would help with TDE (of which there seems little doubt...) > > > > > is > > > > > an additional benefit to this. Specifically, taking the existing work > > > > > that's already been done to allow block-by-block encryption and > > > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block > > > > > number as the IV, just like many disk encryption systems do, avoids > > > > > the > > > > > concerns that were brought up about using LSN for the IV with CTR and > > > > > it's certainly not difficult to do, but it does depend on this change. > > > > > This was all discussed previously and it sure looks like a sensible > > > > > approach to use that mirrors what many other systems already do > > > > > successfully. > > > > > > > > Well, I would think we would not add this for TDE until we were sure > > > > someone was working on adding TDE. > > > > > > That this would help with TDE is what I'd consider an added bonus. > > > > Not if we have no plans to implement TDE, which was my point. Why not > > wait to see if we are actually going to implement TDE rather than adding > > it now. It is just so obvious, why do I have to state this? > > There's been multiple years of effort put into implementing TDE and I'm > sure hopeful that it continues as I'm trying to put effort into moving > it forward myself. I'm a bit baffled by the idea that we're just Well, this is the first time I am hearing this publicly. > suddenly going to stop putting effort into TDE as it is brought up time > and time again by clients that I've talked to as one of the few reasons > they haven't moved to PG yet- I can't believe that hasn't been > experienced by folks at other organizations too, I mean, there's people > maintaining forks of PG specifically for TDE ... Agreed. > > > I've certainly done it and I'd be kind of surprised if others haven't, > > > but I've also played a lot with pg_dump in various modes, so perhaps > > > that's not a great representation. I've definitely had to explain to > > > clients why there's a whole different set of filenames after a > > > pg_upgrade and why that is the case for an 'in place' upgrade before > > > too. > > > > Uh, so I guess I am right that few people have mentioned this in the > > past. Why were users caring about the file names? > > This is a bit baffling to me. Users and admins certainly care about > what files their data is stored in and knowing how to find them. > Covering the data directory structure is a commonly asked for part of > the training that I regularly do for clients. I just never thought people cared about the file names, since I have never heard a complaint about how pg_upgrade works all these years. > > > I have a very hard time seeing what changes might happen in the server > > > in this space that wouldn't have an impact on pg_upgrade, with or > > > without this. > > > > I don't know, but I have to ask since I can't know the future, so any > > "preseration" has to be studied. > > We can gain, perhaps, some insight looking into the past and that seems > to indicate that this is certainly a very stable part of the server code > in the first place, which would imply that it's unlikely that there'll > be much need to adjust this code in the future in the first place. Good, it have to ask. > > > > I am not saying this change is wrong, but I think the reasons need to be > > > > stated in this thread, rather than just moving forward. > > > > > > Ok, they've been stated and it seems to at least Robert and myself that > > > this is worthwhile to at least continue through to a concluded patch, > > > after which we can contemplate that patch's complexity against these > > > reasons. > > > > OK, that works for me. What bothers me is that the Desirability of this > > changes has not be clearly stated in this thread. > > I hope that this email and the many many prior ones have gotten across > the desirability of the change. Yes, I think we are in a better position now to evaluate this. -- Bruce Momjian <br...@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.