On Tue, Apr 16, 2024 at 12:06 PM Stefan Fercot <stefan.fer...@protonmail.com> wrote:
> Sure, I can see your point here and how people could be tempted to throw
> away that backup_manifest if they don't know how important it is to keep it.
> Probably in this case we'd need the list to be inside the tar, just like
> backup_label and tablespace_map then.
Yeah, I think anywhere inside the tar is better than anywhere outside the tar, by a mile. I'm happy to leave the specific question of where inside the tar as something TBD at time of implementation by fiat of the person doing the work. But that said ...

> Do you mean 1 stub-list per pgdata + 1 per tablespaces?
>
> I don't really see how it would be faster to recursively go through each
> sub-directories of the pgdata and tablespaces to gather all the pieces
> together compared to reading 1 main file.
> But I guess, choosing one option or the other, we will only find out how well
> it works once people will use it on the field and possibly give some feedback.

The reason I was suggesting one stub-list per directory is that we recurse over the directory tree. We reach each directory in turn, process it, and then move on to the next one. What I imagine we want to do is first iterate over all of the files actually present in a directory, and then iterate over the list of stubs for that directory, doing whatever we would have done if a stub file had been present for each of them. So we don't really want a list of every stub in the whole backup, or even every stub in the whole tablespace. What we want is to be able to easily get a list of stubs for a single directory, which is very easily done if each directory contains its own stub-list file.

If we instead have a centralized stub-list for the whole tablespace, or the whole backup, it's still quite possible to make it work. We just read that centralized stub-list and build an in-memory data structure indexed by containing directory, like a hash table where the key is the directory name and the value is a list of filenames within that directory.
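To illustrate what I mean by that, here's a rough Python sketch (not the eventual C implementation, obviously, and the paths are invented) of turning a centralized stub-list into a per-directory index:

```python
import os
from collections import defaultdict

def index_stub_list(stub_paths):
    # Build a mapping from containing directory to the list of stub
    # filenames within that directory, so that the code processing a
    # single directory can look up just its own stubs.
    by_dir = defaultdict(list)
    for path in stub_paths:
        dirname, filename = os.path.split(path)
        by_dir[dirname].append(filename)
    return by_dir

# Hypothetical centralized stub-list, paths relative to the backup root.
stubs = ["base/1/1259", "base/1/2619", "base/16384/16385"]
index = index_stub_list(stubs)
print(index["base/1"])      # stubs for one directory, looked up directly
```

The point being that this whole index has to be built up front and carried around, whereas with one stub-list per directory you'd just read the relevant list when you reach that directory.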
But a slight disadvantage of this model is that you have to keep that whole data structure in memory for the whole time you're reconstructing, and you have to pass around a pointer to it everywhere so that the code that handles individual directories can access it. I'm sure this isn't the end of the world. It's probably unlikely that someone has so many stub files that the memory used for such a data structure is painfully high, and even if they did, it's unlikely that they are spread out across multiple databases and/or tablespaces in such a way that only needing the data for one directory at a time would save you. But it's not impossible that such a scenario could exist.

Somebody might say: well, don't go directory by directory, just handle all of the stubs at the end. But I don't think that really fixes anything. I want to be able to verify that none of the stubs listed in the stub-list are also present in the backup as real files, for sanity-checking purposes. It's quite easy to see how to do that in the design I proposed above: keep a list of the files for each directory as you read it, and then, when you read the stub-list for that directory, check those lists against each other for duplicates. Doing this on the level of a whole tablespace or the whole backup is clearly also possible, but once again it potentially uses more memory, and there's no functional gain.

Plus, this kind of approach would make the reconstruction process "jump around" more. It might pull a bunch of mostly-unchanged files from the full backup while handling the non-stub files, and then come back to that directory a second time, much later, when it's processing the stub-list. Perhaps that would lead to a less optimal I/O pattern, or perhaps it would make it harder for the user to understand how much progress reconstruction had made. Or perhaps it would make no difference at all; I don't know. Maybe there's even some advantage in a two-pass approach like this. I don't see one.
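That per-directory sanity check I mentioned is cheap and simple; something like this (again just a Python sketch with made-up file names, not real code):

```python
def check_directory(real_files, stub_files):
    # A name must never appear both as a real file in the directory and
    # in that directory's stub-list; if it does, the backup is corrupt.
    duplicates = set(real_files) & set(stub_files)
    if duplicates:
        raise ValueError(
            "present both as file and stub: %s" % sorted(duplicates))

# Fine: the stub names and the real file names don't overlap.
check_directory(["16385", "16388"], ["16390"])

# Error: "16385" is both a real file and listed as a stub.
try:
    check_directory(["16385", "16388"], ["16385"])
except ValueError as e:
    print(e)
```

With one stub-list per directory, both inputs are naturally in hand at the moment you process that directory, which is the whole appeal.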
But it might prove otherwise on closer examination.

--
Robert Haas
EDB: http://www.enterprisedb.com