On Tue, Apr 23, 2019 at 01:21:27AM +0200, Tomas Vondra wrote:
> On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote:
> > On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost <sfr...@snowman.net> wrote:
> > > > Oh. Well, I already explained my algorithm for doing that upthread,
> > > > which I believe would be quite cheap.
> > > >
> > > > 1. When you generate the .modblock files, stick all the block
> > > > references into a buffer. qsort(). Dedup. Write out in sorted
> > > > order.
> > >
> > > Having all of the block references in a sorted order does seem like it
> > > would help, but would also make those potentially quite a bit larger
> > > than necessary (I had some thoughts about making them smaller elsewhere
> > > in this discussion). That might be worth it though. I suppose it might
> > > also be possible to line up the bitmaps suggested elsewhere to do
> > > essentially a BitmapOr of them to identify the blocks changed (while
> > > effectively de-duping at the same time).
> >
> > I don't see why this would make them bigger than necessary. If you
> > sort by relfilenode/fork/blocknumber and dedup, then references to
> > nearby blocks will be adjacent in the file. You can then decide what
> > format will represent that most efficiently on output. Whether or not
> > a bitmap is a better idea than a list of block numbers or something else
> > depends on what percentage of blocks are modified and how clustered
> > they are.
>
> Not sure I understand correctly - do you suggest to deduplicate and sort
> the data before writing them into the .modblock files? Because the
> sorting would make this information mostly useless for the recovery
> prefetching use case I mentioned elsewhere. For that to work we need
> information about both the LSN and block, in the LSN order.
>
> So if we want to allow that use case to leverage this infrastructure, we
> need to write the .modfiles kinda "raw" and do this processing in some
> later step.
>
> Now, maybe the incremental backup use case is so much more important
> that the right thing to do is ignore this other use case, and I'm OK
> with that - as long as it's a conscious choice.
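Below is a minimal C sketch of step 1 above (buffer, qsort(), dedup), followed by the list-versus-bitmap choice described in the reply. It assumes a hypothetical fixed-size BlockRef record. The struct, its field names, and the functions sort_and_dedup() and bitmap_is_smaller() are illustrative only; they are not PostgreSQL's actual types or the real .modblock format. The size estimate also ignores any per-run headers. Once the array is sorted this way, the LSN order that the recovery-prefetching case needs is gone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one modified-block reference.
 * Field names are illustrative, not PostgreSQL's actual types. */
typedef struct BlockRef
{
    uint32_t relfilenode;   /* relation file node */
    uint8_t  forknum;       /* fork number (main, fsm, vm, ...) */
    uint32_t blkno;         /* block number within the fork */
} BlockRef;

/* Order by relfilenode, then fork, then block number, so references
 * to nearby blocks of the same relation fork end up adjacent. */
static int
blockref_cmp(const void *a, const void *b)
{
    const BlockRef *x = a;
    const BlockRef *y = b;

    if (x->relfilenode != y->relfilenode)
        return x->relfilenode < y->relfilenode ? -1 : 1;
    if (x->forknum != y->forknum)
        return x->forknum < y->forknum ? -1 : 1;
    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    return 0;
}

/* qsort() the buffered references, then drop duplicates in place.
 * Returns the number of unique references left at the front of refs.
 * Note: this throws away the original (LSN) order. */
static size_t
sort_and_dedup(BlockRef *refs, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    qsort(refs, n, sizeof(BlockRef), blockref_cmp);
    for (size_t i = 1; i < n; i++)
    {
        /* keep refs[i] only if it differs from the last kept entry */
        if (blockref_cmp(&refs[out], &refs[i]) != 0)
            refs[++out] = refs[i];
    }
    return out + 1;
}

/* For a non-empty, sorted, deduplicated run of blocks from a single
 * relation fork, compare a plain list of 4-byte block numbers against
 * a bitmap spanning the run's min..max block. Returns 1 if the bitmap
 * is smaller. Simplified: no headers or compression are counted. */
static int
bitmap_is_smaller(const BlockRef *run, size_t len)
{
    assert(len > 0);

    uint64_t span = (uint64_t) run[len - 1].blkno - run[0].blkno + 1;
    uint64_t bitmap_bytes = (span + 7) / 8;
    uint64_t list_bytes = (uint64_t) len * sizeof(uint32_t);

    return bitmap_bytes < list_bytes;
}
```

For example, sorting {16384/0/7, 16384/0/3, 16384/0/7, 16385/0/1, 16384/1/0, 16384/0/3} leaves four unique references. A run of 32 consecutive blocks needs 4 bitmap bytes against 128 list bytes. Two blocks 100000 apart favor the list instead (8 bytes against 12501). That is the "how clustered they are" dependence in concrete terms.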
I think the concern is that the more granular the modblock files are
(i.e., the less de-duplication is done, so repeated references to the
same block are kept), the larger they will be.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +