On Thu, Jun 9, 2022 at 8:00 PM Matthias van de Meent <boekewurm+postg...@gmail.com> wrote: > Why so? We already dole out per-page space in 4-byte increments > through pd_linp, and I see no reason why we can't reserve some line > pointers for per-page metadata if we decide that we need extra > per-page ~overhead~ metadata.
Hmm, that's an interesting approach. I was thinking that putting data after the PageHeaderData struct would be a non-starter because the code that looks up a line pointer by index is currently just multiply-and-add and complicating it seems bad for performance. However, if we treated the space there as overlapping the line pointer array and making some line pointers unusable rather than something inserted prior to the line pointer array, we could avoid that. I still think it would be kind of complicated, though, because we'd have to find every bit of code that loops over the line pointer array or accesses it by index and make sure that it doesn't try to access the low-numbered line pointers. > Isn't the goal of a checksum to find - and where possible, correct - > bit flips and other broken pages? I would suggest not to use > cryptographic hash functions for that, as those are rarely > error-correcting. I wasn't thinking of trying to do error correction, just error detection. See also my earlier reply to Peter Geoghegan. > Isn't that expected for most of those places? With the current > bufpage.h description of Page, it seems obvious that all bytes on a > page except those in the "hole" and those in the page header are under > full control of the AM. Of course AMs will pre-calculate limits and > offsets during compilation, that saves recalculation cycles and/or > cache lines with constants to keep in L1. Yep. > Can't we add some extra fork that stores this extra per-page > information, and contains this extra metadata in a double-buffered > format, so that both before the actual page is written the metadata > too is written to disk, while the old metadata is available too for > recovery purposes. This allows us to maintain the current format with > its low per-page overhead, and only have extra overhead (up to 2x > writes for each page, but the writes for these metadata pages need not > be BLCKSZ in size) for those that opt-in to the more computationally > expensive features of larger checksums, nonces, and/or other non-AM > per-page ~overhead~ metadata. It's not impossible, I'm sure, but it doesn't seem very appealing to me. Those extra reads and writes could be expensive, and there's no place to cleanly integrate them into the code structure. A function like PageIsVerified() -- which is where we currently validate checksums -- only gets the page. It can't go off and read some other page from disk to perform the checksum calculation. I'm not exactly sure what you have in mind when you say that the writes need not be BLCKSZ in size. Technically I guess that's true, but then the checksums have to be crash safe, or they're not much good. If they're not part of the page, how do they get updated in a way that makes them crash safe? I guess it could be done: every time we write a FPW, enlarge the page image by the number of bytes that are stored in this location. When replaying an FPW, update those bytes too. And every time we read or write a page, also read or write those bytes. In essence, we'd be deciding that pages are 8192+n bytes, but the last n bytes are stored in a different file - and, in memory, a different buffer pool. I think that would be hugely invasive and unpleasant to make work and I think the performance would be poor, too. > I'd prefer if we didn't change the way pages are presented to AMs. > Currently, it is clear what area is available to you if you write an > AM that uses the bufpage APIs. Changing the page format to have the > buffer manager also touch / reserve space in the special areas seems > like a break of abstraction: Quoting from bufpage.h: > > * AM-generic per-page information is kept in PageHeaderData. > * > * AM-specific per-page data (if any) is kept in the area marked "special > * space"; each AM has an "opaque" structure defined somewhere that is > * stored as the page trailer. an access method should always > * initialize its pages with PageInit and then set its own opaque > * fields. > > I'd rather we keep this contract: am-generic stuff belongs in > PageHeaderData, with the rest of the page fully available for the AM > to use (including the special area). I don't think that changing the contract has to mean that it becomes unclear what the contract is. And you can't improve any system without changing some stuff. But you certainly don't have to like my ideas or anything.... -- Robert Haas EDB: http://www.enterprisedb.com