On Tue, Nov 28, 2017 at 6:41 AM, Alexander Korotkov
<a.korot...@postgrespro.ru> wrote:
> On Mon, Nov 27, 2017 at 10:56 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>
>> On Fri, Nov 24, 2017 at 5:33 AM, Alexander Korotkov
>> <a.korot...@postgrespro.ru> wrote:
>> > pg_prune_xid makes sense only for heap pages. Once we introduce a
>> > special area for heap pages, we can move pg_prune_xid there and save
>> > some bytes in index pages. However, this is an optimization not
>> > directly related to 64-bit xids. The idea is that if we are changing
>> > the page format anyway, why not apply this optimization as well? But
>> > if we have any doubts about this, it can be removed with no problem.
>>
>> My first reaction is that changing the page format seems like a
>> non-starter, because it would break pg_upgrade. If we get the heap
>> storage API working, then we could have a heap AM that works as it
>> does today and a newheap AM with such changes, but I have a bit of a
>> hard time imagining a patch that causes a hard compatibility break
>> ever being accepted.

Yeah.. I can't imagine that either.

> Thank you for raising this question. There was a discussion about 64-bit
> xids during PGCon 2017. A couple of ways to support pg_upgrade were
> discussed.
>
> 1) We have a page layout version in the page (currently 4). So, we can
> define a new page layout version 5. Pages with the new layout version
> would contain 64-bit base values for xid and multixact. The question is
> how to deal with pages of layout version 4. If such a page has enough
> free space to fit the extra 16 bytes, then it could be upgraded on the
> fly. If it doesn't contain enough space for that, then things become more
> complicated: we can't upgrade it to the new format, but we still need to
> fit a new xmax value there in case a tuple is updated or deleted.
> pg_upgrade requires a server restart. Thus, once we set hint bits, the
> pre-pg_upgrade xmin is not really meaningful: the corresponding xid is
> visible to every post-pg_upgrade snapshot. So, the idea is to use both the
> xmin and xmax tuple fields on such an unupgradable page to store a 64-bit
> xmax. This idea was proposed by me, but was criticized by some session
> attendees (sorry, but I don't recall who they were) for its complexity
> and suspected overhead.
>
> 2) An alternative idea was to use unused bits in the page header.
> Naturally, if we look for unused bits in pd_flags (3 bits of 16 are
> used), pd_pagesize_version (we can leave 1 bit of 16 to distinguish
> between old and new format) and pd_special (we can leave 1 bit to
> distinguish sequence pages), we can scrape together 43 bits. That would
> be more than enough for a single base value, because we definitely don't
> need all of the lower 32 bits of the base value (21 bits is more than
> enough). But I'm not sure about two base values: if we leave 2 bits for
> the lower part of the base value, then it leaves us 19 bits for the high
> part of the base value. This solution would give us 2^51 maximum values
> for xids and multixacts. I'm not sure if that's enough to treat these
> counters as infinite.
> AFAIK, there are products on the market that have
> 48-bit transaction identifiers and don't care about wraparound or
> something...
>
> A new heap AM for 64-bit xids is an interesting idea too. I would even
> say that the pluggable storage API being discussed now is excessive for
> this particular purpose (but it can still fit!), because in most aspects
> a heap with 64-bit xids is exactly the same as the current heap (in
> contrast to a heap with an undo log, for example). The best-fitting API
> for a heap with 64-bit xid support would be a pluggable heap page format.
> But I don't think it deserves a separate API.

I am moving that entry to the next CF as the discussion is still going on.
--
Michael
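
The bit counts in option 2 and the field reuse in option 1 can be checked with a short Python sketch. It is only illustrative arithmetic, not PostgreSQL code: the names (`reserved`, `split_xmax64`, `join_xmax64`) are hypothetical, the 32 + 19 = 51 reading of the 2^51 figure is one interpretation of the quoted text, and which tuple field would hold which half of a 64-bit xmax is an assumption, since the thread does not say.

```python
# Illustrative arithmetic only; not PostgreSQL code.

FIELD_BITS = 16  # the mail counts each of the three header fields as 16 bits

# Bits that have to stay where they are, as listed in option 2.
reserved = {
    "pd_flags": 3,             # 3 of 16 bits are in use
    "pd_pagesize_version": 1,  # 1 bit kept to tell old and new format apart
    "pd_special": 1,           # 1 bit kept to distinguish sequence pages
}

# Free bits that could be scraped together from the page header.
free_bits = sum(FIELD_BITS - used for used in reserved.values())

# Two base values (xid and multixact) would share those bits evenly.
bits_per_base = free_bits // 2
low_bits = 2                          # bits kept from the lower part of a base
high_bits = bits_per_base - low_bits  # bits left for the high part

# Assumed reading: 32-bit on-tuple value plus the high part of the base.
xid_space_bits = 32 + high_bits

MASK32 = 0xFFFFFFFF


def split_xmax64(xmax64):
    """Split a 64-bit xmax into two 32-bit halves (option 1).

    Which half would go into the xmin field and which into the xmax
    field is an assumption made for this sketch.
    """
    return (xmax64 >> 32) & MASK32, xmax64 & MASK32


def join_xmax64(high, low):
    """Rebuild the 64-bit xmax from the two 32-bit halves."""
    return ((high & MASK32) << 32) | (low & MASK32)


if __name__ == "__main__":
    print("free header bits:", free_bits)
    print("bits per base value:", bits_per_base)
    print("high-part bits:", high_bits)
    print("xid space: 2^%d" % xid_space_bits)
```

Under these assumptions the counts agree with the figures in the quoted mail: 43 free bits, 19 bits for the high part of each base value, and a 2^51 xid space.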