On 2019-03-29 16:20:54 +0000, Simon Riggs wrote: > On Fri, 29 Mar 2019 at 16:12, Andres Freund <and...@anarazel.de> wrote: > > > > On 2019-03-29 15:58:14 +0000, Simon Riggs wrote: > > > On Fri, 29 Mar 2019 at 15:29, Andres Freund <and...@anarazel.de> wrote: > > > > That's far from a trivial feature imo. It seems quite possible that > > we'd > > > > end up with increased overhead, because the current logic can get away > > > > with only doing hint bit style writes - but would that be true if we > > > > started actually replacing the item pointers? Because I don't see any > > > > guarantee they couldn't cross a page boundary etc? So I think we'd need > > > > to do WAL logging during index searches, which seems prohibitively > > > > expensive. > > > > > > > > > > Don't see that. > > > > > > I was talking about reusing the first 4 bytes of an index tuple's > > > ItemPointerData, > > > which is the first field of an index tuple. Index tuples are MAXALIGNed, > > so > > > I can't see how that would ever cross a page boundary. > > > > They're 8 bytes, and MAXALIGN often is 4 bytes: > > > > xids are 4 bytes, so we're good.
I literally went on to explain why that's not sufficient? You can't *just* replace the block number with an xid. You *also* need to set a flag denoting that it's an xid and dead now. Which can't fit in the same 4 bytes. You either have to set it in the IndexTuple's t_info, or or in the ItemIdData's lp_flags. And both can be on a different sectors. If the flag persists, and the xid doesn't you're going to interpret a block number as an xid - not great; but even worse, if the xid survives, but the flag doesn't, you're going to access the xid as a block - definitely not ok, because you're going to return wrong results. Greetings, Andres Freund