On Wed, Apr 2, 2025 at 11:36 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ouch! I had no idea it had gotten that big. Yeah, we ought to
> do something about that.

Tomas Vondra talked about this recently, in the context of his work on
prefetching.

> > And/or perhaps we could allocate BTScanOpaqueData.markPos as a whole
> > only when mark/restore are used?
>
> That'd be an easy way of removing about half of the problem, but
> 14kB is still too much. How badly do we need this items array?
> Couldn't we just reference the on-page items?

I'm not sure what you mean by that. The whole design of _bt_readpage
is based on the idea that we read a whole page, in one go. It has to
batch up the items that are to be returned from the page somewhere.
The worst case is that there are about 1350 TIDs to return from any
single page (assuming default BLCKSZ). It's very pessimistic to start
from the assumption that that worst case will be hit, but I don't see
a way around doing it at least some of the time.

The first thing I'd try is some kind of simple dynamic allocation
scheme, with a small built-in array that avoided any allocation
penalty in the common case where there weren't too many tuples to
return from the page.

The way that we allocate BLCKSZ twice for index-only scans (one for
so->currTuples, the other for so->markTuples) is also pretty
inefficient. Especially because any kind of use of mark and restore is
exceedingly rare.

--
Peter Geoghegan
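
For illustration, here is a rough sketch of the "small built-in array
plus dynamic allocation" idea described above, in standalone C. It is
not nbtree code. ScanItem, ItemBuf, INLINE_ITEMS and the item_buf_*
functions are all hypothetical names, the item layout is an assumption,
and malloc/realloc stand in for whatever allocator the real code would
use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one batched item; the layout is an assumption. */
typedef struct ScanItem
{
    uint32_t block;        /* heap block number */
    uint16_t offset;       /* heap offset number */
    uint16_t indexOffset;  /* position of the index tuple on its page */
} ScanItem;

/* Hypothetical inline capacity; a real value would need benchmarking. */
#define INLINE_ITEMS 32

typedef struct ItemBuf
{
    ScanItem *items;       /* points at inline_items or at a heap array */
    size_t    nitems;      /* number of valid entries */
    size_t    capacity;    /* entries available in items[] */
    ScanItem  inline_items[INLINE_ITEMS];
} ItemBuf;

/* Start out using the built-in array, so the common case never allocates. */
static void
item_buf_init(ItemBuf *buf)
{
    buf->items = buf->inline_items;
    buf->nitems = 0;
    buf->capacity = INLINE_ITEMS;
}

/* Append one item, doubling the array when full. Returns false on OOM. */
static bool
item_buf_append(ItemBuf *buf, ScanItem item)
{
    assert(buf->nitems <= buf->capacity);

    if (buf->nitems == buf->capacity)
    {
        size_t    newcap = buf->capacity * 2;
        ScanItem *newitems;

        if (buf->items == buf->inline_items)
        {
            /* First overflow: move from the built-in array to the heap. */
            newitems = malloc(newcap * sizeof(ScanItem));
            if (newitems == NULL)
                return false;
            memcpy(newitems, buf->inline_items,
                   buf->nitems * sizeof(ScanItem));
        }
        else
        {
            newitems = realloc(buf->items, newcap * sizeof(ScanItem));
            if (newitems == NULL)
                return false;
        }
        buf->items = newitems;
        buf->capacity = newcap;
    }

    buf->items[buf->nitems++] = item;
    return true;
}

/* Forget the items from the previous page but keep any heap array. */
static void
item_buf_reset(ItemBuf *buf)
{
    buf->nitems = 0;
}

/* Release any heap array and fall back to the built-in one. */
static void
item_buf_free(ItemBuf *buf)
{
    if (buf->items != buf->inline_items)
        free(buf->items);
    item_buf_init(buf);
}
```

One caveat with this particular layout: because items can point into
the struct itself, an ItemBuf cannot simply be copied with memcpy. Any
code that copies the current position wholesale (for example to save a
mark) would have to fix up the pointer afterwards.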