Hi,

On 2019-04-08 14:53:53 +0300, Heikki Linnakangas wrote:
> On 05/04/2019 23:25, Andres Freund wrote:
> > - the (optional) bitmap heap scan API - that's fairly intrinsically
> >   block based. An AM could just internally subdivide TIDs in a different
> >   way, but I don't think a bitmap scan like we have would e.g. make a
> >   lot of sense for an index oriented table without any sort of stable
> >   tid.
>
> If an AM doesn't implement the bitmap heap scan API, what happens? Bitmap
> scans are disabled?

Yea, the planner doesn't consider them. It just masks the index's
amhasgetbitmap. Seems to be the most reasonable thing to do?

> Even if an AM isn't block-oriented, the bitmap heap scan API still makes
> sense as long as there's some correlation between TIDs and physical
> location.

Yea, it could be a non-linear mapping. But I'm honestly not sure how
many non-block oriented AMs with such a correlation there are - I mean
you're not going to have that in say an IOT. And it'd be trivial to just
"fake" a block mapping for an in-memory AM.

> The only really broken thing about that currently is the
> prefetching: nodeBitmapHeapScan.c calls PrefetchBuffer() directly with the
> TID's block numbers. It would be pretty straightforward to wrap that in a
> callback, so that the AM could do something different.

That, and the VM_ALL_VISIBLE() checks both in nodeBitmapHeapscan.c and
nodeIndexonlyscan.c.

> Or move even more of the logic to the AM, so that the AM would get the whole
> TIDBitmap in table_beginscan_bm(). It could then implement the fetching and
> prefetching as it sees fit.
>
> I don't think it's urgent, though. We can cross that bridge when we get
> there, with the first AM that needs that flexibility.

Yea, it seemed nontrivial (not as in really hard, just not obvious), and
the implied code duplication scared me away.

> > The most constraining factor for storage, I think, is that currently the
> > API relies on ItemPointerData style TIDs in a number of places (i.e. a 6
> > byte tuple identifier).
>
> I think 48 bits would be just about enough

I don't think that's really true. Consider e.g. implementing an index
oriented table - there's no way you can efficiently implement one with
that small a key. You basically need a helper index just to have
efficient and small enough tids. And given that we're also going to
need wider tids for global indexes, I suspect we're just going to have
to bite into the sour apple and make tids variable width.
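
To make the "fake a block mapping" remark above a bit more concrete,
here is a minimal standalone sketch of how an in-memory AM might map a
dense slot number onto a (block, offset) pair shaped like a heap TID.
This is not PostgreSQL code: FakeTid, FAKE_TUPLES_PER_BLOCK and the
fake_tid_* functions are hypothetical names made up for illustration.
The cap of 291 assumes the default 8kB block size (that is what
MaxHeapTuplesPerPage works out to there), and offsets start at 1 because
offset 0 is treated as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: number of slots per fake "block". 291 matches
 * MaxHeapTuplesPerPage for 8kB pages (an assumption of this sketch).
 */
#define FAKE_TUPLES_PER_BLOCK 291

/* Hypothetical stand-in for a 6-byte (32 + 16 bit) tuple identifier. */
typedef struct FakeTid
{
    uint32_t block;   /* fake block number, 0-based */
    uint16_t offset;  /* 1-based; 0 is reserved as "invalid" */
} FakeTid;

/* A TID is usable only if its offset is in 1..FAKE_TUPLES_PER_BLOCK. */
static inline bool
fake_tid_is_valid(FakeTid tid)
{
    return tid.offset >= 1 && tid.offset <= FAKE_TUPLES_PER_BLOCK;
}

/* Map a dense in-memory slot number to a fake (block, offset) pair. */
static inline FakeTid
fake_tid_from_slot(uint64_t slot)
{
    FakeTid tid;

    /* Keep 0xFFFFFFFF free; PostgreSQL uses it as InvalidBlockNumber. */
    assert(slot / FAKE_TUPLES_PER_BLOCK < UINT32_MAX);

    tid.block = (uint32_t) (slot / FAKE_TUPLES_PER_BLOCK);
    tid.offset = (uint16_t) (slot % FAKE_TUPLES_PER_BLOCK + 1);
    return tid;
}

/* Inverse mapping: recover the slot number from a valid fake TID. */
static inline uint64_t
fake_tid_to_slot(FakeTid tid)
{
    assert(fake_tid_is_valid(tid));
    return (uint64_t) tid.block * FAKE_TUPLES_PER_BLOCK + (tid.offset - 1);
}
```

With a mapping like this, neighbouring slots share a fake block number,
so per-block code paths have something to work with. The price is that
the usable space is noticeably smaller than a full 48 bits.
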
> , but it's even more limited than
> you might think at the moment. There are a few places that assume that the
> offsetnumber <= MaxHeapTuplesPerPage. See ginpostinglist.c, and
> MAX_TUPLES_PER_PAGE in tidbitmap.c. Also, offsetnumber can't be 0, because
> that makes the ItemPointer invalid, which is inconvenient if you tried to
> use ItemPointer as just an arbitrary 48-bit integer.

Good point.

Thanks for looking (and playing, in the other thread)!

Greetings,

Andres Freund