> On Thu, 13 Sep 2018 at 21:36, Alexander Kuzmenkov > <a.kuzmen...@postgrespro.ru> wrote: > > El 13/09/18 a las 18:39, Jesper Pedersen escribió: > >> I think we can improve this, >> and the skip scan can be strictly faster than index scan regardless of >> the data. As a first approximation, imagine that we somehow skipped >> equal tuples inside _bt_next instead of sending them to the parent >> Unique node. This would already be marginally faster than Unique + Index >> scan. A more practical implementation would be to remember our position >> in tree (that is, BTStack returned by _bt_search) and use it to skip >> pages in bulk. This looks straightforward to implement for a tree that >> does not change, but I'm not sure how to make it work with concurrent >> modifications. Still, this looks a worthwhile direction to me, because >> if we have a strictly faster skip scan, we can just use it always and >> not worry about our unreliable statistics. What do you think? >> > > This is something to look at -- maybe there is a way to use > btpo_next/btpo_prev instead/too in order to speed things up. Atm we just > have the scan key in BTScanOpaqueData. I'll take a look after my > upcoming vacation; feel free to contribute those changes in the meantime > of course.
But having this logic inside _bt_next means that it will make a non-skip index only scan a bit slower, am I right? Probably it would be easier and more straightforward to go with the idea of dynamic fallback then. The first naive implementation that I came up with is to wrap an index scan node into a unique, and remember estimated number of groups into IndexOnlyScanState, so that we can check if we performed much more skips than expected. With this approach index skip scan will work a bit slower than in the original patch in case if ndistinct is correct (because a unique node will recheck rows we returned), and fallback to unique + index only scan in case if planner has underestimated ndistinct.
index-skip-fallback.patch
Description: Binary data