> On Fri, 16 Nov 2018 at 16:06, Jesper Pedersen <jesper.peder...@redhat.com> > wrote: > > On 11/15/18 6:41 AM, Alexander Kuzmenkov wrote: > >>> But having this logic inside _bt_next means that it will make a > >>> non-skip index > >>> only scan a bit slower, am I right? > >> > >> Correct. > > > > Well, it depends on how the skip scan is implemented. We don't have to > > make normal scans slower, because skip scan is just a separate thing. > > > > My main point was that current implementation is good as a proof of > > concept, but it is inefficient for some data and needs some unreliable > > planner logic to work around this inefficiency. And now we also have > > execution-time fallback because planning-time fallback isn't good > > enough. This looks overly complicated. Let's try to design an algorithm > > that works well regardless of the particular data and doesn't need these > > heuristics. It should be possible to do so for btree. > > > > Of course, I understand the reluctance to implement an entire new type > > of btree traversal. Anastasia Lubennikova suggested a tweak for the > > current method that may improve the performance for small groups of > > equal values. When we search for the next unique key, first check if it > > is contained on the current btree page using its 'high key'. If it is, > > find it on the current page. If not, search from the root of the tree > > like we do now. This should improve the performance for small equal > > groups, because there are going to be several such groups on the page. > > And this is exactly where we have the regression compared to unique + > > index scan. > > Robert suggested something similar in [1]. I'll try and look at that > when I'm back from my holiday.
Yeah, probably you're right. Unfortunately, I've misunderstood the previous Robert's message in this thread with the similar approach. Jesper, I hope you don't mind if I'll post an updated patch? _bt_skip is changed there in the suggested way, so that it checks the current page before searching from the root of a tree, and I've removed the fallback logic. After some initial tests I see that with this version skip scan over a table with 10^7 rows and 10^6 distinct values is slightly slower than a regular scan, but not that much. > > By the way, what is the data for which we intend this feature to work? > > Obviously a non-unique btree index, but how wide are the tuples, and how > > big the equal groups? It would be good to have some real-world examples. > > Although my primary use-case is int I agree that we should test the > different data types, and tuple widths. My personal motivation here is exactly that we face use-cases for skip scan from time to time. Usually it's quite few distinct values (up to a dozen or so, which means that equal groups are quite big), but with the variety of types and widths. > On Thu, 15 Nov 2018 at 15:28, James Coleman <jtc...@gmail.com> wrote: > > > Is skip scan only possible for index-only scan? > > I haven't see much discussion of this question yet. Is there a > particular reason to lock ourselves into thinking about this only in > an index only scan? I guess, the only reason is to limit the scope of the patch.
0001-Index-skip-scan-v4.patch
Description: Binary data