> For the general forward direction but for a backwards cursor scroll,

> we'd return the lowest value for each distinct prefix, but for the
> general backwards direction (DESC case) we'd return the highest value
> for each distinct prefix. Looking at IndexNext() the cursor direction
> seems to be estate->es_direction and the general scan direction is
> indicated by the plan's indexorderdir. Can't we just pass both of
> those to index_skip() to have it decide what to do? If we also pass in
> indexorderdir then index_skip() should know if it's to return the
> highest or lowest value, right?

Correct, with these two values correct behavior can be deduced. The 
implementation of this is a bit cumbersome though. Consider a case like:

SELECT DISTINCT ON (a) a,b,c FROM a WHERE c = 2 (with an index on a,b,c)
Data (imagine every tuple here actually occurs 10.000 times in the index to see 
the benefit of skipping):
1,1,1
1,1,2
1,2,2
1,2,3
2,2,1
2,2,3
3,1,1
3,1,2
3,2,2
3,2,3

Creating a cursor on this query and then moving forward, you should get 
(1,1,2), (3,1,2). In the current implementation of the patch, after bt_first, 
it skips over (1,1,2) to (2,2,1). It checks quals and moves forward one-by-one 
until it finds a match. This match only comes at (3,1,2) however. Then it skips 
to the end.

If you move the cursor backwards from the end of the cursor, you should still 
get (3,1,2) (1,1,2). A possible implementation would start at the end and do a 
skip to the beginning of the prefix: (3,1,1). Then it needs to move forward 
one-by-one in order to find the first matching (minimum) item (3,1,2). When it 
finds it, it needs to skip backwards to the beginning of prefix 2 (2,2,1). It 
needs to move forwards to find the minimum element, but should stop as soon as 
it detects that the prefix doesn't match anymore (because there is no match for 
prefix 2, it will move all the way from (2,2,1) to (3,1,1)). It then needs to 
skip backwards again to the start of prefix 1: (1,1,1) and scan forward to find 
(1,1,2).
Perhaps anyone can think of an easier way to implement it?

I do think being able to use DISTINCT ON is very useful and it's worth the 
extra complications. In the future we can add even more useful skipping 
features to it, for example:
SELECT DISTINCT ON (a) * FROM a WHERE b =2
After skipping to the next prefix of column a, we can start a new search for 
(a,b)=(prefix,2) to avoid having to move one-by-one from the start of the 
prefix to the first matching element. There are many other useful optimizations 
possible. That won't have to be for this patch though :-)

-Floris

Reply via email to