On Mon, Sep 18, 2017 at 5:43 AM, Dmitriy Sarafannikov <dsarafanni...@yandex.ru> wrote: > Hi hackers, > > Everybody knows, that we have unefficient execution of query like "SELECT > DISTINCT id from mytable" > if table has many-many rows and only several unique id values. Query plan > looks like Unique + IndexScan. > > I have tried to implement this feature in new type of node called Loose > Scan. > This node must appears in plan together with IndexScan or IndexOnlyScan just > like Unique node in this case. > But instead of filtering rows with equal values, LooseScan must retreive > first row from IndexScan, > then remember and return this. With all subsequent calls LooseScan must > initiate calling index_rescan via ExecReScan > with search value that > or < (depending on scan direction) of previous > value. > Total cost of this path must be something like total_cost = > n_distinct_values * subpath->startup_cost > What do you think about this idea? > > I was able to create new LooseScan node, but i ran into difficulties with > changing scan keys. > I looked (for example) on the ExecReScanIndexOnlyScan function and I find it > difficult to understand > how i can reach the ioss_ScanKeys through the state of executor. Can you > help me with this? > > I also looked on the Nested Loop node, which as i think must change scan > keys, but i didn't become clear for me. > The only thought that came to my head, that maybe i incorrectly create this > subplan. > I create it just like create_upper_unique_plan, and create subplan for > IndexScan via create_plan_recurse. > Can you tell me where to look or maybe somewhere there are examples?
Not an answer to your question, but generally +1 for working on this area. I did some early proto-hacking a while ago, which I haven't had time to get back to yet: https://www.postgresql.org/message-id/flat/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com That was based on the idea that a DISTINCT scan using a btree index to skip ahead is always going to be using the leading N columns and always going to be covered by the index, so I might as well teach Index Only Scan how to do it directly rather than making a new node to sit on top. As you can see in that thread I did start thinking about using a new node to sit on top and behave a bit like a nested loop for the more advanced feature of an Index Skip Scan (trying every value of (a) where you had an index on (a, b, c) but had a WHERE clause qual on (b, c)), but the only feedback I had so far was from Robert Haas who thought that too should probably be pushed into the index scan. FWIW I'd vote for 'skip' rather than 'loose' as a term to use in this family of related features (DISTINCT being the easiest to build). It seems more descriptive than the MySQL term. (DB2 added this a little while ago and went with 'index jump scan', suggesting that we should consider 'hop'... (weak humour, 'a hop, skip and a jump' being an English idiom meaning a short distance)). -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers