On Fri, Jun 21, 2019 at 1:20 AM Jesper Pedersen <jesper.peder...@redhat.com> wrote:
> Attached is v20, since the last patch should have been v19.

I took this for a quick spin today.  The DISTINCT ON support is nice
and I think it will be very useful.  I've signed up to review it and
will have more to say later.  But today I had a couple of thoughts
after looking into how src/backend/optimizer/plan/planagg.c works and
wondering how to do some more skipping tricks with the existing
machinery.

1.  SELECT COUNT(DISTINCT i) FROM t could benefit from this.  (Or
AVG(DISTINCT ...) or any other aggregate).  Right now you get a seq
scan, with the sort/unique logic inside the Aggregate node.  If you
write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get
a skip scan that is much faster in good cases.  I suppose you could
have a process_distinct_aggregates() in planagg.c that recognises
queries of the right form and generates extra paths a bit like
build_minmax_path() does.  I think it's probably better to consider
that in the grouping planner proper instead.  I'm not sure.

2.  SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if
you're allowed to go forwards.  Same for SELECT i, MAX(j) FROM t GROUP
BY i if you're allowed to go backwards.  Those queries are equivalent
to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC]
(though as Floris noted, the backwards version gives the wrong answers
with v20).  That does seem like a much more specific thing applicable
only to MIN and MAX, and I think preprocess_minmax_aggregates() could
be taught to handle that sort of query, building an index only scan
path with skip scan in build_minmax_path().

-- 
Thomas Munro
https://enterprisedb.com
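
PS: To make the two ideas above concrete, here is a toy model in Python.  It is only a sketch and not PostgreSQL code: a sorted list of (i, j) tuples stands in for a btree index on t(i, j), a binary search stands in for re-descending the tree, and the names forward_skip and backward_skip are made up for illustration.  It only shows why the first entry per key (going forwards) gives MIN(j) and the last entry per key (going backwards) gives MAX(j), and why counting the stops gives COUNT(DISTINCT i).

```python
from bisect import bisect_left, bisect_right


def forward_skip(index):
    """Yield the first (i, j) entry for each distinct i, scanning forwards.

    `index` is a list of (i, j) tuples sorted by (i, j), standing in for a
    btree on t(i, j).  The binary search models "skip to the next key".
    """
    # Toy shortcut: a real index would descend the tree instead of
    # materialising the leading column.
    keys = [i for i, _ in index]
    pos = 0
    while pos < len(index):
        yield index[pos]  # smallest j for this i
        pos = bisect_right(keys, keys[pos], pos)  # first entry with a larger i


def backward_skip(index):
    """Yield the last (i, j) entry for each distinct i, scanning backwards."""
    keys = [i for i, _ in index]
    pos = len(index) - 1
    while pos >= 0:
        yield index[pos]  # largest j for this i
        pos = bisect_left(keys, keys[pos], 0, pos + 1) - 1  # last entry with a smaller i


# Hypothetical contents of table t, sorted as the index would store them.
index = sorted([(1, 5), (1, 7), (1, 9), (2, 3), (2, 4), (3, 8)])

count_distinct = sum(1 for _ in forward_skip(index))  # models COUNT(DISTINCT i)
min_by_i = list(forward_skip(index))                  # models i, MIN(j) ... GROUP BY i
max_by_i = list(backward_skip(index))                 # models i, MAX(j) ... GROUP BY i, in descending i order
```

In this model each group costs one search rather than one step per row, which is where the win over a full scan would come from when there are few distinct values of i.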