On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
On Tue, 18 Feb 2020 at 05:24, Dmitry Dolgov <9erthali...@gmail.com> wrote:
Here is something similar to what I had in mind.

(changing to this email address for future emails)

Hi,

I've been looking over v32 of the patch and have a few comments
regarding the planner changes.

I think the changes in create_distinct_paths() need more work.  The
way I think this should work is that create_distinct_paths() gets to
know exactly nothing about what path types support the elimination of
duplicate values.  The Path should carry the UniqueKeys so that can be
determined. In create_distinct_paths() you should just be able to make
use of those paths, which should already have been created when
creating index paths for the rel due to PlannerInfo's query_uniquekeys
having been set.


+1 to code this in a generic way, using query_uniquekeys (if possible)

The reason it must be done this way is that when the RelOptInfo that
we're performing the DISTINCT on is a joinrel, then we're not going to
see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort
of Join path instead.  I understand you're not yet supporting doing
this optimisation when joins are involved, but it should be coded in
such a way that it'll work when we do.  (It's probably also a separate
question as to whether we should have this only work when there are no
joins. I don't think I personally object to it for stage 1, but
perhaps someone else might think differently.)


I don't follow. Can you elaborate more?

AFAICS skip-scan is essentially a capability of an (index) AM. I don't
see how we could ever do that for joinrels? We can do that at the scan
level, below a join, but that's what this patch already supports, I
think. When you say "supporting this optimisation" with joins, do you
mean doing skip-scan for join inputs, or on top of the join?

For storing these new paths with UniqueKeys, I'm not sure exactly if
we can just add_path() such paths into the RelOptInfo's pathlist.
What we don't want to do is accidentally make use of paths which
eliminate duplicate values when we don't want that behaviour.  If we
did store these paths in RelOptInfo->pathlist then we'd need to go and
modify a bunch of places to ignore such paths. set_cheapest() would
have to do something special for them too, which makes me think
pathlist is the incorrect place.  Parallel query added
partial_pathlist, so perhaps we need unique_pathlist to make this
work.


Hmmm, good point. Do we actually produce incorrect plans with the
current patch, using skip-scan path when we should not?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to