(2019/03/02 1:10), Robert Haas wrote:
On Fri, Mar 1, 2019 at 5:47 AM Etsuro Fujita
<fujita.ets...@lab.ntt.co.jp> wrote:
Robert, I CCed you because you are the author of that commit. Before
that commit ("Rewrite the code that applies scan/join targets to
paths."), apply_scanjoin_target_to_paths() had a boolean parameter named
modify_in_place, and used apply_projection_to_path(), not
create_projection_path(), to adjust scan/join paths when modify_in_place
was true, which allowed us to save cycles at plan creation time by
avoiding creating projection paths, which I think would be a good thing,
but that commit removed that. Why?
One of the goals of the commit was to properly account for the cost of
computing the target list. Before parallel query and partition-wise
join, it didn't really matter what the cost of computing the target
list was, because every path was going to have to do the same work, so
it was just a constant factor getting added to every path. However,
parallel query and partition-wise join mean that some paths can
compute the final target list more cheaply than others, and that turns
out to be important for things like PostGIS. One of the complaints
that provoked that change was that PostGIS was picking non-parallel
plans even when a parallel plan was substantially superior, because it
wasn't accounting for the fact that in the parallel plan, the cost of
computing the target-list could be shared across all the workers
rather than paid entirely by the leader.
In order to accomplish this goal of properly accounting for the cost
of computing the target list, we need to create a new path, not just
jam the target list into an already-costed path. Note that we did
some performance optimization around the same time to minimize the
performance hit here (see d7c19e62a8e0a634eb6b29f8f1111d944e57081f,
and I think there may have been something else as well although I
can't find it right now).
apply_projection_to_path() not only jams the given tlist into the
existing path but updates its tlist eval costs appropriately except for
the cases of Gather and GatherMerge:
/*
* If the path happens to be a Gather or GatherMerge path, we'd like to
* arrange for the subpath to return the required target list so that
* workers can help project. But if there is something that is not
* parallel-safe in the target expressions, then we can't.
*/
if ((IsA(path, GatherPath) ||IsA(path, GatherMergePath)) &&
is_parallel_safe(root, (Node *) target->exprs))
{
/*
* We always use create_projection_path here, even if the
subpath is
* projection-capable, so as to avoid modifying the subpath in
place.
* It seems unlikely at present that there could be any other
* references to the subpath, but better safe than sorry.
*
--> * Note that we don't change the parallel path's cost estimates; it
--> * might be appropriate to do so, to reflect the fact that the
bulk of
--> * the target evaluation will happen in workers.
*/
if (IsA(path, GatherPath))
{
GatherPath *gpath = (GatherPath *) path;
gpath->subpath = (Path *)
create_projection_path(root,
gpath->subpath->parent,
gpath->subpath,
target);
}
else
{
GatherMergePath *gmpath = (GatherMergePath *) path;
gmpath->subpath = (Path *)
create_projection_path(root,
gmpath->subpath->parent,
gmpath->subpath,
target);
}
}
The real reason for this question is: I noticed that projection paths
put on foreign paths will make it hard for FDWs to detect whether there
is an already-well-enough-sorted remote path in the pathlist for the
final scan/join relation as an input relation to GetForeignUpperPaths()
for the UPPERREL_ORDERED step (the IsA(path, ForeignPath) test would not
work well enough to detect remote paths!), so I'm wondering whether we
could revive that parameter like the attached, to avoid the overhead at
plan creation time and to make the FDW work easy. Maybe I'm missing
something, though.
I think this would be a bad idea, for the reasons explained above. I
also think that it's probably the wrong direction on principle. I
think the way we account for target lists is still pretty crude and
needs to be thought of as more of a real planning step and less as
something that we can just ignore when it's inconvenient for some
reason.
I'm not sure I 100% agree with you, but I also think we need to give
more thought to the tlist-eval-cost adjustment.
I think the FDW just needs to look through the projection
path and see what's underneath, but also take the projection path's
target list into account when decide whether more can be pushed down.
OK, I'll go with that.
Thanks for the explanation!
Best regards,
Etsuro Fujita