Re: [HACKERS] path toward faster partition pruning

Robert Haas Fri, 02 Mar 2018 08:25:22 -0800

On Fri, Mar 2, 2018 at 10:54 AM, David Rowley
<david.row...@2ndquadrant.com> wrote:
> I don't think building it is going to cost a huge amount. Presumably,
> there are not many partitioned tables with 10 rows, so probably having
> the get_partitions_from_clauses work as quickly as possible is better
> than saving 100 nanoseconds in executor startup.


I agree that one could go overboard with trying to push work from
executor time to planner time, but I don't think the current patch is
very close to the point of diminishing returns.  It's doing nearly
everything at execution time.

> That being said, there's still a small issue with the run-time pruning
> patch which is caused by me not pre-processing the clauses during
> planning. Ideally, I'd be able to pre-process at least enough to
> determine if any Params match the partition key so that I know if
> run-time pruning can be used or not. As of now, I'm not doing that as
> it seems wasteful to pre-process during planning just to get the Param
> Ids out, then not be able to carry the pre-processed ones over to the
> executor.  We also can't really reuse the pre-processed state that was
> generated during the planner's calls to generate_partition_clauses()
> since we'll additionally also be passing in the parameterized path
> clauses as well as the baserestrictinfo clauses.

I think it should be possible to have a structure where all the work
of classifying clauses happens in the planner.  By the time we get to
execution time, we should be able to know for sure which clauses are
relevant.  For example, if the user says WHERE a = $1 + 3 AND b =
(random() * 100)::int, and the partition key is (a, b), we should be
able to figure out at plan time that the clause containing b is
useless (because it's volatile) and the clause containing a is useful
only if this is range-partitioning (because with hash-partitioning we
must have an equality clause for every partition key column to do
anything).  I
think it should also be possible to know which expressions need to be
computed at runtime -- in this case, $1 + 3 -- and to which columns of
the partition key they correspond -- in this case, the first.  I just
proposed a data representation which could track all that stuff and
I'm sure there are other ways to do it, too.
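
To make the example above concrete, here is a minimal sketch in C of
classifying clauses at plan time.  It is not taken from any patch:
ClauseDesc, ClauseUse and classify_clauses are hypothetical names, and
the sketch assumes each clause has already been reduced to a few flags.
It also simplifies range partitioning by keeping any non-volatile
clause on a key column.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PARTKEYS 4          /* hypothetical limit for this sketch */

typedef enum { PART_STRATEGY_RANGE, PART_STRATEGY_HASH } PartStrategy;

typedef enum
{
    CLAUSE_USELESS,             /* cannot be used for pruning */
    CLAUSE_USE_CONST,           /* comparison value known at plan time */
    CLAUSE_USE_RUNTIME          /* comparison value computed at run time */
} ClauseUse;

/* Hypothetical pre-digested description of one WHERE clause. */
typedef struct
{
    int     keycol;             /* 0-based partition key column, or -1 */
    bool    is_equality;        /* operator is "=" */
    bool    is_volatile;        /* expression contains a volatile function */
    bool    has_param;          /* expression depends on a Param */
} ClauseDesc;

/*
 * Fill out[i] for each clause and return the number of useful clauses.
 * Assumption: for range partitioning any non-volatile clause on a key
 * column is kept; for hash partitioning every key column needs a
 * non-volatile equality clause, otherwise nothing is useful.
 */
int
classify_clauses(PartStrategy strategy, int nkeys,
                 const ClauseDesc *clauses, int nclauses, ClauseUse *out)
{
    bool    covered[MAX_PARTKEYS] = {false};
    int     nuseful = 0;

    assert(nkeys > 0 && nkeys <= MAX_PARTKEYS);

    for (int i = 0; i < nclauses; i++)
    {
        const ClauseDesc *c = &clauses[i];
        bool    candidate = c->keycol >= 0 && c->keycol < nkeys &&
                            !c->is_volatile;

        if (strategy == PART_STRATEGY_HASH && !c->is_equality)
            candidate = false;

        if (!candidate)
            out[i] = CLAUSE_USELESS;
        else
        {
            out[i] = c->has_param ? CLAUSE_USE_RUNTIME : CLAUSE_USE_CONST;
            covered[c->keycol] = true;
        }
    }

    if (strategy == PART_STRATEGY_HASH)
    {
        bool    all_covered = true;

        for (int k = 0; k < nkeys; k++)
            if (!covered[k])
                all_covered = false;

        /* A hash value cannot be computed from a partial key. */
        if (!all_covered)
            for (int i = 0; i < nclauses; i++)
                out[i] = CLAUSE_USELESS;
    }

    for (int i = 0; i < nclauses; i++)
        if (out[i] != CLAUSE_USELESS)
            nuseful++;

    return nuseful;
}
```

With the two clauses from the example (a = $1 + 3 on key column 0, and
the volatile clause on b), this marks the first clause as needing
run-time evaluation under range partitioning and marks both as useless
under hash partitioning.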

I think that things like PartClause that include both an opno and
various bits of cached information, including FmgrInfo, are not a very
good idea.  A lot of work has been done to maintain the separation of
immutable information -- like Plans or Exprs -- from the run-time
state they use -- PlanState or ExprState.  I think we would do well to
follow that distinction here, too, even if it seems to introduce some
"silly" overhead at execution time.  I think it will pay for itself in
future code maintenance and in the ability to apply optimizations such
as JIT, which benefit from a clean division like this one.  It is not crazy
to imagine that the "pruning program" idea I floated in a previous
email could be folded into the JIT stuff Andres is doing where
something with a less-clean separation of concerns would run into
problems.
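
As a rough illustration of that separation, here is a small sketch in
C, modeled loosely on the Expr/ExprState split.  Everything in it is
hypothetical (PruneStep, PruneStepState, OpId, the helper functions);
the function pointer merely stands in for the kind of cached lookup
that FmgrInfo holds, and integers stand in for Datums.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*CmpFn) (int lhs, int rhs);

typedef enum { OP_EQ, OP_LT } OpId;     /* stand-in for an operator OID */

/* Immutable, built by the planner; safe to share between executions. */
typedef struct
{
    int     keycol;             /* partition key column being compared */
    OpId    opno;               /* operator to apply */
    int     paramid;            /* parameter supplying the value */
    int     offset;             /* constant added to it, as in $1 + 3 */
} PruneStep;

/* Run-time state, built at executor startup from a PruneStep. */
typedef struct
{
    const PruneStep *step;      /* the plan node is only read, never changed */
    CmpFn   cmp;                /* resolved comparison function */
    int     value;              /* evaluated comparison value */
} PruneStepState;

static bool cmp_eq(int lhs, int rhs) { return lhs == rhs; }
static bool cmp_lt(int lhs, int rhs) { return lhs < rhs; }

/* Resolve the operator and evaluate the expression for this execution. */
void
init_prune_step_state(PruneStepState *state, const PruneStep *step,
                      const int *params)
{
    assert(state != NULL && step != NULL && params != NULL);

    state->step = step;
    state->cmp = (step->opno == OP_EQ) ? cmp_eq : cmp_lt;
    state->value = params[step->paramid] + step->offset;
}

/* Does a partition key value satisfy this step? */
bool
prune_step_matches(const PruneStepState *state, int keyval)
{
    return state->cmp(keyval, state->value);
}
```

The same PruneStep can back any number of PruneStepState instances
with different parameter values, and nothing cached at run time leaks
into the plan.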

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
