On Wed, 19 Aug 2020 at 05:48, David Rowley <dgrowle...@gmail.com> wrote:
> On Tue, 18 Aug 2020 at 21:42, David Rowley <dgrowle...@gmail.com> wrote:
> >
> > On Tue, 11 Aug 2020 at 17:44, Andres Freund <and...@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2020-08-11 17:23:42 +1200, David Rowley wrote:
> > > > On Tue, 11 Aug 2020 at 12:21, Andres Freund <and...@anarazel.de> wrote:
> > > > >
> > > > > On 2020-07-09 10:25:14 +1200, David Rowley wrote:
> > > > > > On Thu, 9 Jul 2020 at 04:53, Andres Freund <and...@anarazel.de> wrote:
> > > > > > > I'm not convinced it's a good idea to introduce a separate executor node
> > > > > > > for this. There's a fair bit of overhead in them, and they will only be
> > > > > > > below certain types of nodes afaict. It seems like it'd be better to
> > > > > > > pull the required calls into the nodes that do parametrized scans of
> > > > > > > subsidiary nodes. Have you considered that?
> > > > > >
> > > > > > I see 41 different node types mentioned in ExecReScan(). I don't
> > > > > > really think it would be reasonable to change all those.
> > > > >
> > > > > But that's because we dispatch ExecReScan mechanically down to every
> > > > > single executor node. That doesn't determine how many nodes would need
> > > > > to modify to include explicit caching? What am I missing?
> > > > >
> > > > > Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
> > > > > integration?
> > > >
> > > > hmm, I think you're right there about those two node types. I'm just
> > > > not sure you're right about overloading these node types to act as a
> > > > cache.
> > >
> > > I'm not 100% either, to be clear. I am just acutely aware that adding
> > > entire nodes is pretty expensive, and that there's, afaict, no need to
> > > have arbitrary (i.e. pointer to function) type callbacks to point to the
> > > cache.
> >
> > Perhaps you're right, but I'm just not convinced of it.
> > I feel
> > there's a certain air of magic involved in any node that has a good
> > name and reputation for doing one thing that we suddenly add new
> > functionality to which causes it to perform massively differently.
> >
> > [ my long babble removed]
> >
> > I'm wondering if anyone else has any thoughts on this?
>
> Just for anyone following along at home. The two variations would
> roughly look like:
>
> Current method:
>
> regression=# explain (analyze, costs off, timing off, summary off)
> select count(*) from tenk1 t1 inner join tenk1 t2 on
> t1.twenty=t2.unique1;
>                                        QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Aggregate (actual rows=1 loops=1)
>    ->  Nested Loop (actual rows=10000 loops=1)
>          ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
>          ->  Result Cache (actual rows=1 loops=10000)
>                Cache Key: t1.twenty
>                Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
>                ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
>                      Index Cond: (unique1 = t1.twenty)
> (8 rows)
>
> Andres' suggestion:
>
> regression=# explain (analyze, costs off, timing off, summary off)
> select count(*) from tenk1 t1 inner join tenk1 t2 on
> t1.twenty=t2.unique1;
>                                        QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Aggregate (actual rows=1 loops=1)
>    ->  Nested Loop (actual rows=10000 loops=1)
>          Cache Key: t1.twenty  Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
>          ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
>          ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
>                Index Cond: (unique1 = t1.twenty)
> (6 rows)
>
> and for subplans:
>
> Current method:
>
> regression=# explain (analyze, costs off, timing off, summary off)
> select twenty, (select count(*) from tenk1 t2 where t1.twenty =
> t2.twenty) from tenk1 t1;
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
>    SubPlan 1
>      ->  Result Cache (actual rows=1 loops=10000)
>            Cache Key: t1.twenty
>            Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
>            ->  Aggregate (actual rows=1 loops=20)
>                  ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
>                        Filter: (t1.twenty = twenty)
>                        Rows Removed by Filter: 9500
> (9 rows)
>
> Andres' suggestion:
>
> regression=# explain (analyze, costs off, timing off, summary off)
> select twenty, (select count(*) from tenk1 t2 where t1.twenty =
> t2.twenty) from tenk1 t1;
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
>    SubPlan 1
>      Cache Key: t1.twenty  Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
>      ->  Aggregate (actual rows=1 loops=20)
>            ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
>                  Filter: (t1.twenty = twenty)
>                  Rows Removed by Filter: 9500
> (7 rows)
>
> I've spoken to one other person off-list about this and they suggested
> that they prefer Andres' suggestion on performance grounds that it's
> less overhead to pull tuples through the plan and cheaper executor
> startup/shutdowns due to fewer nodes.

I didn't run performance tests, though those will be necessary. I think
Andres' variant is a little more readable. Performance is the most
important thing, but the readability of EXPLAIN is interesting too.

Regards

Pavel

> I don't object to making the change. I just object to making it only
> to put it back again later when someone else speaks up that they'd
> prefer to keep nodes modular and not overload them in obscure ways.
>
> So other input is welcome. Is it too weird to overload SubPlan and
> Nested Loop this way? Or okay to do that if it squeezes out a dozen
> or so nanoseconds per tuple?
>
> I did some analysis into the overhead of pulling tuples through an
> additional executor node in [1].
>
> David
>
> [1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com
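For anyone following the counters in the plans above, the cache's behavior can be modelled in a few lines, independently of whether it lives in its own node or inside Nested Loop / SubPlan. This is a hypothetical sketch, not PostgreSQL's implementation: the `ResultCache` class, its `lookup` method, and the LRU eviction policy are illustrative inventions, but the arithmetic matches the example query, where 10000 outer rows carry only 20 distinct values of the cache key t1.twenty.

```python
from collections import OrderedDict

class ResultCache:
    """Toy model of a parameterized result cache (illustrative only).

    Caches the inner-side result rows for each distinct parameter value
    (the "Cache Key"), evicting the least-recently-used entry when full,
    and tracks the hit/miss/eviction counters that EXPLAIN reports.
    """

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # cache key -> cached result rows
        self.hits = self.misses = self.evictions = 0

    def lookup(self, key, compute_rows):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        rows = compute_rows(key)  # run the parameterized inner scan
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # evict the LRU entry
            self.evictions += 1
        self.entries[key] = rows
        return rows

# Mimic the example join: 10000 outer rows, but the cache key
# (t1.twenty) has only 20 distinct values, so the inner scan runs
# just 20 times and every other probe is a cache hit.
cache = ResultCache(max_entries=64)
for i in range(10000):
    cache.lookup(i % 20, lambda k: [k])  # inner scan yields one row
print(cache.hits, cache.misses, cache.evictions)  # prints: 9980 20 0
```

With 20 distinct keys and room for 64 entries nothing is ever evicted, which is exactly the "Hits: 9980 Misses: 20 Evictions: 0" line in the plans; shrinking `max_entries` below 20 would start producing evictions on every loop of keys.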