On Wed, 19 Aug 2020 at 05:48, David Rowley <dgrowle...@gmail.com> wrote:
> On Tue, 18 Aug 2020 at 21:42, David Rowley <dgrowle...@gmail.com> wrote:
> >
> > On Tue, 11 Aug 2020 at 17:44, Andres Freund <and...@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2020-08-11 17:23:42 +1200, David Rowley wrote:
> > > > On Tue, 11 Aug 2020 at 12:21, Andres Freund <and...@anarazel.de> wrote:
> > > > >
> > > > > On 2020-07-09 10:25:14 +1200, David Rowley wrote:
> > > > > > On Thu, 9 Jul 2020 at 04:53, Andres Freund <and...@anarazel.de> wrote:
> > > > > > > I'm not convinced it's a good idea to introduce a separate executor node
> > > > > > > for this. There's a fair bit of overhead in them, and they will only be
> > > > > > > below certain types of nodes afaict. It seems like it'd be better to
> > > > > > > pull the required calls into the nodes that do parametrized scans of
> > > > > > > subsidiary nodes. Have you considered that?
> > > > > >
> > > > > > I see 41 different node types mentioned in ExecReScan(). I don't
> > > > > > really think it would be reasonable to change all those.
> > > > >
> > > > > But that's because we dispatch ExecReScan mechanically down to every
> > > > > single executor node. That doesn't determine how many nodes would need
> > > > > to modify to include explicit caching? What am I missing?
> > > > >
> > > > > Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
> > > > > integration?
> > > >
> > > > hmm, I think you're right there about those two node types. I'm just
> > > > not sure you're right about overloading these node types to act as a
> > > > cache.
> > >
> > > I'm not 100% either, to be clear. I am just acutely aware that adding
> > > entire nodes is pretty expensive, and that there's, afaict, no need to
> > > have arbitrary (i.e. pointer to function) type callbacks to point to the
> > > cache.
> >
> > Perhaps you're right, but I'm just not convinced of it.
> > I feel
> > there's a certain air of magic involved in any node that has a good
> > name and reputation for doing one thing that we suddenly add new
> > functionality to which causes it to perform massively differently.
> >
> > [ my long babble removed]
> >
> > I'm wondering if anyone else has any thoughts on this?
>
> Just for anyone following along at home. The two variations would
> roughly look like:
>
> Current method:
>
> regression=# explain (analyze, costs off, timing off, summary off)
> select count(*) from tenk1 t1 inner join tenk1 t2 on
> t1.twenty=t2.unique1;
>                                        QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Aggregate (actual rows=1 loops=1)
>    ->  Nested Loop (actual rows=10000 loops=1)
>          ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
>          ->  Result Cache (actual rows=1 loops=10000)
>                Cache Key: t1.twenty
>                Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
>                ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
>                      Index Cond: (unique1 = t1.twenty)
> (8 rows)
>
> Andres' suggestion:
>
> regression=# explain (analyze, costs off, timing off, summary off)
> select count(*) from tenk1 t1 inner join tenk1 t2 on
> t1.twenty=t2.unique1;
>                                        QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Aggregate (actual rows=1 loops=1)
>    ->  Nested Loop (actual rows=10000 loops=1)
>          Cache Key: t1.twenty  Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
>          ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
>          ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
>                Index Cond: (unique1 = t1.twenty)
> (6 rows)
>
> and for subplans:
>
> Current method:
>
> regression=# explain (analyze, costs off, timing off, summary off)
> select twenty, (select count(*) from tenk1 t2 where t1.twenty =
> t2.twenty) from tenk1 t1;
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
>    SubPlan 1
>      ->  Result Cache (actual rows=1 loops=10000)
>            Cache Key: t1.twenty
>            Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
>            ->  Aggregate (actual rows=1 loops=20)
>                  ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
>                        Filter: (t1.twenty = twenty)
>                        Rows Removed by Filter: 9500
> (9 rows)
>
> Andres' suggestion:
>
> regression=# explain (analyze, costs off, timing off, summary off)
> select twenty, (select count(*) from tenk1 t2 where t1.twenty =
> t2.twenty) from tenk1 t1;
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
>    SubPlan 1
>      Cache Key: t1.twenty  Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
>      ->  Aggregate (actual rows=1 loops=20)
>            ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
>                  Filter: (t1.twenty = twenty)
>                  Rows Removed by Filter: 9500
> (7 rows)
>
> I've spoken to one other person off-list about this and they suggested
> that they prefer Andres' suggestion on performance grounds that it's
> less overhead to pull tuples through the plan and cheaper executor
> startup/shutdowns due to fewer nodes.

I didn't run performance tests, though those will be necessary. I think
Andres' variant is a little more readable. Performance is the most
important thing, but the readability of EXPLAIN is interesting too.

Regards

Pavel

> I don't object to making the change. I just object to making it only
> to put it back again later when someone else speaks up that they'd
> prefer to keep nodes modular and not overload them in obscure ways.
>
> So other input is welcome. Is it too weird to overload SubPlan and
> Nested Loop this way? Or okay to do that if it squeezes out a dozen
> or so nanoseconds per tuple?
>
> I did some analysis into the overhead of pulling tuples through an
> additional executor node in [1].
>
> David
>
> [1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com
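For anyone following the counters in the plans above, the cache's behavior can be modelled in a few lines, independently of whether it lives in its own node or inside Nested Loop / SubPlan. This is a hypothetical sketch, not PostgreSQL's implementation: the `ResultCache` class, its `lookup` method, and the LRU eviction policy are illustrative inventions, but the arithmetic matches the example query, where 10000 outer rows carry only 20 distinct values of the cache key t1.twenty.

```python
from collections import OrderedDict

class ResultCache:
    """Toy model of a parameterized result cache (illustrative only).

    Caches the inner-side result rows for each distinct parameter value
    (the "Cache Key"), evicting the least-recently-used entry when full,
    and tracks the hit/miss/eviction counters that EXPLAIN reports.
    """

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # cache key -> cached result rows
        self.hits = self.misses = self.evictions = 0

    def lookup(self, key, compute_rows):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        rows = compute_rows(key)  # run the parameterized inner scan
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # evict the LRU entry
            self.evictions += 1
        self.entries[key] = rows
        return rows

# Mimic the example join: 10000 outer rows, but the cache key
# (t1.twenty) has only 20 distinct values, so the inner scan runs
# just 20 times and every other probe is a cache hit.
cache = ResultCache(max_entries=64)
for i in range(10000):
    cache.lookup(i % 20, lambda k: [k])  # inner scan yields one row
print(cache.hits, cache.misses, cache.evictions)  # prints: 9980 20 0
```

With 20 distinct keys and room for 64 entries nothing is ever evicted, which is exactly the "Hits: 9980 Misses: 20 Evictions: 0" line in the plans; shrinking `max_entries` below 20 would start producing evictions on every loop of keys.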