On Tue, Dec 12, 2017 at 3:43 PM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote:
> Here are review comments for 0009 > Only full aggregation is pushed on the remote server. > > I think we can live with that for a while but we need to be able to push > down > partial aggregates to the foreign server. I agree that it needs some > infrastructure to serialized and deserialize the partial aggregate values, > support unpartitioned aggregation first and then work on partitioned > aggregates. That is definitely a separate piece of work. > Yep. > > +CREATE TABLE pagg_tab_p3 (a int, b int, c text); > > Like partition-wise join testcases please use LIKE so that it's easy to > change > the table schema if required. > Done. > > +INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, > 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10; > +INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, > 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % > 30) >= 10; > +INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, > 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % > 30) >= 20; > > We have to do this because INSERT tuple routing to a foreign partition is > not > supported right now. Yes. > Somebody has to remember to change this to a single > statement once that's done. > I don't know how we can keep track of it. > > +ANALYZE fpagg_tab_p1; > +ANALYZE fpagg_tab_p2; > +ANALYZE fpagg_tab_p3; > > I thought this is not needed. When you ANALYZE the partitioned table, it > would > analyze the partitions as well. But I see that partition-wise join is also > ANALYZING the foreign partitions separately. When I ran ANALYZE on a > partitioned table with foreign partitions, statistics for only the local > tables > (partitioned and partitions) was updated. Of course this is separate work, > but > probably needs to be fixed. > Hmm. > > +-- When GROUP BY clause matches with PARTITION KEY. > +-- Plan when partition-wise-agg is disabled > > s/when/with/ > > +-- Plan when partition-wise-agg is enabled > > s/when/with/ > Done. > > + -> Append > > Just like ForeignScan node's Relations tell what kind of ForeignScan this > is, > may be we should annotate Append to tell whether the children are joins, > aggregates or relation scans. That might be helpful. Of course as another > patch. > OK. > > +SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING > avg(b) < 25 ORDER BY 1; > + a | sum | min | count > +----+------+-----+------- > + 0 | 2000 | 0 | 100 > + 1 | 2100 | 1 | 100 > [ ... clipped ...] > + 23 | 2300 | 3 | 100 > + 24 | 2400 | 4 | 100 > +(15 rows) > > May be we want to reduce the number of rows to a few by using a stricter > HAVING > clause? > Done. > > + > +-- When GROUP BY clause not matches with PARTITION KEY. > > ... clause does not match ... > Done. > > +EXPLAIN (VERBOSE, COSTS OFF) > +SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING > sum(a) < 800 ORDER BY 1; > + > QUERY PLAN > +----------------------------------------------------------- > ------------------------------------------------------------ > ------------------------------ > + Sort > + Output: fpagg_tab_p1.b, (avg(fpagg_tab_p1.a)), > (max(fpagg_tab_p1.a)), (count(*)) > + -> Partial HashAggregate > [ ... clipped ... ] > + Output: fpagg_tab_p3.b, PARTIAL > avg(fpagg_tab_p3.a), PARTIAL max(fpagg_tab_p3.a), PARTIAL count(*), > PARTIAL sum(fpagg_tab_p3.a) > + Group Key: fpagg_tab_p3.b > + -> Foreign Scan on public.fpagg_tab_p3 > + Output: fpagg_tab_p3.b, fpagg_tab_p3.a > + Remote SQL: SELECT a, b FROM public.pagg_tab_p3 > +(26 rows) > > I think we interested in overall shape of the plan and not the details of > Remote SQL etc. So, may be turn off VERBOSE. This comment applies to an > earlier > plan with enable_partition_wise_agg = false; > OK. Removed VERBOSE from all the queries as we are interested in overall shape of the plan. > > + > +SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING > sum(a) < 800 ORDER BY 1; > + b | avg | max | count > +----+---------------------+-----+------- > + 0 | 10.0000000000000000 | 20 | 60 > + 1 | 11.0000000000000000 | 21 | 60 > [... clipped ...] > + 42 | 12.0000000000000000 | 22 | 60 > + 43 | 13.0000000000000000 | 23 | 60 > +(20 rows) > > Since the aggregates were not pushed down, I doubt we should be testing the > output. But this test is good to check partial aggregates over foreign > partition scans, which we don't have in postgres_fdw.sql I think. So, may > be > add it as a separate patch? > Agree. Removed SELECT query. EXPLAIN is enough here. > > Can you please add a test where we reference a whole-row; that usually has > troubles. > Added. > > - if (root->hasHavingQual && query->havingQual) > + if (root->hasHavingQual && fpinfo->havingQual) > > This is not exactly a problem with your patch, but why do we need to check > both > the boolean and the actual clauses? If the boolean is true, > query->havingQual > should be non-NULL and NULL otherwise. > Agree. But since this is not fault of this patch, I have kept as is. > > /* Grouping information */ > List *grouped_tlist; > + PathTarget *grouped_target; > + Node *havingQual; > > I think we don't need havingQual as a separate member. > foreign_grouping_ok() > separates the clauses in havingQual into shippable and non-shippable > clauses > and saves in local_conditions and remote_conditions. Probably we want to > use > those instead of adding a new member. > local/remote_conditions is a list of RestrictInfos which cannot be used while costing the aggregates. So we anyways need to store the havingQual. > > index 04e43cc..c8999f6 100644 > --- a/src/include/foreign/fdwapi.h > +++ b/src/include/foreign/fdwapi.h > @@ -62,7 +62,8 @@ typedef void (*GetForeignJoinPaths_function) > (PlannerInfo *root, > typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root, > UpperRelationKind stage, > RelOptInfo *input_rel, > - RelOptInfo *output_rel); > + RelOptInfo *output_rel, > + UpperPathExtraData *extra); > > Probably this comment belongs to 0007, but it's in this patch that it > becomes > clear how invasive UpperPathExtraData changes are. While > UpperPathExtraData has > upper paths in the name, all of its members are related to grouping. That's > fine since we only support partition-wise aggregate and not the other upper > operations. But if we were to do that in future, which of these members > would > be applicable to other upper relations? inputRows, pathTarget, > partialPathTarget may be applicable to other upper rels as well. can_sort, > can_hash may be applicable to DISTINCT, SORT relations. isPartial and > havingQual will be applicable only to Grouping/Aggregation. So, may be > it's ok, > and like RelOptInfo we may separate them by comments. > I have grouped them like you said but not added any comments yet as I am not sure at this point that which fields will be used by those other upper rel kinds. Please have a look. > > Another problem with that structure is its name doesn't mention that the > structure is used only for child upper relations, whereas the code assumes > that > if extra is not present it's a parent upper relation. May be we want to > rename > it to that effect or always use it whether for a parent or a child > relation. > Renamed to OtherUpperPathExtraData. > We may want to rename pathTarget and partialPathTarget as relTarget and > partialRelTarget since those targets are not specific to any path, but > will be > applicable to all the paths created for that rel. > Renamed. These fixes are part of the v9 patchset. Thanks -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company