[ https://issues.apache.org/jira/browse/HIVE-18680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367612#comment-16367612 ]
Jesus Camacho Rodriguez commented on HIVE-18680:
------------------------------------------------

I regenerated the q files. [~ashutoshc], could you take a look?

This patch prunes some columns on the Calcite side that are not needed in the plan. We had not noticed this before because Hive would also prune those columns. However, some Project operators now go away, so some plans improve.

The original {{trimChild}} method looked like this:
{code}
protected TrimResult trimChild(
    RelNode rel,
    RelNode input,
    final ImmutableBitSet fieldsUsed,
    Set<RelDataTypeField> extraFields) {
  final ImmutableBitSet.Builder fieldsUsedBuilder = fieldsUsed.rebuild();

  // Fields that define the collation cannot be discarded.
  final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
  final ImmutableList<RelCollation> collations = mq.collations(input);
  for (RelCollation collation : collations) {
    for (RelFieldCollation fieldCollation : collation.getFieldCollations()) {
      fieldsUsedBuilder.set(fieldCollation.getFieldIndex());
    }
  }

  // Correlating variables are a means for other relational expressions to use
  // fields.
  for (final CorrelationId correlation : rel.getVariablesSet()) {
    rel.accept(
        new CorrelationReferenceFinder() {
          protected RexNode handle(RexFieldAccess fieldAccess) {
            final RexCorrelVariable v =
                (RexCorrelVariable) fieldAccess.getReferenceExpr();
            if (v.id.equals(correlation)) {
              fieldsUsedBuilder.set(fieldAccess.getField().getIndex());
            }
            return fieldAccess;
          }
        });
  }

  return dispatchTrimFields(input, fieldsUsedBuilder.build(), extraFields);
}
{code}

A bit of explanation: observe that the original method never discards the fields that define the collation. I explored this in Calcite, and it has to do with the decorrelation logic. However, we can prune them. We populate collations for the return path (to decide whether we should create Exchange operators to redistribute the data), but the fields can still be pruned if they are not used. That is what this patch does.
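To make the difference concrete, here is a minimal sketch in plain Java. It is a hypothetical model, not Calcite's or Hive's API: field sets are {{java.util.BitSet}} indices, and {{TrimChildSketch}}, {{fieldsToKeep}} and {{fieldsNeededBySortLimit}} are illustrative names. It uses the plan from the issue below, where the parent project reads $0..$3 and the SortLimit sorts on $2. Purely for illustration, it also assumes that metadata reports the SortLimit input as collated on $4 ({{_o__col4}}). Under the original logic that field survives; under the patch it is dropped, while the SortLimit's own sort key is still kept because the operator adds it to the needed fields.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical, simplified model of the trimChild decision. Field sets are
// plain java.util.BitSet indices; none of these names are Calcite or Hive APIs.
class TrimChildSketch {

  // Fields a sort-limit operator needs from its input: whatever its parent
  // uses, plus its own sort keys, so its collation survives trimming.
  static BitSet fieldsNeededBySortLimit(BitSet usedByParent, List<Integer> sortKeys) {
    BitSet needed = (BitSet) usedByParent.clone();
    for (int key : sortKeys) {
      needed.set(key);
    }
    return needed;
  }

  // Fields kept in the child. keepCollationFields = true mimics the original
  // method (every field of the input's collations is kept); false mimics the
  // patch (only fields that are actually used are kept).
  static BitSet fieldsToKeep(BitSet fieldsUsed, List<Integer> inputCollationFields,
      boolean keepCollationFields) {
    BitSet keep = (BitSet) fieldsUsed.clone();
    if (keepCollationFields) {
      for (int field : inputCollationFields) {
        keep.set(field);
      }
    }
    return keep;
  }

  public static void main(String[] args) {
    // The top HiveProject in the issue's plan reads $0..$3 of the SortLimit output.
    BitSet usedByParent = new BitSet();
    usedByParent.set(0, 4);
    // HiveSortLimit(sort0=[$2]) adds its sort key; $2 is already used.
    BitSet needed = fieldsNeededBySortLimit(usedByParent, List.of(2));
    // Assumption for illustration only: the input is reported as collated on $4.
    List<Integer> inputCollation = List.of(4);
    System.out.println(fieldsToKeep(needed, inputCollation, true));  // {0, 1, 2, 3, 4}
    System.out.println(fieldsToKeep(needed, inputCollation, false)); // {0, 1, 2, 3}
  }
}
```

The sort keys are added by the operator that needs them rather than inferred from the child's collation metadata, which is what lets the trimmer drop unused collation fields without breaking the SortLimit.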
Collation for SortLimit, which is the only one relevant for us, is still kept correctly, because the operator adds the columns in its collation to the set of needed columns.

> FieldTrimmer missing opportunity with SortLimit operators
> ---------------------------------------------------------
>
>                 Key: HIVE-18680
>                 URL: https://issues.apache.org/jira/browse/HIVE-18680
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 3.0.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-18680.01.patch, HIVE-18680.patch
>
>
> In the following plan, {{_o__col4}} is not needed:
> {code}
> 2018-02-11T11:35:14,501 DEBUG [0fda1a20-c7f1-489e-b287-65f35fda9848 main] calcite.sql2rel: Plan after trimming unused fields
> HiveProject(download_volume_bytes=[$0], ri_date=[$1], end_cell_id=[$2], environment=[$3])
>   HiveSortLimit(sort0=[$2], dir0=[ASC-nulls-first], fetch=[100])
>     HiveProject(download_volume_bytes=[$4], ri_date=[$3], end_cell_id=[$1], environment=[$2], _o__col4=[$0])
>       HiveAggregate(group=[{0, 1, 2, 3}], agg#0=[sum($4)])
>         HiveProject($f0=[$1], $f1=[$2], $f2=[$3], $f3=[$0], $f4=[$4])
>           HiveProject(ri_date=[$0], imsi=[$1], end_cell_id=[$2], environment=[$3], s1_u_download_data_volume=[$4])
>             HiveTableScan(table=[[default.lsr094]], table:alias=[a])
> {code}
> {{_o__col4}} should be removed by the FieldTrimmer. Otherwise, this may prevent MV rewriting from being triggered on some queries containing order by/limit clauses.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)