[jira] [Commented] (HIVE-18680) FieldTrimmer missing opportunity with SortLimit operators

Jesus Camacho Rodriguez (JIRA) Fri, 16 Feb 2018 09:24:15 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-18680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367612#comment-16367612
 ]


Jesus Camacho Rodriguez commented on HIVE-18680:
------------------------------------------------

I regenerated the q files. [~ashutoshc], could you take a look? This patch 
prunes some columns on the Calcite side that are not needed in the plan. We 
had not noticed the problem because Hive would prune those columns later 
anyway; however, some Project operators now go away, so some plans improve.

The original {{trimChild}} method looked like this:
{code}
  protected TrimResult trimChild(
      RelNode rel,
      RelNode input,
      final ImmutableBitSet fieldsUsed,
      Set<RelDataTypeField> extraFields) {
    final ImmutableBitSet.Builder fieldsUsedBuilder = fieldsUsed.rebuild();

    // Fields that define the collation cannot be discarded.
    final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
    final ImmutableList<RelCollation> collations = mq.collations(input);
    for (RelCollation collation : collations) {
      for (RelFieldCollation fieldCollation : collation.getFieldCollations()) {
        fieldsUsedBuilder.set(fieldCollation.getFieldIndex());
      }
    }

    // Correlating variables are a means for other relational expressions to use
    // fields.
    for (final CorrelationId correlation : rel.getVariablesSet()) {
      rel.accept(
          new CorrelationReferenceFinder() {
            protected RexNode handle(RexFieldAccess fieldAccess) {
              final RexCorrelVariable v =
                  (RexCorrelVariable) fieldAccess.getReferenceExpr();
              if (v.id.equals(correlation)) {
                fieldsUsedBuilder.set(fieldAccess.getField().getIndex());
              }
              return fieldAccess;
            }
          });
    }

    return dispatchTrimFields(input, fieldsUsedBuilder.build(), extraFields);
  }
{code}
A bit of explanation. Observe that the original method never discards the 
fields that appear in the input's collations: I looked into this in Calcite, 
and it has to do with the decorrelation logic. However, we can prune them. We 
populate the collations for the return path (to decide whether we should 
create Exchange operators to redistribute the data), but the fields can still 
be pruned if they are not used: that is what this patch does. The collation 
for SortLimit, which is the only relevant one for us, is preserved correctly 
because the operator adds the columns in its collation to the needed columns.
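
To make the difference concrete, here is a minimal sketch in plain Java. It is 
a toy model and not the Calcite or Hive code: {{java.util.BitSet}} stands in 
for {{ImmutableBitSet}}, the class and method names are hypothetical, and the 
input collation on field 4 is an assumption chosen to mirror the plan in the 
issue description.

```java
import java.util.BitSet;

/** Toy model of the field-usage computation; NOT Calcite or Hive code. */
final class FieldTrimSketch {

  /** Original behaviour: every field in the input's collations is forced to stay. */
  static BitSet originalFieldsUsed(BitSet fieldsUsed, int[] inputCollationFields) {
    BitSet result = (BitSet) fieldsUsed.clone();
    for (int field : inputCollationFields) {
      result.set(field);
    }
    return result;
  }

  /** Behaviour described for the patch: keep only the fields that were requested. */
  static BitSet patchedFieldsUsed(BitSet fieldsUsed) {
    return (BitSet) fieldsUsed.clone();
  }

  /** A sort-limit asks its input for the parent's fields plus its own sort keys. */
  static BitSet sortLimitFieldsUsed(BitSet parentFieldsUsed, int[] sortKeys) {
    BitSet result = (BitSet) parentFieldsUsed.clone();
    for (int key : sortKeys) {
      result.set(key);
    }
    return result;
  }

  public static void main(String[] args) {
    // The top HiveProject in the issue's plan reads $0..$3 of the HiveSortLimit.
    BitSet parent = new BitSet();
    parent.set(0, 4); // sets bits 0, 1, 2, 3

    // HiveSortLimit sorts on $2, which the parent already requests.
    BitSet needed = sortLimitFieldsUsed(parent, new int[] {2});

    // Assumption for illustration: the input reports a collation on field 4.
    int[] assumedInputCollation = {4};

    System.out.println(originalFieldsUsed(needed, assumedInputCollation)); // {0, 1, 2, 3, 4}
    System.out.println(patchedFieldsUsed(needed));                         // {0, 1, 2, 3}
  }
}
```

In this model the sort key stays in the needed set either way, because the 
sort-limit adds it itself. Only the field that is kept purely because of the 
input's collation disappears.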

> FieldTrimmer missing opportunity with SortLimit operators
> ---------------------------------------------------------
>
>                 Key: HIVE-18680
>                 URL: https://issues.apache.org/jira/browse/HIVE-18680
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 3.0.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-18680.01.patch, HIVE-18680.patch
>
>
> In the following plan, {{_o__col4}} is not needed:
> {code}
> 2018-02-11T11:35:14,501 DEBUG [0fda1a20-c7f1-489e-b287-65f35fda9848 main] calcite.sql2rel: Plan after trimming unused fields
> HiveProject(download_volume_bytes=[$0], ri_date=[$1], end_cell_id=[$2], environment=[$3])
>   HiveSortLimit(sort0=[$2], dir0=[ASC-nulls-first], fetch=[100])
>     HiveProject(download_volume_bytes=[$4], ri_date=[$3], end_cell_id=[$1], environment=[$2], _o__col4=[$0])
>       HiveAggregate(group=[{0, 1, 2, 3}], agg#0=[sum($4)])
>         HiveProject($f0=[$1], $f1=[$2], $f2=[$3], $f3=[$0], $f4=[$4])
>           HiveProject(ri_date=[$0], imsi=[$1], end_cell_id=[$2], environment=[$3], s1_u_download_data_volume=[$4])
>             HiveTableScan(table=[[default.lsr094]], table:alias=[a])
> {code}
> {{_o__col4}} should be removed by the FieldTrimmer. Otherwise, it may 
> prevent MV rewriting from being triggered on some queries containing order 
> by/limit clauses.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
