zabetak commented on code in PR #5249:
URL: https://github.com/apache/hive/pull/5249#discussion_r1685370053
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveMaterializedViewUtils.java:
##########
@@ -536,4 +545,33 @@ private static Map<String, SnapshotContext> getSnapshotOf(Hive db, Set<TableName
     }
     return snapshot;
   }
+
+  public static RelOptMaterialization createCTEMaterialization(String viewName, RelNode body, HiveConf conf) {
+    RelOptCluster cluster = body.getCluster();
+    List<ColumnInfo> columns = new ArrayList<>();
+    for (RelDataTypeField f : body.getRowType().getFieldList()) {
+      TypeInfo info = TypeConverter.convert(f.getType());
+      columns.add(new ColumnInfo(f.getName(), info, f.getType().isNullable(), viewName, false, false));
+    }
+    List<String> fullName = Arrays.asList("cte", viewName);
+    org.apache.hadoop.hive.metastore.api.Table metaTable = Table.getEmptyTable("cte", viewName);
+    metaTable.setTemporary(true);
+    try {
+      // Setting a location avoids a NPE when fetching statistics
+      metaTable.getSd().setLocation(SessionState.generateTempTableLocation(conf));
+    } catch (MetaException e) {
+      throw new RuntimeException(e);
+    }
+    Table hiveTable = new Table(metaTable);
+    hiveTable.setMaterializedTable(true);
+    RelOptHiveTable optTable =
+        new RelOptHiveTable(null, cluster.getTypeFactory(), fullName, body.getRowType(), hiveTable, columns,
+            Collections.emptyList(), Collections.emptyList(), new HiveConf(), Hive.getThreadLocal(),
+            new QueryTables(true), new HashMap<>(), new HashMap<>(), new AtomicInteger());
+    optTable.setRowCount(cluster.getMetadataQuery().getRowCount(body));
Review Comment:
This is a very good question. The short answer is that at the moment there
is probably no benefit in doing so. However, as this feature evolves (a better
cost model, implementation of `hive.cbo.returnpath.hiveop` for CTEs), it may
become beneficial to consider traits. Below are longer answers to more specific
parts of this question, covering both the current state and what we could
possibly do in the future.
**Why are we explicitly setting the row count here?**
Setting the row count mainly helps to break ties across CTEs. Since the
`RelOptHiveTable` created here does not (yet) have a physical equivalent, the
optimizer would otherwise attempt to gather statistics following the default
logic. This will probably trigger metastore (and possibly HDFS) calls, and the
result will be similar to that of an empty table (~zero rows). This is
problematic because it incurs unnecessary overhead and assigns the same cost to
every CTE.
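To make the tie-breaking argument concrete, here is a minimal, self-contained Java sketch. It is a toy model, not Hive code: `CteCostSketch`, `CteCandidate`, and the rule "a CTE's cost is just its row count" are assumptions made purely for illustration. It contrasts the ~zero-row default that an unmaterialized CTE table would report with an explicitly set row estimate.

```java
import java.util.Comparator;
import java.util.List;

final class CteCostSketch {
  // Hypothetical CTE candidate: a name plus the row count the metadata query
  // would estimate for its body. Not a Hive or Calcite class.
  record CteCandidate(String name, double estimatedBodyRows) {}

  // Assumption: what an unmaterialized CTE table reports when no explicit row
  // count is set (statistics resemble those of an empty table).
  static final double EMPTY_TABLE_ROWS = 0.0;

  // Toy cost model (assumption): cost == row count of the CTE table.
  static double cost(CteCandidate c, boolean rowCountSet) {
    return rowCountSet ? c.estimatedBodyRows() : EMPTY_TABLE_ROWS;
  }

  // True when every candidate has the same cost, i.e. there is no basis for
  // preferring one CTE over another.
  static boolean allTied(List<CteCandidate> candidates, boolean rowCountSet) {
    return candidates.stream()
        .mapToDouble(c -> cost(c, rowCountSet))
        .distinct()
        .count() <= 1;
  }

  // Cheapest candidate under the toy cost model.
  static CteCandidate cheapest(List<CteCandidate> candidates, boolean rowCountSet) {
    return candidates.stream()
        .min(Comparator.comparingDouble((CteCandidate c) -> cost(c, rowCountSet)))
        .orElseThrow();
  }

  public static void main(String[] args) {
    List<CteCandidate> ctes = List.of(
        new CteCandidate("big_join", 1_000_000), new CteCandidate("small_agg", 42));
    System.out.println(allTied(ctes, false));        // true: every CTE looks empty
    System.out.println(cheapest(ctes, true).name()); // small_agg
  }
}
```

Without an explicit row count every candidate collapses to the same cost, so any choice between them is arbitrary; with the estimate from the body, the candidates become distinguishable.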
**How is the row count estimated?**
The row count is estimated using `HiveDefaultRelMetadataProvider` and the
logic in `HiveRelMdRowCount`, `HiveRelMdDistinctRowCount`, etc. If needed, we
could plug in another metadata provider, but at the moment I don't find it
necessary.
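The idea of a metadata provider deriving the row count of `body` bottom-up, one rule per operator, can be sketched with a toy operator tree. This is a hypothetical illustration only: the `Op` records are stand-ins for Calcite `RelNode`s, the formulas are assumptions and not the ones in `HiveRelMdRowCount`, and `distinctGroups` merely plays the role that `HiveRelMdDistinctRowCount` would fill in.

```java
import java.util.List;

final class ToyRowCountSketch {
  // Hypothetical operator tree; a stand-in for Calcite RelNodes, not the real API.
  interface Op {}
  record Scan(String table, double rows) implements Op {}
  record Filter(Op input, double selectivity) implements Op {}
  record Aggregate(Op input, double distinctGroups) implements Op {}
  record UnionAll(List<Op> inputs) implements Op {}

  // Recursively derives a row count bottom-up, one rule per operator type.
  // The formulas are illustrative assumptions, not Hive's actual logic.
  static double rowCount(Op op) {
    if (op instanceof Scan s) {
      return s.rows();
    }
    if (op instanceof Filter f) {
      return f.selectivity() * rowCount(f.input());
    }
    if (op instanceof Aggregate a) {
      // An aggregate cannot output more groups than it has input rows.
      return Math.min(a.distinctGroups(), rowCount(a.input()));
    }
    if (op instanceof UnionAll u) {
      return u.inputs().stream().mapToDouble(ToyRowCountSketch::rowCount).sum();
    }
    throw new IllegalArgumentException("Unsupported operator: " + op);
  }

  public static void main(String[] args) {
    // Shape of a CTE body like: SELECT k, count(*) FROM t WHERE ... GROUP BY k
    Op body = new Aggregate(new Filter(new Scan("t", 1000), 0.25), 50);
    System.out.println(rowCount(body)); // 50.0
  }
}
```

In the actual code the equivalent value comes from `cluster.getMetadataQuery().getRowCount(body)`, which is what ends up in `optTable.setRowCount(...)`.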
**Should we propagate distribution and collation?**
Trait propagation is not necessary at the moment. The distribution and
collation traits are not used by the CTE rewriting transformation. In addition,
the CTE rewriting phase is applied at the end of the CBO transformations, so
even if the traits are lost, there are no subsequent steps that could exploit
them.
**Can we guarantee trait propagation when using CTEs?**
If the `body` of the view/CTE carries some traits, they are left intact by
this part of the code. However, since the `body` is created by the suggester,
the traits are not necessarily retained; the propagation behavior is strongly
implementation dependent. Some suggesters may be able to retain traits while
others may destroy them completely, the latter being more plausible.
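The point that the materialization step is neutral while the suggester decides what survives can be illustrated with a small hypothetical model. Everything here (`Plan`, `Suggester`, `PRESERVING`, `REBUILDING`, `materialize`) is invented for illustration and does not correspond to Hive or Calcite classes; a collation is represented as a plain list of sort keys, where an empty list means "no collation trait".

```java
import java.util.List;

final class TraitPropagationSketch {
  // Hypothetical plan node: a digest plus its collation trait (empty = none).
  record Plan(String digest, List<String> collation) {}

  // Hypothetical suggester contract: turns a common subtree into a CTE body.
  interface Suggester {
    Plan suggest(Plan subtree);
  }

  // A suggester that reuses the subtree as-is, so its traits survive.
  static final Suggester PRESERVING = subtree -> subtree;

  // A suggester that rebuilds the subtree without re-deriving its traits.
  static final Suggester REBUILDING =
      subtree -> new Plan("rebuilt(" + subtree.digest() + ")", List.of());

  // Models the materialization step: it wraps whatever body it receives and
  // neither adds nor removes traits.
  record Materialization(String name, Plan body) {}

  static Materialization materialize(String name, Plan body) {
    return new Materialization(name, body);
  }

  public static void main(String[] args) {
    Plan sorted = new Plan("scan(t)", List.of("a ASC"));
    System.out.println(materialize("cte0", PRESERVING.suggest(sorted)).body().collation()); // [a ASC]
    System.out.println(materialize("cte1", REBUILDING.suggest(sorted)).body().collation()); // []
  }
}
```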
**Is the TableSpool using traits in some way?**
No; as explained above, traits are not used in the CTE rewriting phase.
--
This is an automated message from the Apache Git Service.