[
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932891#comment-16932891
]
Scott Reynolds commented on CALCITE-963:
----------------------------------------
{quote} * I was surprised that HoistedVariables was added to so many API calls.
Maybe it could belong to another piece of the preparation context{quote}
I was tempted to add it to {{DataContext}}. It does have to land into
{{CalciteSignature}}
{code:java}
public CalciteSignature(String sql,
List<AvaticaParameter> parameterList,
Map<String, Object> internalParameters,
RelDataType rowType,
List<ColumnMetaData> columns,
Meta.CursorFactory cursorFactory,
CalciteSchema rootSchema,
List<RelCollation> collationList,
long maxRowCount,
Bindable<T> bindable,
Meta.StatementType statementType,
HoistedVariables variables) {
super(columns, sql, parameterList, internalParameters, cursorFactory,
statementType);
this.rowType = rowType;
this.rootSchema = rootSchema;
this.collationList = collationList;
this.maxRowCount = maxRowCount;
this.bindable = bindable;
this.variables = variables;
}
public Enumerable<T> enumerable(DataContext dataContext) {
Enumerable<T> enumerable = bindable.bind(dataContext, variables);
if (maxRowCount >= 0) {
// Apply limit. In JDBC 0 means "no limit". But for us, -1 means
// "no limit", and 0 is a valid limit.
enumerable = EnumerableDefaults.take(enumerable, maxRowCount);
}
return enumerable;
}
{code}
{quote}I was surprised that this work touched so much Enumerable code. An
alternative approach would be to transform a RelNode tree, early in the
planning process, transforming some RexLiteral instances into RexDynamicParam.
The rest of the planning process would proceed as if the user had provided a
statement with bind variables. I know you state that this was a non-goal, but
why was it not a goal? It probably would have been a lot simpler, and it would
have worked with conventions besides Enumerable.{quote}
Ya I should add color to this. Given the discussions on the mailing list I
would like to add an optimization to our use case. Our Fact Tables contain a
single account hierarchy represented by a column {{sub_account}}. Often a Fact
will not be associated with an {{sub_account}} and so instead of storing that
as null and dealing with missing right hand row, we store a sentinel value in
the Fact so our joins to the account dimension are simpler. So when a query
comes in that for all Facts that do not contain a sub account, we want to
change the cost calculation of that filter as {{sub_account_sid == 'IS_NULL'}}
is not as selective as {{sub_account_sid = 'AC128485'}}. So my understanding
is, by translating into a {{RexDynamicParam}}, we lose out on this optimization.
> Hoist literals
> --------------
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major
> Labels: pull-request-available
> Attachments: HoistedVariables.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that
> differ only in literal values can be executed using the same plan.
> In [mail
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%[email protected]%3E]
> Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to
> be very much the same each time, so to save some time, I would like to
> recognize that the literals are all that have changed in a query and use the
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It
> would find literals (RexLiteral), replace them with bind variables
> (RexDynamicParam) and write the value into a pool. The next statement would
> go through the same process and the RelNode tree would be identical, but with
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we
> could do this by default -- we'd lose optimization power because we would no
> longer be able to do constant reduction.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)