[
https://issues.apache.org/jira/browse/CALCITE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932829#comment-16932829
]
Scott Reynolds edited comment on CALCITE-963 at 9/18/19 8:57 PM:
-----------------------------------------------------------------
h1. Goal
When a query is issued to Calcite it is parsed and optimized, and the resulting
plan is then turned into a String of Java source for a class that implements
{{Bindable}}. {{EnumerableInterpretable}} creates this string and checks whether
it already exists in a {{com.google.common.cache}} cache; if it doesn't, it
calls into a Java compiler. Compilation can take a considerable amount of time:
Apache Kylin reported 50 to 150ms of additional computation time. Today, Apache
Calcite generates a unique Java class string whenever any part of the query
changes. This document details the design and implementation of a hoisting
technique within Apache Calcite that greatly increases the cache hit rate of
{{EnumerableInterpretable}}'s {{BINDABLE_CACHE}}.
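To make the cache behaviour concrete, here is a minimal, self-contained sketch.
It is not Calcite code: {{SourceKeyedCacheDemo}}, {{toBindable}}, {{inlined}}
and {{hoisted}} are hypothetical names, a plain {{HashMap}} stands in for the
Guava cache, and a counter stands in for the Java compiler. It only illustrates
why a cache keyed by generated source misses when a literal is inlined and hits
when the literal is replaced by a slot lookup.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Toy model of a compile cache keyed by generated source text. */
final class SourceKeyedCacheDemo {
  /** Counts how often the (simulated) compiler ran. */
  static int compilations = 0;

  /** Hypothetical stand-in for BINDABLE_CACHE: source text -> compiled object. */
  static final Map<String, Object> CACHE = new HashMap<>();

  /** Returns the cached object for this source, "compiling" on a miss. */
  static Object toBindable(String source) {
    return CACHE.computeIfAbsent(source, s -> {
      compilations++; // stands in for the expensive compilation step
      return new Object();
    });
  }

  /** Generated source with the literal inlined: differs for every limit. */
  static String inlined(int limit) {
    return "return child.take(" + limit + ");";
  }

  /** Generated source with the literal hoisted: mentions only the slot. */
  static String hoisted(int slot) {
    return "return child.take((Integer) variables.get(" + slot + "));";
  }

  /** Runs three queries that differ only in their limit; returns compile count. */
  static int compilationsFor(IntFunction<String> generator) {
    CACHE.clear();
    compilations = 0;
    for (int limit = 1; limit <= 3; limit++) {
      toBindable(generator.apply(limit));
    }
    return compilations;
  }

  public static void main(String[] args) {
    System.out.println("inlined: " + compilationsFor(SourceKeyedCacheDemo::inlined));
    System.out.println("hoisted: " + compilationsFor(limit -> hoisted(0)));
  }
}
{code}
In this toy model the inlined variant compiles three times and the hoisted
variant compiles once, because all three hoisted queries produce the same
source string.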
h1. Non Goals
This implementation is not designed to change the planning process. It does not
transform {{RexLiteral}} into {{RexDynamicParam}}, and doesn't change the cost
calculation of the query.
h1. Implementation Details
After a query has been optimized, three phases remain:
# Generating the Java code
# Binding Hoisted Variables
# Runtime execution via {{Bindable.bind(DataContext, HoistedVariables)}}
Each of these phases interacts with a new class called {{HoistedVariables}}:
!HoistedVariables.png!
Each of its methods is used in the above three phases to hoist a variable
from within the query into the runtime execution of the {{Bindable}}.
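For readers without access to the attached diagram, the following is a rough
sketch of what such a class could look like. It is an assumption-laden
illustration, not the class from the pull request: the name
{{HoistedVariablesSketch}}, the use of a plain {{int}} as the slot, and the
{{getVariable}} accessor are all hypothetical. Only {{registerVariable}} and
{{setVariable}} are names taken from this comment.
{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a slot-based variable holder. */
final class HoistedVariablesSketch {
  private final List<String> names = new ArrayList<>();
  private final List<Object> values = new ArrayList<>();

  /** Phase one: reserves a new slot and returns its index. */
  int registerVariable(String name) {
    names.add(name);
    values.add(null); // unbound until phase two
    return names.size() - 1;
  }

  /** Phase two: binds a value to a previously registered slot. */
  void setVariable(int slot, Object value) {
    checkSlot(slot);
    values.set(slot, value);
  }

  /** Phase three: reads the bound value; fails if the slot was never bound. */
  Object getVariable(int slot) {
    checkSlot(slot);
    Object value = values.get(slot);
    if (value == null) {
      throw new IllegalStateException(
          "slot " + slot + " (" + names.get(slot) + ") is unbound");
    }
    return value;
  }

  private void checkSlot(int slot) {
    if (slot < 0 || slot >= names.size()) {
      throw new IllegalArgumentException("unknown slot " + slot);
    }
  }

  public static void main(String[] args) {
    HoistedVariablesSketch variables = new HoistedVariablesSketch();
    int fetchSlot = variables.registerVariable("fetch");
    variables.setVariable(fetchSlot, 10);
    System.out.println(variables.getVariable(fetchSlot));
  }
}
{code}
Each call to {{registerVariable}} hands out a fresh index in this sketch, so two
nodes that both register a variable named "fetch" still receive distinct slots.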
The method {{implement}} of the interface {{EnumerableRel}} is used to generate
the Java code in phase one. Each {{RelNode}} can now call
{{registerVariable(String)}} to allocate a {{Slot}} for its unbound value.
This {{Slot}} is reserved for that node's use and is unique within the query
plan. When a {{RelNode}} registers a variable it needs to save that {{Slot}}
in a field so it can be referenced in phase two. The {{Slot}} is then
referenced in code generation by calling {{EnumerableRel.lookupValue}}, which
returns an {{Expression}} that extracts the bound value for the {{Slot}}.
Below is a snippet from the {{EnumerableLimit}} implementation of
{{implement}} that uses {{HoistedVariables}}.
{code:java}
Expression v = builder.append("child", result.block);
if (offset != null) {
  if (offset instanceof RexDynamicParam) {
    v = getDynamicExpression((RexDynamicParam) offset);
  } else {
    // Register with Hoisted Variable here
    offsetIndex = variables.registerVariable("offset");
    v = builder.append(
        "offset",
        Expressions.call(
            v,
            BuiltInMethod.SKIP.method,
            // At runtime, fetch the bound variable. This returns the Java
            // code to do that.
            EnumerableRel.lookupValue(offsetIndex, Integer.class)));
  }
}
if (fetch != null) {
  if (fetch instanceof RexDynamicParam) {
    v = getDynamicExpression((RexDynamicParam) fetch);
  } else {
    // Register with Hoisted Variable here
    this.fetchIndex = variables.registerVariable("fetch");
    v = builder.append(
        "fetch",
        Expressions.call(
            v,
            BuiltInMethod.TAKE.method,
            // At runtime, fetch the bound variable. This returns the Java
            // code to do that.
            EnumerableRel.lookupValue(fetchIndex, Integer.class)));
  }
}
{code}
The second phase of the query execution is where registered {{Slots}} get
bound. To do this, our change adds a new optional method to {{EnumerableRel}}
called {{hoistedVariables}}. This method is where an instance of
{{EnumerableRel}} extracts the values out of the query plan and binds them into
the {{HoistedVariables}} instance just prior to executing the query. Below is
the {{EnumerableLimit}} implementation:
{code:java}
@Override public void hoistedVariables(HoistedVariables variables) {
  // Let every input bind its own slots first.
  getInputs()
      .stream()
      .forEach(rel -> {
        final EnumerableRel enumerable = (EnumerableRel) rel;
        enumerable.hoistedVariables(variables);
      });
  if (fetchIndex != null) {
    // fetchIndex is the registered slot for this variable. Bind fetchIndex
    // to fetch.
    variables.setVariable(fetchIndex, RexLiteral.intValue(fetch));
  }
  if (offsetIndex != null) {
    // offsetIndex is the registered slot for this variable. Bind offsetIndex
    // to offset.
    variables.setVariable(offsetIndex, RexLiteral.intValue(offset));
  }
}
{code}
To tie these three phases together, {{CalcitePrepareImpl}} needs to set up the
variables when it creates a {{PreparedResult}}:
{code:java}
try {
  CatalogReader.THREAD_LOCAL.set(catalogReader);
  final SqlConformance conformance = context.config().conformance();
  internalParameters.put("_conformance", conformance);
  // Get the compiled Bindable instance either from cache or generate a new one.
  bindable = EnumerableInterpretable.toBindable(internalParameters,
      context.spark(), enumerable, prefer, variables);
  // Bind any hoisted variables used in the Bindable.
  enumerable.hoistedVariables(variables);
} finally {
  CatalogReader.THREAD_LOCAL.remove();
}
{code}
These hoisted variables are then passed into the {{CalciteSignature}}, which
finally calls {{Bindable.bind(dataContext, variables)}}.
{code:java}
public Enumerable<T> enumerable(DataContext dataContext) {
  Enumerable<T> enumerable = bindable.bind(dataContext, variables);
  if (maxRowCount >= 0) {
    // Apply limit. In JDBC 0 means "no limit". But for us, -1 means
    // "no limit", and 0 is a valid limit.
    enumerable = EnumerableDefaults.take(enumerable, maxRowCount);
  }
  return enumerable;
}
{code}
To review: code generation needs to register an intent to hoist a variable and
then generate code that +fetches+ that hoisted variable. Prior to executing the
resulting {{Bindable}}, any registered hoisted variables need to be bound by
calling each {{EnumerableRel}}'s {{hoistedVariables}} method.
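The three phases can also be walked through end to end in a toy model. The
sketch below is only an illustration under simplifying assumptions:
{{ThreePhaseDemo}}, {{Variables}}, {{LimitNode}} and {{execute}} are
hypothetical names, a lambda stands in for the generated and compiled Java
source, and a single static field stands in for the cache. The method name
{{hoistedVariables}} follows the {{EnumerableLimit}} snippet above.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

final class ThreePhaseDemo {
  /** Toy stand-in for HoistedVariables: slot index -> bound value. */
  static final class Variables {
    private final List<Object> values = new ArrayList<>();
    int register() { values.add(null); return values.size() - 1; }
    void set(int slot, Object value) { values.set(slot, value); }
    Object get(int slot) { return values.get(slot); }
  }

  /** Toy stand-in for a limit node whose fetch is a literal in the plan. */
  static final class LimitNode {
    final int fetch;
    Integer fetchSlot; // saved in phase one, read in phase two

    LimitNode(int fetch) { this.fetch = fetch; }

    /** Phase one: register a slot and emit "code" that only reads the slot. */
    BiFunction<List<Integer>, Variables, List<Integer>> implement(Variables variables) {
      final int slot = variables.register();
      fetchSlot = slot;
      return (rows, vars) -> rows.stream()
          .limit((Integer) vars.get(slot))
          .collect(Collectors.toList());
    }

    /** Phase two: copy the literal from the plan into its slot. */
    void hoistedVariables(Variables variables) {
      if (fetchSlot != null) {
        variables.set(fetchSlot, fetch);
      }
    }
  }

  /** Single-entry stand-in for the compiled-code cache. */
  static BiFunction<List<Integer>, Variables, List<Integer>> cached;

  static List<Integer> execute(List<Integer> rows, int fetch) {
    LimitNode node = new LimitNode(fetch);
    Variables variables = new Variables();
    BiFunction<List<Integer>, Variables, List<Integer>> generated =
        node.implement(variables);          // phase one
    if (cached == null) {
      cached = generated;                   // "compile" only on the first miss
    }
    node.hoistedVariables(variables);       // phase two
    return cached.apply(rows, variables);   // phase three
  }

  public static void main(String[] args) {
    List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5);
    System.out.println(execute(rows, 2));
    System.out.println(execute(rows, 4));
  }
}
{code}
The second call reuses the code cached by the first call even though its limit
differs, because the cached code refers only to the slot and the value is
supplied at bind time.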
h1. Future Work
This pull request only includes an update to {{EnumerableLimit}}, as it was the
simplest {{EnumerableRel}} to update. Internally at Twilio, we have our own
{{KuduToEnumerableRel}}, which we used extensively to test this implementation
and the performance improvements we have seen from it.
> Hoist literals
> --------------
>
> Key: CALCITE-963
> URL: https://issues.apache.org/jira/browse/CALCITE-963
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major
> Labels: pull-request-available
> Attachments: HoistedVariables.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Convert literals into (internal) bind variables so that statements that
> differ only in literal values can be executed using the same plan.
> In [mail
> thread|http://mail-archives.apache.org/mod_mbox/calcite-dev/201511.mbox/%[email protected]%3E]
> Homer wrote:
> {quote}Imagine that it is common to run a large number of very similar
> machine generated queries that just change the literals in the sql query.
> For example (the real queries would be much more complex):
> {code}Select * from emp where empno = 1;
> Select * from emp where empno = 2;
> etc.{code}
> The plan that is likely being generated for these kind of queries is going to
> be very much the same each time, so to save some time, I would like to
> recognize that the literals are all that have changed in a query and use the
> previously optimized execution plan and just replace the literals.{quote}
> I think this could be done as a transform on the initial RelNode tree. It
> would find literals (RexLiteral), replace them with bind variables
> (RexDynamicParam) and write the value into a pool. The next statement would
> go through the same process and the RelNode tree would be identical, but with
> possibly different values for the bind variables.
> The bind variables are of course internal; not visible from JDBC. When the
> statement is executed, the bind variables are implicitly bound.
> Statements would be held in a Guava cache.
> This would be enabled by a config parameter. Unfortunately I don't think we
> could do this by default -- we'd lose optimization power because we would no
> longer be able to do constant reduction.