gortiz opened a new pull request, #12401:
URL: https://github.com/apache/pinot/pull/12401
Tags: bugfix
## The problem
In V2, expressions like `cast('stringLiteral' as BINARY)` were incorrectly
executed. This also includes common patterns like `bytesColumn = 'some hex
literal'`.
For example, using table `starbucksStores` from `GenericQuickstart`, the
following query failed in V2 while worked in V1:
```sql
select location_st_point from starbucksStores where location_st_point =
'80c062bf21bba70a9b404e969de0503750' limit 10
```
### Low level description
The exception we got was:
```
Caused by: java.lang.AssertionError: value 00ba86d4ebda4d5f8da9ba2cf3053208
does not match type class org.apache.calcite.avatica.util.ByteString
at
org.apache.calcite.linq4j.tree.ConstantExpression.<init>(ConstantExpression.java:51)
at org.apache.calcite.linq4j.tree.Expressions.constant(Expressions.java:576)
at
org.apache.calcite.linq4j.tree.OptimizeShuttle.visit(OptimizeShuttle.java:291)
at
org.apache.calcite.linq4j.tree.UnaryExpression.accept(UnaryExpression.java:39))
```
When the query is received, the expected algorithm is:
1. Calcite parsed the SQL
2. Calcite transformed the SQL to Relational Algrebra (`SqlNode` to
`RelRoot`). In order to do so, an implicit call from String (`VARCHAR`) to
Bytes (`VARBINARY`) is created.
3. We call `sqlToRelConverter.trimUnusedFields` to simplify the query before
optimize.
4. Optimize. Here the CAST should be executed calling `hexToBinary`.
What actually happened is that in 3. (`sqlToRelConverter.trimUnusedFields`)
the expression `CAST(stringLiteral as VARBINARY)` was simplified by Calcite.
There is a known bug in Calcite that fails in this kind of casts. There is a
patch to solve if (see https://github.com/apache/calcite/pull/3635) but at the
time of writing this is not what we want. Specifically, the patch plans to
apply some custom semantic (compatible with MySQL and Postgres) on that
conversion. That semantic is valid, but it is not what we do in V1. Therefore
even if the bug is fixed in Calcite it won't behave as V1 users would expect
and as a result they could find unexpected results when migrating queries from
V1 to V2.
## The fix
In order to fix the problem, a couple of new transformation rules has been
introduced. These rules look for calls to `cast(somethingOfTypeString as
VARBINARY)` and transform that to `hexToBytes(somethingOfTypeString)`. These
rules have to be applied before `sqlToRelConverter.trimUnusedFields` is called.
Given that method needs to be called before the current optimizers are used, we
need to add a new optimizer for these rules.
### Details
Right now these rules are only applied on filter, projection and joins. We
may need to add them to other `Rel`s in the future.
cc @walterddr @Jackie-Jiang @xiangfu0
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]