gortiz opened a new pull request, #12401:
URL: https://github.com/apache/pinot/pull/12401

   Tags: bugfix
   
   ## The problem
   
   In V2, expressions like `cast('stringLiteral' as BINARY)` were incorrectly 
executed. This also includes common patterns like `bytesColumn = 'some hex 
literal'`.
   
   For example, using table `starbucksStores` from `GenericQuickstart`, the 
following query failed in V2 while worked in V1:
   ```sql
   select location_st_point from starbucksStores where location_st_point = 
'80c062bf21bba70a9b404e969de0503750' limit 10
   ```
   
   ### Low level description
   
   The exception we got was:
   
   ```
   Caused by: java.lang.AssertionError: value 00ba86d4ebda4d5f8da9ba2cf3053208 
does not match type class org.apache.calcite.avatica.util.ByteString
    at 
org.apache.calcite.linq4j.tree.ConstantExpression.<init>(ConstantExpression.java:51)
    at org.apache.calcite.linq4j.tree.Expressions.constant(Expressions.java:576)
    at 
org.apache.calcite.linq4j.tree.OptimizeShuttle.visit(OptimizeShuttle.java:291)
    at 
org.apache.calcite.linq4j.tree.UnaryExpression.accept(UnaryExpression.java:39))
   ```
   
   When the query is received, the expected algorithm is:
   1. Calcite parsed the SQL
   2. Calcite transformed the SQL to Relational Algrebra (`SqlNode` to 
`RelRoot`). In order to do so, an implicit call from String (`VARCHAR`) to 
Bytes (`VARBINARY`) is created.
   3. We call `sqlToRelConverter.trimUnusedFields` to simplify the query before 
optimize.
   4. Optimize. Here the CAST should be executed calling `hexToBinary`.
   
   What actually happened is that in 3. (`sqlToRelConverter.trimUnusedFields`) 
the expression `CAST(stringLiteral as VARBINARY)` was simplified by Calcite. 
There is a known bug in Calcite that fails in this kind of casts. There is a 
patch to solve if (see https://github.com/apache/calcite/pull/3635) but at the 
time of writing this is not what we want. Specifically, the patch plans to 
apply some custom semantic (compatible with MySQL and Postgres) on that 
conversion. That semantic is valid, but it is not what we do in V1. Therefore 
even if the bug is fixed in Calcite it won't behave as V1 users would expect 
and as a result they could find unexpected results when migrating queries from 
V1 to V2.
   
   ## The fix
   
   In order to fix the problem, a couple of new transformation rules has been 
introduced. These rules look for calls to `cast(somethingOfTypeString as 
VARBINARY)` and transform that to `hexToBytes(somethingOfTypeString)`. These 
rules have to be applied before `sqlToRelConverter.trimUnusedFields` is called. 
Given that method needs to be called before the current optimizers are used, we 
need to add a new optimizer for these rules.
   
   ### Details
   
   Right now these rules are only applied on filter, projection and joins. We 
may need to add them to other `Rel`s in the future.
   
   cc @walterddr @Jackie-Jiang @xiangfu0 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to