clintropolis opened a new pull request, #19262:
URL: https://github.com/apache/druid/pull/19262

   ### Description
   This PR fixes an issue when clustering MSQ insert/replace by a virtual 
column that depends on another virtual column, for example 
`LOWER(JSON_VALUE(obj, '$.a'))`, which plans to an `ExpressionVirtualColumn` 
referencing a `NestedFieldVirtualColumn`. The resulting 
`DimensionRangeShardSpec` was missing the dependent virtual columns. Because 
the shard spec carried incomplete virtual column context, segment pruning 
broke for those queries, and compaction broke for the same reason.
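
The dependency resolution described above amounts to a transitive walk over virtual column inputs. The following is a minimal sketch under stated assumptions, not the PR's `addRequiredVirtualColumns` implementation: the `VColumn` record and the `requiredVirtualColumns` helper are hypothetical, a virtual column is modeled only by its name and the input columns it reads, and Java 16+ is assumed for records.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RequiredVirtualColumns
{
  // Hypothetical stand-in for a virtual column: an output name plus the input columns it reads.
  record VColumn(String name, List<String> inputs) {}

  /**
   * Starting from the columns used for clustering, follows input references and collects every
   * virtual column that is transitively required. Names with no matching virtual column are
   * treated as physical columns and skipped.
   */
  static Set<String> requiredVirtualColumns(List<String> clusterBy, Map<String, VColumn> available)
  {
    Set<String> required = new LinkedHashSet<>();
    Deque<String> toVisit = new ArrayDeque<>(clusterBy);
    while (!toVisit.isEmpty()) {
      String name = toVisit.pop();
      VColumn vc = available.get(name);
      if (vc == null || !required.add(name)) {
        continue; // physical column, or a virtual column already collected
      }
      toVisit.addAll(vc.inputs());
    }
    return required;
  }

  public static void main(String[] args)
  {
    Map<String, VColumn> vcs = new LinkedHashMap<>();
    // v0 = JSON_VALUE(obj, '$.a'), v1 = LOWER(v0)
    vcs.put("v0", new VColumn("v0", List.of("obj")));
    vcs.put("v1", new VColumn("v1", List.of("v0")));
    // Clustering only names v1, but v0 must be carried along with it
    System.out.println(requiredVirtualColumns(List.of("v1"), vcs)); // prints [v1, v0]
  }
}
```

Clustering on `v1` alone would otherwise record only `v1`, leaving its reference to `v0` unresolvable wherever the shard spec is later read.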
   
   It also fixes virtual column equivalence for the same case, where a 
virtual column depends on another virtual column. Before testing equivalence, 
the virtual column is rewritten to reference the equivalent inputs, so two 
virtual columns whose virtual column dependencies are equivalent can now be 
considered equivalent.
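
A minimal sketch of the rewrite-then-compare idea, assuming Java 16+ (records, pattern matching for `instanceof`, `Stream.toList`). The expression tree (`Ident`, `Literal`, `Call`) and the `rewriteBindings` helper here are hypothetical stand-ins, not Druid's `Expr` classes. It also assumes the base columns have already been found equivalent (here by plain structural equality), yielding a rename map that is applied before comparing the dependent expressions.

```java
import java.util.List;
import java.util.Map;

public class EquivalenceSketch
{
  // Hypothetical minimal expression tree: identifiers, string literals, and function calls.
  interface Expr {}
  record Ident(String name) implements Expr {}
  record Literal(String value) implements Expr {}
  record Call(String fn, List<Expr> args) implements Expr {}

  // Returns a copy of the tree with identifiers renamed per the map; unmapped names are kept.
  static Expr rewriteBindings(Expr expr, Map<String, String> renames)
  {
    if (expr instanceof Ident id) {
      return new Ident(renames.getOrDefault(id.name(), id.name()));
    }
    if (expr instanceof Call call) {
      List<Expr> rewritten = call.args().stream().map(a -> rewriteBindings(a, renames)).toList();
      return new Call(call.fn(), rewritten);
    }
    return expr; // literals have no bindings to rewrite
  }

  public static void main(String[] args)
  {
    // Shard context: v1 = json_value(obj, '$.a'), dependent column lower(v1)
    Expr shardBase = new Call("json_value", List.of(new Ident("obj"), new Literal("$.a")));
    Expr shardLower = new Call("lower", List.of(new Ident("v1")));
    // Query context: v0 = json_value(obj, '$.a'), dependent column lower(v0)
    Expr queryBase = new Call("json_value", List.of(new Ident("obj"), new Literal("$.a")));
    Expr queryLower = new Call("lower", List.of(new Ident("v0")));

    System.out.println(shardLower.equals(queryLower)); // false: input names differ

    // The base columns are equivalent, so map the shard's name onto the query's name
    Map<String, String> renames = shardBase.equals(queryBase) ? Map.of("v1", "v0") : Map.of();
    System.out.println(rewriteBindings(shardLower, renames).equals(queryLower)); // true
  }
}
```

Comparing without the rewrite fails only because the two contexts chose different names for the same base column; renaming first reduces the check to ordinary structural equality.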
   
   changes:
   * adds an `addRequiredVirtualColumns` method to `SegmentGenerationStageSpec`, 
which resolves transitive virtual column dependencies for virtual columns used 
by clustering, fixing a bug where these dependent virtual columns were lost 
from the shard spec and compaction state
   * adds `supportsRequiredRewrite` and `rewriteRequiredColumns` to 
`VirtualColumn`, allowing a virtual column to rewrite its input references to 
equivalent names
   * adds `Expr.rewriteBindings` to rewrite identifier bindings in an `Expr` 
tree
   * `VirtualColumns.findEquivalent` is enhanced to transitively resolve 
dependent virtual columns across naming contexts before checking equivalence, 
so that, e.g., `lower("v1")` is recognized as equivalent to `lower("v0")` when 
`v0` and `v1` are equivalent virtual columns
   * `FilterSegmentPruner` updated to use transitive equivalence when matching 
shard virtual columns to query virtual columns (with Optional-based caching to 
correctly handle nulls)
   * `Projections.matchQueryVirtualColumn` updated similarly
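
The description does not show the caching code, but one plausible reason for the Optional wrapper is that `java.util.HashMap.computeIfAbsent` records no mapping when the mapping function returns `null`, so a cached "no equivalent found" result would be recomputed on every lookup. The sketch below uses a hypothetical `NullSafeCache` class (not the `FilterSegmentPruner` code) that stores `Optional.empty()` so negative results are cached as well; treat the motivation as an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class NullSafeCache
{
  // Optional values let "no result" be cached: a null from computeIfAbsent's function is never stored.
  private final Map<String, Optional<String>> cache = new HashMap<>();
  private final Function<String, String> lookup; // may return null when there is no equivalent
  int lookups = 0; // counts how often the underlying lookup actually runs

  NullSafeCache(Function<String, String> lookup)
  {
    this.lookup = lookup;
  }

  String get(String key)
  {
    return cache.computeIfAbsent(key, k -> {
      lookups++;
      return Optional.ofNullable(lookup.apply(k));
    }).orElse(null);
  }

  public static void main(String[] args)
  {
    // Hypothetical equivalence table: query column name -> equivalent shard column name
    Map<String, String> equivalents = Map.of("v1", "v0");
    NullSafeCache cache = new NullSafeCache(equivalents::get);
    cache.get("v1");
    cache.get("v1");
    cache.get("v9"); // no equivalent: cached as Optional.empty()
    cache.get("v9");
    System.out.println(cache.lookups); // prints 2
  }
}
```

Caching `null` directly would make the count 3, because the second `v9` lookup would miss the cache.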


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
