clintropolis opened a new pull request, #19262:
URL: https://github.com/apache/druid/pull/19262
### Description
This PR fixes an issue when clustering an MSQ insert/replace by a virtual
column that depends on another virtual column. For example,
`LOWER(JSON_VALUE(obj, '$.a'))` plans to an `ExpressionVirtualColumn`
referencing a `NestedFieldVirtualColumn`, and the resulting
`DimensionRangeShardSpec` was missing the dependent virtual columns. This broke
segment pruning for those queries, since the shard spec had incomplete virtual
column context, and broke compaction for the same reason.
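
To illustrate the dependency problem, here is a minimal sketch of transitive
resolution. It is not the code in this PR: virtual columns are modeled as a
plain map from name to input names, and `RequiredVirtualColumnsSketch`,
`resolveRequired`, and the names `v0`/`v1` are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Druid's API: each virtual column name maps to the
// names of the columns (virtual or physical) that it reads.
final class RequiredVirtualColumnsSketch
{
  static Set<String> resolveRequired(Map<String, List<String>> virtualColumnInputs, List<String> clusterBy)
  {
    final Set<String> required = new LinkedHashSet<>();
    final Deque<String> toVisit = new ArrayDeque<>(clusterBy);
    while (!toVisit.isEmpty()) {
      final String name = toVisit.pop();
      // Skip physical columns and virtual columns that were already collected.
      if (!virtualColumnInputs.containsKey(name) || !required.add(name)) {
        continue;
      }
      // Follow the inputs so that dependencies of dependencies are kept too.
      toVisit.addAll(virtualColumnInputs.get(name));
    }
    return required;
  }

  public static void main(String[] args)
  {
    final Map<String, List<String>> inputs = new HashMap<>();
    inputs.put("v0", List.of("obj")); // stands in for JSON_VALUE(obj, '$.a')
    inputs.put("v1", List.of("v0"));  // stands in for LOWER(v0)
    System.out.println(resolveRequired(inputs, List.of("v1"))); // prints [v1, v0]
  }
}
```

Collecting only the columns named in the clustering key would keep `v1` and
drop `v0`, which is the shape of the bug described above.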
It also fixes an issue with virtual column equivalence in the same case, where
a virtual column depends on another virtual column. Virtual columns whose
dependencies are equivalent virtual columns can now be considered equivalent:
the virtual column is rewritten to use the equivalent inputs before equivalence
is tested.
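
The rewrite-then-compare idea can be sketched as follows. This is a simplified
illustration under assumptions, not the PR's implementation: the `Vc` class,
`rewriteInputs`, `sameComputation`, and `isEquivalent` are hypothetical
stand-ins, and a virtual column is reduced to a function name plus input names.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class EquivalenceSketch
{
  // Hypothetical stand-in for a virtual column: an output name, a function,
  // and the names of the inputs the function reads.
  static final class Vc
  {
    final String outputName;
    final String function;
    final List<String> inputs;

    Vc(String outputName, String function, List<String> inputs)
    {
      this.outputName = outputName;
      this.function = function;
      this.inputs = inputs;
    }

    // Returns a copy whose inputs are renamed; unmapped inputs are kept as-is.
    Vc rewriteInputs(Map<String, String> renames)
    {
      return new Vc(
          outputName,
          function,
          inputs.stream().map(i -> renames.getOrDefault(i, i)).collect(Collectors.toList())
      );
    }

    // Output names are ignored: only what is computed matters here.
    boolean sameComputation(Vc other)
    {
      return function.equals(other.function) && inputs.equals(other.inputs);
    }
  }

  // Rewrites the candidate's inputs to their equivalents, then compares.
  static boolean isEquivalent(Vc candidate, Vc target, Map<String, String> equivalentInputs)
  {
    return candidate.rewriteInputs(equivalentInputs).sameComputation(target);
  }

  public static void main(String[] args)
  {
    final Vc shardLower = new Vc("v2", "lower", List.of("v0"));
    final Vc queryLower = new Vc("v3", "lower", List.of("v1"));
    System.out.println(shardLower.sameComputation(queryLower));                   // prints false
    System.out.println(isEquivalent(shardLower, queryLower, Map.of("v0", "v1"))); // prints true
  }
}
```

Without the rewrite, the two `lower` columns differ only in the name of their
input and are wrongly treated as different computations.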
changes:
* adds `addRequiredVirtualColumns` method to `SegmentGenerationStageSpec`
which resolves transitive virtual column dependencies for virtual columns used
by clustering, fixing a bug where these dependent virtual columns would be lost
in the shard spec and compaction state
* adds `supportsRequiredRewrite` and `rewriteRequiredColumns` to
`VirtualColumn` allowing a virtual column to rewrite its input references to
equivalent names
* adds `Expr.rewriteBindings` to rewrite identifier bindings in an `Expr`
tree
* `VirtualColumns.findEquivalent` is enhanced to transitively resolve
dependent virtual columns across naming contexts before checking equivalence,
enabling detection that e.g. `lower("v1")` ≡ `lower("v0")` when v0 and v1 are
equivalent virtual columns
* `FilterSegmentPruner` updated to use transitive equivalence when matching
shard virtual columns to query virtual columns (with Optional-based caching to
correctly handle nulls)
* `Projections.matchQueryVirtualColumn` updated similarly
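
As a note on the `Optional`-based caching mentioned above: `Map.computeIfAbsent`
records no mapping when the function returns null, so a "no equivalent found"
result would be recomputed on every lookup unless it is wrapped. The sketch
below shows that general pattern only. `OptionalCacheSketch`, `findEquivalent`,
and the `lookups` counter are hypothetical, and whether `FilterSegmentPruner`
uses `computeIfAbsent` specifically is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class OptionalCacheSketch
{
  private final Map<String, Optional<String>> cache = new HashMap<>();
  private final Function<String, String> lookup;
  int lookups = 0; // counts how often the underlying lookup actually runs

  OptionalCacheSketch(Function<String, String> lookup)
  {
    this.lookup = lookup;
  }

  // Returns the equivalent column name, or null if there is none. A miss is
  // cached as Optional.empty(), so it is not looked up again.
  String findEquivalent(String shardColumn)
  {
    return cache.computeIfAbsent(shardColumn, k -> {
      lookups++;
      return Optional.ofNullable(lookup.apply(k));
    }).orElse(null);
  }

  public static void main(String[] args)
  {
    final Map<String, String> known = Map.of("v0", "v1");
    final OptionalCacheSketch sketch = new OptionalCacheSketch(known::get);
    sketch.findEquivalent("missing");
    sketch.findEquivalent("missing");
    System.out.println(sketch.lookups); // prints 1
  }
}
```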
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]