gianm commented on code in PR #19061:
URL: https://github.com/apache/druid/pull/19061#discussion_r2893054810
##########
multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1824,9 +1832,20 @@ private static Function<Set<DataSegment>, Set<DataSegment>> addCompactionStateTo
);
DimensionsSpec dimensionsSpec = dataSchema.getDimensionsSpec();
- CompactionTransformSpec transformSpec = TransformSpec.NONE.equals(dataSchema.getTransformSpec())
- ? null
- : CompactionTransformSpec.of(dataSchema.getTransformSpec());
+
+ // if the clustered by requires virtual columns, preserve them here so that we can rebuild during compaction
+ CompactionTransformSpec transformSpec;
+ if (clusterBy == null || clusterBy.getVirtualColumnMap().isEmpty()) {
+ transformSpec = TransformSpec.NONE.equals(dataSchema.getTransformSpec())
+ ? null
+ : CompactionTransformSpec.of(dataSchema.getTransformSpec());
+ } else {
+ transformSpec = new CompactionTransformSpec(
+ dataSchema.getTransformSpec().getFilter(),
+ VirtualColumns.create(clusterBy.getVirtualColumnMap().values())
Review Comment:
Won't adding the virtual columns to the `transformSpec` make them become
real columns? I don't think that's what we want.
##########
processing/src/main/java/org/apache/druid/timeline/partition/DimensionRangeShardSpec.java:
##########
@@ -53,13 +55,14 @@ public class DimensionRangeShardSpec extends BaseDimensionRangeShardSpec
@JsonCreator
public DimensionRangeShardSpec(
@JsonProperty("dimensions") List<String> dimensions,
+ @JsonProperty("virtualColumns") @Nullable VirtualColumns virtualColumns,
Review Comment:
Are there going to be issues with deserializing virtual columns on server
types that haven't had to deal with them before (like the Coordinator)? I
wonder if all expressions are registered there or if some modules have more
narrow scopes.
##########
processing/src/main/java/org/apache/druid/frame/key/ClusterBy.java:
##########
@@ -45,16 +48,27 @@
public class ClusterBy
{
private final List<KeyColumn> columns;
+ private final Map<String, VirtualColumn> virtualColumnMap;
private final int bucketByCount;
private final boolean sortable;
+ public ClusterBy(
+ List<KeyColumn> keyColumns,
+ int bucketByCount
+ )
+ {
+ this(keyColumns, Map.of(), bucketByCount);
+ }
+
@JsonCreator
public ClusterBy(
@JsonProperty("columns") List<KeyColumn> columns,
+ @JsonProperty("virtualColumnMap") @Nullable Map<String, VirtualColumn> virtualColumnMap,
Review Comment:
Why does this need to be on the `clusterBy`? It seems to me like the wrong
place to put it, since `clusterBy` is an MSQ framework concept and virtual
columns are an ingestion & query concept.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]