gianm commented on code in PR #19061:
URL: https://github.com/apache/druid/pull/19061#discussion_r2893054810


##########
multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1824,9 +1832,20 @@ private static Function<Set<DataSegment>, Set<DataSegment>> addCompactionStateTo
     );
 
     DimensionsSpec dimensionsSpec = dataSchema.getDimensionsSpec();
-    CompactionTransformSpec transformSpec = TransformSpec.NONE.equals(dataSchema.getTransformSpec())
-                                            ? null
-                                            : CompactionTransformSpec.of(dataSchema.getTransformSpec());
+
+    // if the clustered by requires virtual columns, preserve them here so that we can rebuild during compaction
+    CompactionTransformSpec transformSpec;
+    if (clusterBy == null || clusterBy.getVirtualColumnMap().isEmpty()) {
+      transformSpec = TransformSpec.NONE.equals(dataSchema.getTransformSpec())
+                      ? null
+                      : CompactionTransformSpec.of(dataSchema.getTransformSpec());
+    } else {
+      transformSpec = new CompactionTransformSpec(
+          dataSchema.getTransformSpec().getFilter(),
+          VirtualColumns.create(clusterBy.getVirtualColumnMap().values())

Review Comment:
   Won't adding the virtual columns to the `transformSpec` make them become 
real columns? I don't think that's what we want.



##########
processing/src/main/java/org/apache/druid/timeline/partition/DimensionRangeShardSpec.java:
##########
@@ -53,13 +55,14 @@ public class DimensionRangeShardSpec extends BaseDimensionRangeShardSpec
   @JsonCreator
   public DimensionRangeShardSpec(
       @JsonProperty("dimensions") List<String> dimensions,
+      @JsonProperty("virtualColumns") @Nullable VirtualColumns virtualColumns,

Review Comment:
   Are there going to be issues with deserializing virtual columns on server 
types that haven't had to deal with them before (like the Coordinator)? I 
wonder if all expressions are registered there or if some modules have more 
narrow scopes.



##########
processing/src/main/java/org/apache/druid/frame/key/ClusterBy.java:
##########
@@ -45,16 +48,27 @@
 public class ClusterBy
 {
   private final List<KeyColumn> columns;
+  private final Map<String, VirtualColumn> virtualColumnMap;
   private final int bucketByCount;
   private final boolean sortable;
 
+  public ClusterBy(
+      List<KeyColumn> keyColumns,
+      int bucketByCount
+  )
+  {
+    this(keyColumns, Map.of(), bucketByCount);
+  }
+
   @JsonCreator
   public ClusterBy(
       @JsonProperty("columns") List<KeyColumn> columns,
+      @JsonProperty("virtualColumnMap") @Nullable Map<String, VirtualColumn> virtualColumnMap,

Review Comment:
   Why does this need to be on the `clusterBy`? It seems to me like the wrong 
place to put it, since `clusterBy` is an MSQ framework concept and virtual 
columns are an ingestion & query concept.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]