clintropolis commented on code in PR #19061:
URL: https://github.com/apache/druid/pull/19061#discussion_r2893141610


##########
multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1824,9 +1832,20 @@ private static Function<Set<DataSegment>, 
Set<DataSegment>> addCompactionStateTo
     );
 
     DimensionsSpec dimensionsSpec = dataSchema.getDimensionsSpec();
-    CompactionTransformSpec transformSpec = 
TransformSpec.NONE.equals(dataSchema.getTransformSpec())
-                                            ? null
-                                            : 
CompactionTransformSpec.of(dataSchema.getTransformSpec());
+
+    // if the clustered by requires virtual columns, preserve them here so 
that we can rebuild during compaction
+    CompactionTransformSpec transformSpec;
+    if (clusterBy == null || clusterBy.getVirtualColumnMap().isEmpty()) {
+      transformSpec = TransformSpec.NONE.equals(dataSchema.getTransformSpec())
+                      ? null
+                      : 
CompactionTransformSpec.of(dataSchema.getTransformSpec());
+    } else {
+      transformSpec = new CompactionTransformSpec(
+          dataSchema.getTransformSpec().getFilter(),
+          VirtualColumns.create(clusterBy.getVirtualColumnMap().values())

Review Comment:
   >Won't adding the virtual columns to the transformSpec make them become real 
columns? I don't think that's what we want.
   
   I don't think so, depending on what you mean by real columns. Virtual 
columns being here on compaction transform config is a new MSQ compaction only 
thing that was added a couple of weeks ago for reindexing templates to support 
filters on virtual columns (for the deletion rules). Native compaction explodes 
if they exist.
   
   I would agree that it is kind of an odd and confusing place to define 
virtual columns available for MSQ compaction, especially since there is no such 
thing on the actual `TransformSpec`. They were added they were added there in 
that PR i believe because there wasn't really a better existing place on the 
compaction config.
   
   The virtual columns here are added as part of building the synthetic virtual 
columns like time_floor and mv_to_array stuff, which are then passed into the 
build query methods which can add them as appropriate depending on how they are 
used.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to