FrankChen021 commented on code in PR #19599:
URL: https://github.com/apache/druid/pull/19599#discussion_r3442413260


##########
processing/src/main/java/org/apache/druid/segment/projections/ClusteringVectorColumnSelectorFactory.java:
##########
@@ -181,7 +181,21 @@ public ColumnCapabilities getColumnCapabilities(String 
column)
   {
     final int idx = clusteringColumns.indexOf(column);
     if (idx < 0) {
-      return delegate.getColumnCapabilities(column);
+      // Non-clustering columns are stored per cluster group with per-group 
local dictionaries that are NOT stable
+      // across the concatenating vector cursor. We must not advertise 
dictionary encoding: the vectorized group-by
+      // keys on the selector's (per-group-local) IDs when the column reports 
as dictionary-encoded
+      // (GroupByVectorColumnProcessorFactory#useDictionaryEncodedSelector), 
conflating distinct values across groups.
+      // Reporting non-dictionary-encoded routes it to the value-building 
vector selector, which is correct across
+      // groups.
+      final ColumnCapabilities delegateCapabilities = 
delegate.getColumnCapabilities(column);
+      if (delegateCapabilities == null) {
+        return null;
+      }
+      return ColumnCapabilitiesImpl.copyOf(delegateCapabilities)
+                                   .setDictionaryEncoded(false)

Review Comment:
   [P2] Hide vector selector dictionaries too
   
   This hides per-group dictionaries through ColumnCapabilities, but the 
delegating vector dimension selectors still return the current group's 
cardinality/name-lookup/idLookup. Any vectorized caller that directly uses 
makeSingleValueDimensionSelector or makeMultiValueDimensionSelector and caches 
ids across ConcatenatingVectorCursor group transitions can reproduce the same 
wrong-result class this PR fixes for scalar grouping. Please mirror the scalar 
DelegatingDimensionSelector behavior for vector dimension selectors by 
returning CARDINALITY_UNKNOWN, false, and null for non-clustering columns, with 
a vectorized multi-group clustered test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to