clintropolis commented on code in PR #18403:
URL: https://github.com/apache/druid/pull/18403#discussion_r2277941363


##########
processing/src/main/java/org/apache/druid/segment/AggregateProjectionMetadata.java:
##########
@@ -213,16 +214,19 @@ public Schema(
 
       int foundTimePosition = -1;
       this.orderingWithTimeSubstitution = 
Lists.newArrayListWithCapacity(ordering.size());
-      Granularity granularity = null;
+      Granularity granularity = Granularities.ALL;
       for (int i = 0; i < ordering.size(); i++) {
         OrderBy orderBy = ordering.get(i);
         if (orderBy.getColumnName().equals(timeColumnName)) {
           orderingWithTimeSubstitution.add(new 
OrderBy(ColumnHolder.TIME_COLUMN_NAME, orderBy.getOrder()));
+          if (foundTimePosition != -1) {
+            throw DruidException.defensive("projection[%s] has multiple time 
columns in ordering: %s", name, ordering);
+          }
           foundTimePosition = i;
           timeColumnName = groupingColumns.get(foundTimePosition);
           final VirtualColumn vc = 
this.virtualColumns.getVirtualColumn(groupingColumns.get(foundTimePosition));
           if (vc != null) {
-            granularity = Granularities.fromVirtualColumn(vc);
+            granularity = 
MoreObjects.firstNonNull(Granularities.fromVirtualColumn(vc), 
Granularities.ALL);

Review Comment:
   this seems less clear to me than just initializing to null and letting it 
naturally fill in like it was doing previously



##########
processing/src/main/java/org/apache/druid/java/util/common/granularity/Granularities.java:
##########
@@ -194,14 +192,24 @@ public static ExpressionVirtualColumn 
toVirtualColumn(Granularity granularity, S
   public static Granularity fromVirtualColumn(VirtualColumn virtualColumn)
   {
     if (virtualColumn instanceof ExpressionVirtualColumn) {
-      final ExpressionVirtualColumn expressionVirtualColumn = 
(ExpressionVirtualColumn) virtualColumn;
-      final Expr expr = expressionVirtualColumn.getParsedExpression().get();
-      if (expr instanceof TimestampFloorExprMacro.TimestampFloorExpr) {
-        final TimestampFloorExprMacro.TimestampFloorExpr gran = 
(TimestampFloorExprMacro.TimestampFloorExpr) expr;
-        if (gran.getArg().getBindingIfIdentifier() != null) {
-          return gran.getGranularity();
-        }
-      }
+      return fromExpr(((ExpressionVirtualColumn) 
virtualColumn).getParsedExpression().get());
+    }
+    return null;
+  }
+
+  @Nullable
+  private static Granularity fromExpr(Expr expr)
+  {
+    String identifier = expr.getIdentifierIfIdentifier();

Review Comment:
   wny switch to `getIdentifierIfIdentifier`? It doesn't matter really here 
probably, but `getBindingIfIdentifier` is more correct since the binding refers 
to the underlying column input instead of the variable name in the expression



##########
processing/src/main/java/org/apache/druid/java/util/common/granularity/PeriodGranularity.java:
##########
@@ -216,6 +226,139 @@ public String toString()
            '}';
   }
 
+  /**
+   * Returns true if this granularity can be mapped to the target granularity. 
For example:
+   * <li>Period('PT1H') in UTC can be mapped to Period('P1D') in UTC</li>
+   * <li>Period('PT1H') in America/Los_Angeles can be mapped to Period('PT1H') 
in UTC</li>
+   * <li>Period('P1D') in America/Los_Angeles cannot be mapped to 
Period('P1D') in UTC</li>
+   */
+  public boolean canBeMappedTo(PeriodGranularity target)

Review Comment:
   admittedly i'm still digesting how exactly this method works, but it seems 
kind of expensive to do to match for every projection we consider of every 
segment when the timezones don't match (at least 
`getUtcMappablePeriodSecondsOrThrow` seems expensive). Perhaps we should make a 
cache of these conversions so we can re-use the work we've done since its 
likely going to be a lot of the same checks over and over?



##########
processing/src/main/java/org/apache/druid/java/util/common/granularity/PeriodGranularity.java:
##########
@@ -216,6 +226,139 @@ public String toString()
            '}';
   }
 
+  /**
+   * Returns true if this granularity can be mapped to the target granularity. 
For example:
+   * <li>Period('PT1H') in UTC can be mapped to Period('P1D') in UTC</li>
+   * <li>Period('PT1H') in America/Los_Angeles can be mapped to Period('PT1H') 
in UTC</li>
+   * <li>Period('P1D') in America/Los_Angeles cannot be mapped to 
Period('P1D') in UTC</li>
+   */
+  public boolean canBeMappedTo(PeriodGranularity target)
+  {
+    if (hasOrigin || target.hasOrigin) {
+      return false;
+    }
+
+    if (getTimeZone().equals(target.getTimeZone())) {
+      int periodMonths = period.getYears() * 12 + period.getMonths();
+      int targetMonths = target.period.getYears() * 12 + 
target.period.getMonths();
+      if (targetMonths == 0 && periodMonths != 0) {
+        return false;
+      }
+
+      long periodStandardSeconds = 
getStandardSecondsOrThrow(period.withYears(0).withMonths(0));
+      long targetStandardSeconds = 
getStandardSecondsOrThrow(target.period.withYears(0).withMonths(0));
+      if (targetMonths == 0 && periodMonths == 0) {
+        // both periods have zero months
+        return targetStandardSeconds % periodStandardSeconds == 0;
+      }
+      // if we reach here, targetMonths != 0
+      if (periodMonths == 0) {
+        // can map if 1.target not have week/day/hour/minute/second, and 
2.period can be mapped to day
+        // this is for simplicity, it's possible some period can be mapped to 
gran, but we don't support it
+        return targetStandardSeconds == 0 && (3600 * 24) % 
periodStandardSeconds == 0;
+      } else {
+        // can map if 1.target&period not have week/day/hour/minute/second, 
and 2.period month can be mapped to target month
+        return targetMonths % periodMonths == 0
+               && targetStandardSeconds == 0
+               && periodStandardSeconds == 0;
+      }
+    }
+
+    // Timezone doesn't match, we'd require periods to be in whole seconds, 
i.e. no years, months, or milliseconds.
+    if (getStandardSeconds(period).isEmpty()) {
+      return false;
+    }
+    long standardSeconds = getStandardSecondsOrThrow(period);
+    if (standardSeconds != getUtcMappablePeriodSecondsOrThrow()) {
+      // the period cannot be mapped to UTC with the same period
+      return false;
+    }
+    if (target.period.getYears() == 0 && target.period.getMonths() == 0) {
+      return target.getUtcMappablePeriodSecondsOrThrow() % standardSeconds == 
0;
+    } else {
+      return 
getStandardSecondsOrThrow(target.period.withYears(0).withMonths(0)) == 0
+             && (3600 * 24) % standardSeconds == 0;
+    }
+  }
+
+  /**
+   * Returns the maximum possible period seconds that this granularity can be 
mapped to UTC.
+   * <p>
+   * Throws {@link DruidException} if the period cannot be mapped to whole 
seconds, i.e. it has years or months, or milliseconds.
+   */
+  public long getUtcMappablePeriodSecondsOrThrow()
+  {
+    long periodSeconds = PeriodGranularity.getStandardSecondsOrThrow(period);
+
+    if (ISOChronology.getInstanceUTC().getZone().equals(getTimeZone())) {
+      return periodSeconds;
+    }
+    ZoneRules rules = ZoneId.of(getTimeZone().getID()).getRules();
+    Set<Integer> offsets = Stream.concat(
+        Stream.of(rules.getStandardOffset(Instant.now())),
+        rules.getTransitions()
+             .stream()
+             .filter(t -> t.getInstant().isAfter(Instant.EPOCH)) // timezone 
transitions before epoch are patchy
+             .map(ZoneOffsetTransition::getOffsetBefore)
+    ).map(ZoneOffset::getTotalSeconds).collect(Collectors.toSet());
+
+    if (offsets.isEmpty()) {
+      // no offsets
+      return periodSeconds;
+    }
+
+    if (offsets.stream().allMatch(o -> o % periodSeconds == 0)) {
+      // all offsets are multiples of the period
+      return periodSeconds;
+    } else if (periodSeconds % 3600 == 0 && offsets.stream().allMatch(o -> o % 
3600 == 0)) {
+      // fall back to hour if period is a multiple of hour and all offsets are 
multiples of hour
+      return 3600;
+    } else if (periodSeconds % 60 == 0 && offsets.stream().allMatch(o -> o % 
60 == 0)) {
+      // fall back to minute if period is a multiple of minute and all offsets 
are multiples of minute
+      return 60;
+    } else {
+      // default to second
+      return 1;
+    }
+  }
+
+  /**
+   * Returns the standard whole seconds for the given period.
+   * <p>
+   * Throws {@link DruidException} if the period cannot be mapped to whole 
seconds, i.e. one of the following applies:
+   * <ul>
+   * <li>it has years or months
+   * <li>it has milliseconds
+   */
+  public static Long getStandardSecondsOrThrow(Period period)
+  {
+    Optional<Long> s = getStandardSeconds(period);
+    if (s.isPresent()) {
+      return s.get();
+    } else {
+      throw DruidException.defensive("Period[%s] cannot be converted to 
standard whole seconds", period);

Review Comment:
   this seems expensive to throw an exception, if there are multiple 
projections to match against a query which has a timezone, this method is going 
to be called for every projection we try to match of every segment



##########
processing/src/main/java/org/apache/druid/java/util/common/granularity/Granularities.java:
##########
@@ -194,14 +192,24 @@ public static ExpressionVirtualColumn 
toVirtualColumn(Granularity granularity, S
   public static Granularity fromVirtualColumn(VirtualColumn virtualColumn)
   {
     if (virtualColumn instanceof ExpressionVirtualColumn) {
-      final ExpressionVirtualColumn expressionVirtualColumn = 
(ExpressionVirtualColumn) virtualColumn;
-      final Expr expr = expressionVirtualColumn.getParsedExpression().get();
-      if (expr instanceof TimestampFloorExprMacro.TimestampFloorExpr) {
-        final TimestampFloorExprMacro.TimestampFloorExpr gran = 
(TimestampFloorExprMacro.TimestampFloorExpr) expr;
-        if (gran.getArg().getBindingIfIdentifier() != null) {
-          return gran.getGranularity();
-        }
-      }
+      return fromExpr(((ExpressionVirtualColumn) 
virtualColumn).getParsedExpression().get());
+    }
+    return null;
+  }
+
+  @Nullable
+  private static Granularity fromExpr(Expr expr)
+  {
+    String identifier = expr.getIdentifierIfIdentifier();
+    if (identifier != null) {
+      return identifier.equals(ColumnHolder.TIME_COLUMN_NAME)
+             ? Granularities.NONE
+             : Granularities.ALL;

Review Comment:
   ALL i guess because this isn't a time grouping?  maybe worth a comment



##########
processing/src/main/java/org/apache/druid/data/input/impl/AggregateProjectionSpec.java:
##########
@@ -230,10 +230,15 @@ private static ProjectionOrdering 
computeOrdering(VirtualColumns virtualColumns,
       } else {
         final VirtualColumn vc = 
virtualColumns.getVirtualColumn(dimension.getName());
         final Granularity maybeGranularity = 
Granularities.fromVirtualColumn(vc);
-        if (granularity == null && maybeGranularity != null) {
+        if (maybeGranularity == null || 
maybeGranularity.equals(Granularities.ALL)) {
+          // no __time in inputs or not supported, skip
+          continue;
+        }
+        if (granularity == null) {
           granularity = maybeGranularity;
           timeColumnName = dimension.getName();
-        } else if (granularity != null && maybeGranularity != null && 
maybeGranularity.isFinerThan(granularity)) {
+        } else if (maybeGranularity.isFinerThan(granularity)) {

Review Comment:
   i think this check needs to make sure there is no timezone on the 
granularity to be considered as the time column since i think __time is assumed 
to be UTC and so any column on the projection being considered as the 
substitute __time column must also be utc. This seems worth adding a test for



##########
processing/src/main/java/org/apache/druid/segment/AggregateProjectionMetadata.java:
##########
@@ -213,16 +214,19 @@ public Schema(
 
       int foundTimePosition = -1;
       this.orderingWithTimeSubstitution = 
Lists.newArrayListWithCapacity(ordering.size());
-      Granularity granularity = null;
+      Granularity granularity = Granularities.ALL;
       for (int i = 0; i < ordering.size(); i++) {
         OrderBy orderBy = ordering.get(i);
         if (orderBy.getColumnName().equals(timeColumnName)) {
           orderingWithTimeSubstitution.add(new 
OrderBy(ColumnHolder.TIME_COLUMN_NAME, orderBy.getOrder()));
+          if (foundTimePosition != -1) {
+            throw DruidException.defensive("projection[%s] has multiple time 
columns in ordering: %s", name, ordering);
+          }

Review Comment:
   I guess this shouldn't be able to happen right now since we generate the 
ordering so the same entry shouldn't ever show up



##########
processing/src/main/java/org/apache/druid/java/util/common/granularity/PeriodGranularity.java:
##########
@@ -216,6 +226,139 @@ public String toString()
            '}';
   }
 
+  /**
+   * Returns true if this granularity can be mapped to the target granularity. 
For example:
+   * <li>Period('PT1H') in UTC can be mapped to Period('P1D') in UTC</li>
+   * <li>Period('PT1H') in America/Los_Angeles can be mapped to Period('PT1H') 
in UTC</li>
+   * <li>Period('P1D') in America/Los_Angeles cannot be mapped to 
Period('P1D') in UTC</li>
+   */
+  public boolean canBeMappedTo(PeriodGranularity target)

Review Comment:
   also these methods feel like they could use some additional comments to make 
it clearer what is going on and why this works



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to