nsivabalan commented on code in PR #12350:
URL: https://github.com/apache/hudi/pull/12350#discussion_r1862585218
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2514,6 +2532,27 @@ public static HoodieData<HoodieRecord>
convertMetadataToPartitionStatsRecords(Ho
}
}
+ /**
+ * Given table schema and field to index, checks if field's data type are
supported.
+ *
+ * @param columnToIndex column to index
+ * @param tableSchema table schema
+ * @return true if field's data type is supported, false otherwise
+ */
+ public static boolean validateDataTypeForPartitionStats(String
columnToIndex, Schema tableSchema) {
+ Schema fieldSchema = getNestedFieldSchemaFromWriteSchema(tableSchema,
columnToIndex);
+ // Exclude fields based on logical type
+ if ((fieldSchema.getType() == Schema.Type.INT || fieldSchema.getType() ==
Schema.Type.LONG)
+ && fieldSchema.getLogicalType() != null) {
+
+ // Skip fields with logical types DATE or TIME_MILLIS for INT,
TIMESTAMP_MILLIS for LONG
Review Comment:
how did you arrive at this?
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2514,6 +2532,27 @@ public static HoodieData<HoodieRecord>
convertMetadataToPartitionStatsRecords(Ho
}
}
+ /**
+ * Given table schema and field to index, checks if field's data type are
supported.
+ *
+ * @param columnToIndex column to index
+ * @param tableSchema table schema
+ * @return true if field's data type is supported, false otherwise
+ */
+ public static boolean validateDataTypeForPartitionStats(String
columnToIndex, Schema tableSchema) {
Review Comment:
lets add VisibleForTesting annotation. this method does not need to be public
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2389,7 +2395,12 @@ public static HoodieData<HoodieRecord>
convertFilesToPartitionStatsRecords(Hoodi
LOG.warn("No columns to index for partition stats index");
return engineContext.emptyHoodieData();
}
- LOG.debug("Indexing following columns for partition stats index: {}",
columnsToIndex);
+ // filter columns with only supported types
+ final List<String> validColumnsToIndex = columnsToIndex.stream()
Review Comment:
So, we should include them as part of partition stats as well
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]