yihua commented on code in PR #13675:
URL: https://github.com/apache/hudi/pull/13675#discussion_r2274768726
##########
hudi-common/src/main/java/org/apache/hudi/metadata/ColumnStatsIndexPrefixRawKey.java:
##########
@@ -25,40 +25,64 @@
import java.util.Objects;
/**
- * Represents a raw key for column stats index consisting of column name and optional partition name.
+ * Represents a raw key prefix for column stats index consisting of column name and optional partition name.
+ * <p>
+ * This key is used for prefix lookups in the COLUMN_STATS partition to find all stats
+ * for a specific column across files, or for a column within a specific partition.
+ * This enables efficient retrieval of column statistics for query planning.
+ * <p>
+ * Raw key format:
+ * - Column-only lookup: base64(column_name)
+ * - Column + partition lookup: base64(column_name) + base64(partition_identifier)
+ * <p>
+ * Examples:
+ * - To find all stats for column "price" across all partitions and files:
+ * key = base64("price")
+ * Example encoded: "cHJpY2U="
+ * <p>
+ * - To find all stats for column "user_id" in partition "2023/01/15":
+ * key = base64("user_id") + base64("2023/01/15")
+ * Example encoded: "dXNlcl9pZA==" + "MjAyMy8wMS8xNQ=="
+ * <p>
+ * - To find all stats for column "revenue" in non-partitioned table:
+ * key = base64("revenue") + base64("__HIVE_DEFAULT_PARTITION__")
Review Comment:
A non-partitioned table does not use `__HIVE_DEFAULT_PARTITION__` as its
partition identifier, so this example is inaccurate. Removing it for now.
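
For reference, the two key shapes the Javadoc does describe correctly can be sketched as below. This is a minimal illustration that assumes standard (padded) Base64 over UTF-8 bytes, which matches the encoded examples in the Javadoc; the class and method names are hypothetical and are not the actual Hudi API. The non-partitioned case is deliberately left out, since its partition identifier is the point in question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of Hudi.
final class ColumnStatsPrefixKeySketch {

  private ColumnStatsPrefixKeySketch() {
  }

  // Standard padded Base64 of the UTF-8 bytes of the input (an assumption
  // inferred from the Javadoc examples, e.g. "price" -> "cHJpY2U=").
  static String base64(String value) {
    return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
  }

  // Column-only lookup: base64(column_name)
  static String columnPrefix(String columnName) {
    return base64(columnName);
  }

  // Column + partition lookup: base64(column_name) + base64(partition_identifier)
  static String columnAndPartitionPrefix(String columnName, String partitionIdentifier) {
    return base64(columnName) + base64(partitionIdentifier);
  }
}
```

With this sketch, `columnPrefix("price")` yields `cHJpY2U=`, and `columnAndPartitionPrefix("user_id", "2023/01/15")` yields `dXNlcl9pZA==MjAyMy8wMS8xNQ==`, in line with the examples kept in the Javadoc.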