vinothchandar commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r781704816



##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##########
@@ -83,6 +84,11 @@
       .withDocumentation("Lower values increase the size of metadata tracked 
within HFile, but can offer potentially "
           + "faster lookup times.");
 
+  public static final ConfigProperty<String> HFILE_SCHEMA_KEY_FIELD_ID = 
ConfigProperty
+      .key("hoodie.hfile.schema.key.field.id")

Review comment:
       rename: hoodie.hfile.key.field.name (drop id, since its just a name in 
reality) 

##########
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java
##########
@@ -122,9 +118,9 @@ public HoodieLogBlockType getBlockType() {
       if (useIntegerKey) {
         recordKey = String.format("%" + keySize + "s", key++);
       } else {
-        recordKey = record.get(keyField.pos()).toString();
+        recordKey = record.get(schemaKeyField.pos()).toString();

Review comment:
       why not just look up by name? Don't understand this need for positional 
lookups to begin with?

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##########
@@ -83,6 +84,11 @@
       .withDocumentation("Lower values increase the size of metadata tracked 
within HFile, but can offer potentially "
           + "faster lookup times.");
 
+  public static final ConfigProperty<String> HFILE_SCHEMA_KEY_FIELD_ID = 
ConfigProperty
+      .key("hoodie.hfile.schema.key.field.id")
+      .defaultValue(HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY)

Review comment:
       rename consistently everywhere else

##########
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java
##########
@@ -110,8 +106,8 @@ public HoodieLogBlockType getBlockType() {
     boolean useIntegerKey = false;
     int key = 0;
     int keySize = 0;
-    Field keyField = records.get(0).getSchema().getField(this.keyField);
-    if (keyField == null) {
+    final Field schemaKeyField = 
records.get(0).getSchema().getField(HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY);

Review comment:
       should nt this use the value passed in from storage config? We cannot 
have this hardcoded here

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
##########
@@ -83,6 +84,11 @@
       .withDocumentation("Lower values increase the size of metadata tracked 
within HFile, but can offer potentially "
           + "faster lookup times.");
 
+  public static final ConfigProperty<String> HFILE_SCHEMA_KEY_FIELD_ID = 
ConfigProperty
+      .key("hoodie.hfile.schema.key.field.id")
+      .defaultValue(HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY)
+      .withDocumentation("Key field name for the HFile schema. This key field 
is used for on-disk storage optimization");

Review comment:
       explain what purpose this serves i.e reduce storage overhead in not 
storing keys with values.

##########
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -62,6 +64,7 @@
   // Scanner used to read individual keys. This is cached to prevent the 
overhead of opening the scanner for each
   // key retrieval.
   private HFileScanner keyScanner;
+  private final String keyField = HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY;

Review comment:
       We could even stick this into the config object. But cannot do this 
hardcoding or referencing of metadata payload here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to