vinothchandar commented on code in PR #5771:
URL: https://github.com/apache/hudi/pull/5771#discussion_r929252278
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -622,9 +623,10 @@ object DataSourceWriteOptions {
/** @deprecated Use {@link PRECOMBINE_FIELD} and its methods instead */
@Deprecated
val PRECOMBINE_FIELD_OPT_KEY = HoodieWriteConfig.PRECOMBINE_FIELD_NAME.key()
- /** @deprecated Use {@link PRECOMBINE_FIELD} and its methods instead */
+ /** @deprecated Use {@link PRECOMBINE_FIELD} and its methods instead.
+ * This field has no default value since version 0.12.0; `ts` is kept
+ * for backward compatibility. */
@Deprecated
- val DEFAULT_PRECOMBINE_FIELD_OPT_VAL = PRECOMBINE_FIELD.defaultValue()
+ val DEFAULT_PRECOMBINE_FIELD_OPT_VAL = "ts"
Review Comment:
Should we even kill the defaults defined here?
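One way to read this suggestion (a hypothetical sketch; the class and method names below are illustrative, not Hudi's actual API): keep the deprecated constant hard-coded for old callers while the live config carries no default at all, so the precombine field becomes a purely opt-in setting.

```java
import java.util.Optional;
import java.util.Properties;

public class PrecombineDefaults {
  /** @deprecated retained only so pre-0.12.0 callers keep compiling. */
  @Deprecated
  public static final String DEFAULT_PRECOMBINE_FIELD_OPT_VAL = "ts";

  // The live config exposes no default: an absent key means "not configured".
  public static Optional<String> precombineField(Properties props) {
    return Optional.ofNullable(props.getProperty("hoodie.datasource.write.precombine.field"));
  }

  public static void main(String[] args) {
    // With nothing configured, callers see an empty Optional instead of "ts".
    System.out.println(precombineField(new Properties()).isPresent());
  }
}
```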
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -118,6 +118,15 @@ object HoodieSparkSqlWriter {
operation = WriteOperationType.INSERT
}
+ // If neither a record key field nor a precombine field is provided, assume it's
+ // an append-only workload and do a bulk insert
+ if (!hoodieConfig.contains(RECORDKEY_FIELD) && !hoodieConfig.contains(PRECOMBINE_FIELD) &&
+     operation == WriteOperationType.UPSERT) {
Review Comment:
Should we also handle the other incremental write operations, like INSERT and so on? We need to think through all of the write operations.
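To make this point concrete, here is a small self-contained sketch (a hypothetical enum and helper, not Hudi's code) of what "thinking through all write operations" could look like: a key-less UPSERT or INSERT falls back to BULK_INSERT, while a key-less DELETE is rejected outright, since there is no key to delete by.

```java
public class OperationOverride {
  // Subset of operations, mirroring Hudi's WriteOperationType for illustration.
  enum Op { UPSERT, INSERT, BULK_INSERT, DELETE }

  static Op resolve(Op requested, boolean hasRecordKey) {
    if (hasRecordKey) {
      return requested; // keys configured: honor the requested operation
    }
    switch (requested) {
      case UPSERT:
      case INSERT:
        return Op.BULK_INSERT; // append-only fallback when no key is set
      case DELETE:
        throw new IllegalArgumentException("DELETE requires a record key field");
      default:
        return requested; // BULK_INSERT etc. never needed a key
    }
  }
}
```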
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/SimpleAvroKeyGenerator.java:
##########
@@ -47,6 +50,9 @@ public SimpleAvroKeyGenerator(TypedProperties props) {
@Override
public String getRecordKey(GenericRecord record) {
+ if (recordKeyFields.isEmpty()) {
Review Comment:
lets make sure all Key gen code paths are taken care of, including the Spark
Row based key gen
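A sketch of one way to cover every code path (illustrative only; this class and its methods are not Hudi's actual key generators): push the empty-fields fallback into a single shared helper so the Avro-record path and the Spark Row path cannot diverge.

```java
import java.util.List;
import java.util.UUID;

public class KeyGenSketch {
  private final List<String> recordKeyFields;

  public KeyGenSketch(List<String> recordKeyFields) {
    this.recordKeyFields = recordKeyFields;
  }

  // Single fallback shared by every code path: auto-generate a key
  // when no record key field is configured, otherwise signal "use the field".
  private String autoGeneratedKeyOrNull() {
    return recordKeyFields.isEmpty() ? UUID.randomUUID().toString() : null;
  }

  // Avro-record path (actual field extraction elided for brevity).
  public String getRecordKeyFromAvro() {
    String auto = autoGeneratedKeyOrNull();
    return auto != null ? auto : "<value of " + recordKeyFields.get(0) + ">";
  }

  // Spark Row path must take the exact same fallback.
  public String getRecordKeyFromRow() {
    String auto = autoGeneratedKeyOrNull();
    return auto != null ? auto : "<value of " + recordKeyFields.get(0) + ">";
  }
}
```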
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -118,6 +118,15 @@ object HoodieSparkSqlWriter {
operation = WriteOperationType.INSERT
}
+ // If neither a record key field nor a precombine field is provided, assume it's
+ // an append-only workload and do a bulk insert
+ if (!hoodieConfig.contains(RECORDKEY_FIELD) && !hoodieConfig.contains(PRECOMBINE_FIELD) &&
+     operation == WriteOperationType.UPSERT) {
+
+ log.warn(s"$RECORDKEY_FIELD and $PRECOMBINE_FIELD are not specified, " +
+ s"overriding the $OPERATION to be $BULK_INSERT_OPERATION_OPT_VAL")
+
+ operation = WriteOperationType.BULK_INSERT
Review Comment:
But we still populate meta fields? So for users doing this sort of benchmarking out of the box, Hudi will appear slower?
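For context on this comment: Hudi normally adds metadata columns (`_hoodie_commit_time`, `_hoodie_record_key`, and so on) to every record, which costs write time even on bulk inserts. A config sketch for the benchmarking scenario described here (key names as commonly documented for Hudi; treat them as an assumption to verify against the target release):

```properties
# Skip writing Hudi metadata columns (only safe for append-only tables)
hoodie.populate.meta.fields=false
hoodie.datasource.write.operation=bulk_insert
```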
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]