dragonH commented on issue #6832: URL: https://github.com/apache/hudi/issues/6832#issuecomment-1263257319
hi @codope, thanks for your kind help. Here's another try with another Hudi config (I used `Customer_Sample_Hudi_001` instead of `Customer_Sample_Hudi`).

First, I used `show tables` to list the tables:

<img width="221" alt="image" src="https://user-images.githubusercontent.com/18332044/193223276-cd667a96-1bab-46cb-a266-00ca31e5f937.png">

We can see that the table `Customer_Sample_Hudi_001` is not there.
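In case it helps to reproduce, the same check can also be run from the Spark session itself (a minimal sketch; `spark` is the active SparkSession and `default` is the Glue database I'm syncing to):

```
# List the tables in the Glue-backed `default` database from Spark;
# before the write, `Customer_Sample_Hudi_001` does not appear here.
spark.sql("SHOW TABLES IN default").show(truncate=False)
```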
Then I used the Hudi config below:

```
hudi_options = {
    'hoodie.table.name': 'Customer_Sample_Hudi_001',
    'hoodie.datasource.write.storage.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.recordkey.field': 'MA001',
    'hoodie.datasource.write.partitionpath.field': 'MA001',
    'hoodie.datasource.write.table.name': 'Customer_Sample_Hudi_001',
    'hoodie.datasource.write.operation': 'insert_overwrite',
    'hoodie.datasource.write.precombine.field': 'load_timestamp',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2,
    'path': 's3://drwu-data-platform-staging-data/Customer_Sample_Hudi_001/',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.database': 'default',
    'hoodie.datasource.hive_sync.table': 'Customer_Sample_Hudi_001',
    'hoodie.datasource.hive_sync.partition_fields': 'MA001',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.hive_sync.ignore_exceptions': True
}
```

and wrote the dataframe the same way:

```
df \
    .write \
    .format("hudi") \
    .options(**hudi_options) \
    .mode("append") \
    .save()
```

and got the same error:

```
Py4JJavaError: An error occurred while calling o249.save.
: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing Customer_Sample_Hudi_001
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:118)
    at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:539)
    at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:595)
    at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:591)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:77)
    at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:591)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:665)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:286)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table Customer_Sample_Hudi_001
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:197)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:129)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:115)
    ... 46 more
Caused by: java.lang.IllegalArgumentException: Partitions must be in the same table
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
    at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.validateInputForBatchCreatePartitions(GlueMetastoreClientDelegate.java:800)
    at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.batchCreatePartitions(GlueMetastoreClientDelegate.java:736)
    at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.addPartitions(GlueMetastoreClientDelegate.java:718)
    at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.add_partitions(AWSCatalogMetastoreClient.java:339)
    at org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:198)
    at org.apache.hudi.hive.HoodieHiveClient.addPartitionsToTable(HoodieHiveClient.java:115)
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:346)
    ... 49 more
```

One weird thing is that the table was created this time:

<img width="220" alt="image" src="https://user-images.githubusercontent.com/18332044/193223994-cc8daa21-00ce-4051-819e-18f302330475.png">
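To double-check what actually landed in the catalog, the table entry can be looked up in Glue directly (a rough sketch using boto3; credentials and region come from my environment, and the lowercased name is an assumption since Glue normalizes catalog names to lowercase):

```
import boto3

# Fetch the table entry straight from the Glue Data Catalog.
# Glue normalizes names to lowercase, hence the lowercased lookup.
glue = boto3.client("glue")
table = glue.get_table(DatabaseName="default", Name="customer_sample_hudi_001")["Table"]
print(table["Name"])
print(table.get("PartitionKeys"))
```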
