crutis opened a new issue, #6281:
URL: https://github.com/apache/hudi/issues/6281

   My job is just a wrapper around `HoodieDeltaStreamer` (yes, there are probably better ways to do this).
   
   ```
   public class SparkHudiPoc {
       public static void main(String[] args) throws Exception {
           HoodieDeltaStreamer.main(args);
       }
   }
   ```
   
   From pom.xml:
   ```
       <properties>
           <!-- DEPENDENCY VERSIONS -->
           <hudi.version>0.11.1</hudi.version>
           <scala.version>2.12.10</scala.version>
           <spark.version>3.1.2</spark.version>
           <aws-java-sdk.version>1.12.257</aws-java-sdk.version>
           <hadoop.version>3.2.1</hadoop.version>
           <parquet.version>1.10.0</parquet.version>
       </properties>
   
       <dependencies>
           <dependency>
               <groupId>org.apache.hudi</groupId>
               <artifactId>hudi-utilities-bundle_2.12</artifactId>
               <version>${hudi.version}</version>
           </dependency>
           <dependency>
               <groupId>org.apache.parquet</groupId>
               <artifactId>parquet-avro</artifactId>
               <version>${parquet.version}</version>
           </dependency>
           <dependency>
               <groupId>org.apache.hadoop</groupId>
               <artifactId>hadoop-client</artifactId>
               <version>${hadoop.version}</version>
           </dependency>
           <dependency>
               <groupId>org.apache.hadoop</groupId>
               <artifactId>hadoop-aws</artifactId>
               <version>${hadoop.version}</version>
               <exclusions>
                   <exclusion>
                       <groupId>com.amazonaws</groupId>
                       <artifactId>aws-java-sdk-bundle</artifactId>
                   </exclusion>
               </exclusions>
           </dependency>
           <dependency>
               <groupId>com.amazonaws</groupId>
               <artifactId>aws-java-sdk-s3</artifactId>
               <version>${aws-java-sdk.version}</version>
           </dependency>
           <dependency>
               <groupId>com.amazonaws</groupId>
               <artifactId>aws-java-sdk-sts</artifactId>
               <version>${aws-java-sdk.version}</version>
           </dependency>
       </dependencies>
   
       <dependencyManagement>
           <dependencies>
               <dependency>
                   <groupId>org.scala-lang</groupId>
                   <artifactId>scala-library</artifactId>
                   <version>${scala.version}</version>
               </dependency>
               <dependency>
                   <groupId>org.apache.hadoop</groupId>
                   <artifactId>hadoop-client</artifactId>
                   <version>${hadoop.version}</version>
               </dependency>
           </dependencies>
       </dependencyManagement>
   ```
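
    (The pom snippet above omits how the fat jar is assembled; it's built with the maven-shade-plugin, roughly like the following -- plugin version and transformer list here are approximate, not copied verbatim from my build:)
    ```
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <!-- merge META-INF/services files from the bundled jars -->
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    ```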
   
    ```
    spark-submit \
    --master yarn \
    --deploy-mode client \
    s3://path-to/my-fat-jar.jar \
    --enable-sync \
    --disable-compaction \
    --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
    --min-sync-interval-seconds 60 \
    --op UPSERT \
    --payload-class org.apache.hudi.common.model.debezium.MySqlDebeziumAvroPayload \
    --source-class org.apache.hudi.utilities.sources.debezium.MysqlDebeziumSource \
    --source-ordering-field _event_origin_ts_ms \
    --table-type MERGE_ON_READ \
    --target-base-path s3://my-bucket/path/table_name \
    --target-table table_name \
    --continuous \
    --hoodie-conf auto.offset.reset=earliest \
    --hoodie-conf bootstrap.servers=kafka-server:9092 \
    --hoodie-conf group.id=spark-hudi-poc \
    --hoodie-conf schema.registry.url=http://registry:8081 \
    --hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=http://registry:8081/subjects/CDC-value/versions/latest \
    --hoodie-conf hoodie.deltastreamer.source.kafka.topic=CDC \
    --hoodie-conf hoodie.datasource.hive_sync.database=spark-hudi-poc \
    --hoodie-conf hoodie.datasource.hive_sync.skip_ro_suffix=true \
    --hoodie-conf hoodie.datasource.hive_sync.table=table_name \
    --hoodie-conf hoodie.datasource.write.recordkey.field=id \
    --hoodie-conf hoodie.datasource.write.partitionpath.field=createdDate \
    --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator \
    --hoodie-conf hoodie.deltastreamer.keygen.timebased.timezone=GMT \
    --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd \
    --hoodie-conf hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
    ```
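
    One thing I notice: I never set the hive-sync partition field or extractor anywhere, even though the key generator writes three-level `yyyy/MM/dd` partition paths. If that is the mismatch Glue is complaining about, a fix might be something along these lines (untested on my side; config keys and the extractor class are from the Hudi 0.11 hive-sync module):
    ```
    --hoodie-conf hoodie.datasource.hive_sync.partition_fields=createdDate \
    --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
    ```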
   
   
   **Environment Description**
   
   * Hudi version : 0.11.1 (fat jar)
   
   * EMR 6.5.0
   
   * Spark version : 3.1.2
   
   * Hive version : 3.1.2
   
   * Hadoop version : Amazon 3.2.1
   
   * Storage : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
    ```
    22/08/02 16:46:49 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
    org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
         at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
         at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncMeta(DeltaSync.java:715)
         at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:634)
         at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:333)
         at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:679)
         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing mtrees_usertrees
         at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:143)
         at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
         ... 8 more
    Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table mtrees_usertrees
         at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:414)
         at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
         at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:156)
         at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:140)
         ... 9 more
    Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to add partitions to spark-hudi-poc.mtrees_usertrees
         at org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:147)
         at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:397)
         ... 12 more
    Caused by: com.amazonaws.services.glue.model.InvalidInputException: The number of partition keys do not match the number of partition values (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: 00f4d354-50a0-4b98-bce4-bab5569339c8; Proxy: null)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
         at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
         at com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:10640)
         at com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10607)
         at com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10596)
         at com.amazonaws.services.glue.AWSGlueClient.executeBatchCreatePartition(AWSGlueClient.java:259)
         at com.amazonaws.services.glue.AWSGlueClient.batchCreatePartition(AWSGlueClient.java:228)
         at org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:139)
         ... 13 more
    ```
   
   

