alberttwong opened a new issue, #12974: URL: https://github.com/apache/hudi/issues/12974
**Describe the problem you faced**

Following the quickstart at https://hudi.apache.org/docs/quick-start-guide/ on EMR 7.6, the very first write fails with `org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table` (full trace below).

**To Reproduce**

Steps to reproduce the behavior:

```
[hadoop@ip-10-0-102-126 ~]$ export SPARK_VERSION=3.5
[hadoop@ip-10-0-102-126 ~]$ pyspark --packages org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:1.0.1 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar'
Python 3.9.20 (main, Jan 25 2025, 00:00:00)
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
org.apache.hudi#hudi-spark3.5-bundle_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-d8e6cf2e-7919-4478-955d-803d950e3ddd;1.0
    confs: [default]
    found org.apache.hudi#hudi-spark3.5-bundle_2.12;1.0.1 in central
    found org.apache.hive#hive-storage-api;2.8.1 in central
    found org.slf4j#slf4j-api;1.7.36 in central
downloading https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.5-bundle_2.12/1.0.1/hudi-spark3.5-bundle_2.12-1.0.1.jar ...
    [SUCCESSFUL ] org.apache.hudi#hudi-spark3.5-bundle_2.12;1.0.1!hudi-spark3.5-bundle_2.12.jar (1200ms)
:: resolution report :: resolve 540ms :: artifacts dl 1207ms
    :: modules in use:
    org.apache.hive#hive-storage-api;2.8.1 from central in [default]
    org.apache.hudi#hudi-spark3.5-bundle_2.12;1.0.1 from central in [default]
    org.slf4j#slf4j-api;1.7.36 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   3   |   1   |   1   |   0   ||   3   |   1   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-d8e6cf2e-7919-4478-955d-803d950e3ddd
    confs: [default]
    1 artifacts copied, 2 already retrieved (108061kB/64ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/03/13 18:01:20 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
25/03/13 18:01:21 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
25/03/13 18:01:27 WARN Client: Same path resource file:///home/hadoop/.ivy2/jars/org.apache.hudi_hudi-spark3.5-bundle_2.12-1.0.1.jar added multiple times to distributed cache.
25/03/13 18:01:27 WARN Client: Same path resource file:///home/hadoop/.ivy2/jars/org.apache.hive_hive-storage-api-2.8.1.jar added multiple times to distributed cache.
25/03/13 18:01:27 WARN Client: Same path resource file:///home/hadoop/.ivy2/jars/org.slf4j_slf4j-api-1.7.36.jar added multiple times to distributed cache.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.3-amzn-0
      /_/

Using Python version 3.9.20 (main, Jan 25 2025 00:00:00)
Spark context Web UI available at http://ip-10-0-102-126.us-west-2.compute.internal:4040
Spark context available as 'sc' (master = yarn, app id = application_1741888657336_0003).
SparkSession available as 'spark'.
>>> from pyspark.sql.functions import lit, col
>>>
>>> tableName = "trips_table"
>>> basePath = "file:///tmp/trips_table"
>>> columns = ["ts","uuid","rider","driver","fare","city"]
>>> data =[(1695159649087,"334e26e9-8355-45cc-97c6-c31daf0df330","rider-A","driver-K",19.10,"san_francisco"),
...     (1695091554788,"e96c4396-3fad-413a-a942-4cb36106d721","rider-C","driver-M",27.70 ,"san_francisco"),
...     (1695046462179,"9909a8b1-2d15-4d3d-8ec9-efc48c536a00","rider-D","driver-L",33.90 ,"san_francisco"),
...     (1695516137016,"e3cf430c-889d-4015-bc98-59bdce1e530c","rider-F","driver-P",34.15,"sao_paulo"),
...     (1695115999911,"c8abbe79-8d89-47ea-b4ce-4d224bae5bfa","rider-J","driver-T",17.85,"chennai")]
>>> inserts = spark.createDataFrame(data).toDF(*columns)
>>>
>>> hudi_options = {
...     'hoodie.table.name': tableName,
...     'hoodie.datasource.write.partitionpath.field': 'city'
... }
>>>
>>> inserts.write.format("hudi"). \
...     options(**hudi_options). \
...     mode("overwrite"). \
...     save(basePath)
25/03/13 18:01:51 WARN HoodieSparkSqlWriterInternal: Choosing BULK_INSERT as the operation type since auto record key generation is applicable
25/03/13 18:01:51 INFO HoodieTableMetaClient: Initializing file:/tmp/trips_table as hoodie table
25/03/13 18:01:51 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:/tmp/trips_table
25/03/13 18:01:51 INFO HoodieTableConfig: Loading table properties from file:/tmp/trips_table/.hoodie/hoodie.properties
25/03/13 18:01:51 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=2) from file:/tmp/trips_table
25/03/13 18:01:51 INFO HoodieTableMetaClient: Finished initializing Table of type COPY_ON_WRITE from file:/tmp/trips_table
25/03/13 18:01:51 INFO ActiveTimelineV2: Loaded instants upto : Optional.empty
25/03/13 18:01:51 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
25/03/13 18:01:51 INFO EmbeddedTimelineService: Overriding hostIp to (ip-10-0-102-126.us-west-2.compute.internal) found in spark-conf. It was null
25/03/13 18:01:51 INFO FileSystemViewManager: Creating View Manager with storage type MEMORY.
25/03/13 18:01:51 INFO log: Logging initialized @35255ms to org.apache.hudi.org.apache.jetty.util.log.Slf4jLog
25/03/13 18:01:52 INFO Server: jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 17.0.14+7-LTS
25/03/13 18:01:52 INFO Server: Started @35490ms
25/03/13 18:01:52 INFO TimelineService: Starting Timeline server on port: 32785
25/03/13 18:01:52 INFO EmbeddedTimelineService: Started embedded timeline server at ip-10-0-102-126.us-west-2.compute.internal:32785
25/03/13 18:01:52 INFO BaseHoodieClient: Timeline Server already running. Not restarting the service
25/03/13 18:01:52 INFO HoodieSparkSqlWriterInternal: Config.inlineCompactionEnabled ? false
25/03/13 18:01:52 INFO HoodieSparkSqlWriterInternal: Config.asyncClusteringEnabled ? false
25/03/13 18:01:52 INFO TimeGeneratorBase: LockProvider for TimeGenerator: org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
25/03/13 18:01:52 INFO BaseZookeeperBasedLockProvider: Creating zookeeper path /hudi/trips_table if not exists
25/03/13 18:01:52 INFO BaseZookeeperBasedLockProvider: ACQUIRING lock atZkBasePath = /hudi, lock key = trips_table
25/03/13 18:01:52 INFO BaseZookeeperBasedLockProvider: ACQUIRED lock atZkBasePath = /hudi, lock key = trips_table
25/03/13 18:01:52 INFO BaseZookeeperBasedLockProvider: RELEASING lock atZkBasePath = /hudi, lock key = trips_table
25/03/13 18:01:52 INFO BaseZookeeperBasedLockProvider: RELEASED lock atZkBasePath = /hudi, lock key = trips_table
25/03/13 18:01:52 INFO TimeGeneratorBase: Released the connection of the timeGenerator lock
25/03/13 18:01:52 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/trips_table
25/03/13 18:01:52 INFO HoodieTableConfig: Loading table properties from file:/tmp/trips_table/.hoodie/hoodie.properties
25/03/13 18:01:52 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=2) from file:///tmp/trips_table
25/03/13 18:01:52 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/trips_table
25/03/13 18:01:52 INFO ActiveTimelineV2: Loaded instants upto : Optional.empty
25/03/13 18:01:52 INFO TransactionManager: Transaction starting for Option{val=[==>20250313180152324__commit__INFLIGHT]} with latest completed transaction instant Optional.empty
25/03/13 18:01:52 INFO LockManager: LockProvider org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
25/03/13 18:01:52 INFO BaseZookeeperBasedLockProvider: Creating zookeeper path /hudi/trips_table if not exists
25/03/13 18:01:52 INFO BaseZookeeperBasedLockProvider: ACQUIRING lock atZkBasePath = /hudi, lock key = trips_table
25/03/13 18:01:52 INFO BaseZookeeperBasedLockProvider: ACQUIRED lock atZkBasePath = /hudi, lock key = trips_table
25/03/13 18:01:52 INFO TransactionManager: Transaction started for Option{val=[==>20250313180152324__commit__INFLIGHT]} with latest completed transaction instant Optional.empty
25/03/13 18:01:52 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/trips_table
25/03/13 18:01:52 INFO HoodieTableConfig: Loading table properties from file:/tmp/trips_table/.hoodie/hoodie.properties
25/03/13 18:01:52 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=2) from file:///tmp/trips_table
25/03/13 18:01:52 INFO HoodieBackedTableMetadataWriter: Async metadata indexing disabled and following partitions already initialized: []
25/03/13 18:01:52 INFO ActiveTimelineV2: Loaded instants upto : Optional.empty
25/03/13 18:01:52 INFO HoodieTableMetaClient: Initializing file:/tmp/trips_table/.hoodie/metadata as hoodie table
25/03/13 18:01:52 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:/tmp/trips_table/.hoodie/metadata
25/03/13 18:01:52 INFO HoodieTableConfig: Loading table properties from file:/tmp/trips_table/.hoodie/metadata/.hoodie/hoodie.properties
25/03/13 18:01:52 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=2) from file:/tmp/trips_table/.hoodie/metadata
25/03/13 18:01:52 INFO HoodieTableMetaClient: Finished initializing Table of type MERGE_ON_READ from file:/tmp/trips_table/.hoodie/metadata
25/03/13 18:01:52 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/trips_table/.hoodie/metadata
25/03/13 18:01:52 INFO HoodieTableConfig: Loading table properties from file:/tmp/trips_table/.hoodie/metadata/.hoodie/hoodie.properties
25/03/13 18:01:52 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=2) from file:///tmp/trips_table/.hoodie/metadata
25/03/13 18:01:52 INFO ActiveTimelineV2: Loaded instants upto : Optional.empty
25/03/13 18:01:52 INFO HoodieBackedTableMetadataWriter: Initializing MDT partition FILES at instant 00000000000000000
25/03/13 18:01:52 INFO HoodieBackedTableMetadataWriter: Committing total 0 partitions and 0 files to metadata
25/03/13 18:01:52 INFO HoodieBackedTableMetadataWriter: Initializing FILES index with 1 mappings
25/03/13 18:01:52 INFO HoodieBackedTableMetadataWriter: Creating 1 file groups for partition files with base fileId files- at instant time 00000000000000000
25/03/13 18:01:54 INFO AbstractTableFileSystemView: Took 1 ms to read 0 instants, 0 replaced file groups
25/03/13 18:01:54 INFO ClusteringUtils: Found 0 files in pending clustering operations
25/03/13 18:01:54 INFO HoodieTableMetadataUtil: Loading latest file slices for metadata table partition files
25/03/13 18:01:54 INFO AbstractTableFileSystemView: Building file system view for partition (files)
25/03/13 18:01:54 WARN HoodieTableFileSystemView: Partition: files is not available in store
25/03/13 18:01:54 WARN HoodieTableFileSystemView: Partition: files is not available in store
25/03/13 18:01:54 INFO TransactionManager: Transaction ending with transaction owner Option{val=[==>20250313180152324__commit__INFLIGHT]}
25/03/13 18:01:54 INFO BaseZookeeperBasedLockProvider: RELEASING lock atZkBasePath = /hudi, lock key = trips_table
25/03/13 18:01:54 INFO BaseZookeeperBasedLockProvider: RELEASED lock atZkBasePath = /hudi, lock key = trips_table
25/03/13 18:01:54 INFO LockManager: Released connection created for acquiring lock
25/03/13 18:01:54 INFO TransactionManager: Transaction ended with transaction owner Option{val=[==>20250313180152324__commit__INFLIGHT]}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 1463, in save
    self._jwrite.save(path)
  File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/usr/lib/spark/python/pyspark/errors/exceptions/captured.py", line 179, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o284.save.
: org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table
    at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:309)
    at org.apache.hudi.client.SparkRDDWriteClient.initMetadataTable(SparkRDDWriteClient.java:271)
    at org.apache.hudi.client.BaseHoodieWriteClient.lambda$doInitTable$7(BaseHoodieWriteClient.java:1305)
    at org.apache.hudi.client.BaseHoodieWriteClient.executeUsingTxnManager(BaseHoodieWriteClient.java:1312)
    at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1302)
    at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1352)
    at org.apache.hudi.commit.BaseDatasetBulkInsertCommitActionExecutor.execute(BaseDatasetBulkInsertCommitActionExecutor.java:100)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.bulkInsertAsRow(HoodieSparkSqlWriter.scala:832)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:494)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$write$1(HoodieSparkSqlWriter.scala:192)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:157)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$10(SQLExecution.scala:220)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:220)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:405)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:219)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
    at org.apache.spark.sql.adapter.BaseSpark3Adapter.sqlExecutionWithNewExecutionId(BaseSpark3Adapter.scala:105)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:214)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:129)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:170)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:126)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:157)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$10(SQLExecution.scala:220)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:220)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:405)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:219)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:123)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:114)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:520)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:77)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:520)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:34)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:303)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:299)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:34)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:34)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:496)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:114)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:101)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:99)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:164)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:884)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:405)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:365)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:244)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.IllegalArgumentException: FileGroup count for MDT partition files should be > 0
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.prepRecords(HoodieBackedTableMetadataWriter.java:1442)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.commitInternal(HoodieBackedTableMetadataWriter.java:1349)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.bulkCommit(SparkHoodieBackedTableMetadataWriter.java:149)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeFromFilesystem(HoodieBackedTableMetadataWriter.java:489)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeIfNeeded(HoodieBackedTableMetadataWriter.java:280)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:189)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:114)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:91)
    at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:303)
    ... 73 more
```

**Expected behavior**

The quickstart insert should complete successfully and write the Hudi table to `file:///tmp/trips_table`.

**Environment Description**

* Hudi version : 1.0.1
* Spark version : 3.5
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : local file system (`file:///tmp/trips_table`)
* Running on Docker? (yes/no) : no
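
**Additional context**

The log above shows the writer picking up `org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider`, whereas a stock quickstart run uses in-process locking; on EMR this typically comes from cluster-level Hudi defaults (e.g. `/etc/hudi/conf/hudi-defaults.conf`). The exception itself is thrown while initializing the `files` partition of the metadata table. Below is a minimal diagnostic sketch, assuming the same PySpark session as above (so `tableName`, `basePath`, and `inserts` are already defined); the two extra options pin the lock provider and skip metadata-table initialization to isolate the failure, as a probe rather than a confirmed fix:

```python
# Diagnostic re-run of the failing write (assumption: same session as above).
hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.partitionpath.field': 'city',
    # Pin the in-process lock provider instead of the ZooKeeper provider
    # that the EMR cluster defaults appear to inject (assumption).
    'hoodie.write.lock.provider': 'org.apache.hudi.client.transaction.lock.InProcessLockProvider',
    # Skip metadata-table initialization, which is where the
    # "FileGroup count for MDT partition files should be > 0" check fails.
    'hoodie.metadata.enable': 'false',
}

inserts.write.format("hudi"). \
    options(**hudi_options). \
    mode("overwrite"). \
    save(basePath)
```

If the write succeeds with these overrides, the failure likely stems from the interaction between the OSS `hudi-spark3.5-bundle` 1.0.1 and the EMR-provided Hudi configuration/jars rather than from the quickstart steps themselves.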