jiangbiao910 opened a new issue, #6462:
URL: https://github.com/apache/hudi/issues/6462
**Environment Description**

* Hudi version : 0.11.0
* Flink version : 1.13.1
* Hive version : 2.1.1-cdh6.2.0
* Hadoop version : 3.0.0-cdh6.2.0
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no
When I execute the following Spark SQL (`zone_dw.dws_stat_day_di` is a Hive table and `zone_dw.dws_map_refresh_hi` is a Hudi COW table):

```sql
SET TBLPROPERTIES ("hoodie.metadata.enable"="false");

INSERT OVERWRITE TABLE zone_dw.dws_stat_day_di PARTITION(dt)
SELECT * FROM zone_dw.dws_map_refresh_hi
WHERE dt BETWEEN '20220812' AND '20220819';
```
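For reference, here is how I would expect `hoodie.metadata.enable` to be applied, either per session or per table. This is only a sketch: whether the Hive-on-Spark read path actually picks up the session-level key is an assumption on my part, not something I have confirmed.

```sql
-- Session-level (assumption: the engine forwards session properties to the
-- Hudi InputFormat's job configuration):
SET hoodie.metadata.enable=false;

-- Table-level alternative: persist the property on the Hudi table itself:
ALTER TABLE zone_dw.dws_map_refresh_hi
SET TBLPROPERTIES ('hoodie.metadata.enable' = 'false');
```

Either way, the trace below still goes through `HoodieBackedTableMetadata`, i.e. the metadata-table listing path was still active when the query failed.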
The error log:

```
22/08/20 02:56:58 ERROR client.RemoteDriver: Failed to run client job 39d720db-b15d-4823-b8b1-54398b143d6e
org.apache.hudi.exception.HoodieException: Error fetching partition paths from metadata table
    at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:315)
    at org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:176)
    at org.apache.hudi.BaseHoodieTableFileIndex.loadPartitionPathFiles(BaseHoodieTableFileIndex.java:219)
    at org.apache.hudi.BaseHoodieTableFileIndex.doRefresh(BaseHoodieTableFileIndex.java:264)
    at org.apache.hudi.BaseHoodieTableFileIndex.<init>(BaseHoodieTableFileIndex.java:139)
    at org.apache.hudi.hadoop.HiveHoodieTableFileIndex.<init>(HiveHoodieTableFileIndex.java:49)
    at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatusForSnapshotMode(HoodieCopyOnWriteTableInputFormat.java:234)
    at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatus(HoodieCopyOnWriteTableInputFormat.java:141)
    at org.apache.hudi.hadoop.HoodieParquetInputFormatBase.listStatus(HoodieParquetInputFormatBase.java:90)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.listStatus(HoodieCombineHiveInputFormat.java:889)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:76)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getSplits(HoodieCombineHiveInputFormat.java:942)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getCombineSplits(HoodieCombineHiveInputFormat.java:241)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getSplits(HoodieCombineHiveInputFormat.java:363)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:267)
    at org.apache.spark.api.java.JavaRDDLike$class.getNumPartitions(JavaRDDLike.scala:65)
    at org.apache.spark.api.java.AbstractJavaRDDLike.getNumPartitions(JavaRDDLike.scala:45)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:252)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:179)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:130)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:355)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:400)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:365)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve list of partition from metadata
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:113)
    at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:313)
    ... 32 more
Caused by: java.util.NoSuchElementException: No value present in Option
    at org.apache.hudi.common.util.Option.get(Option.java:89)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getPartitionFileSlices(HoodieTableMetadataUtil.java:1057)
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(HoodieTableMetadataUtil.java:1001)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getPartitionFileSliceToKeysMapping(HoodieBackedTableMetadata.java:377)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:204)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:140)
    at org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:281)
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:111)
    ... 33 more
22/08/20 02:56:59 INFO client.RemoteDriver: Shutting down Spark Remote Driver.
22/08/20 02:56:59 INFO server.AbstractConnector: Stopped Spark@ce7a81b{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
22/08/20 02:56:59 INFO ui.SparkUI: Stopped Spark web UI at http://scsp04097:34219
22/08/20 02:56:59 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
22/08/20 02:56:59 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
22/08/20 02:56:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/08/20 02:56:59 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
22/08/20 02:56:59 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/08/20 02:56:59 INFO memory.MemoryStore: MemoryStore cleared
```
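The innermost cause is an unguarded `Option.get()`: `org.apache.hudi.common.util.Option.get()` (Option.java:89) throws `java.util.NoSuchElementException: No value present in Option` whenever the Option is empty. A minimal sketch of just that behavior (this is not the Hudi-internal call site, only a standalone reproduction of the same class and message, assuming hudi-common is on the classpath):

```java
import org.apache.hudi.common.util.Option;

public class OptionGetDemo {
    public static void main(String[] args) {
        // Stands in for whatever value
        // HoodieTableMetadataUtil.getPartitionFileSlices expected to find
        // on the metadata table at line 1057.
        Option<String> maybeValue = Option.empty();

        // Throws java.util.NoSuchElementException: No value present in Option,
        // matching the innermost cause in the trace above.
        String value = maybeValue.get();
        System.out.println(value);
    }
}
```

So the failure looks like a transient state in which `getPartitionFileSlices` found nothing where it expected a value, and that state was gone by the second run.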
**But when I run the same SQL again, it works fine.**