[ https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936409#comment-17936409 ]
Mehakmeet Singh edited comment on SPARK-33144 at 3/18/25 8:23 AM:
------------------------------------------------------------------

[~CHENXCHEN] [~wankun] I ran into this issue as well; what fixed it for me was removing the property `spark.hadoop.mapreduce.fileoutputcommitter.cleanup.skipped=true`, if you have it set. As we can see from the error:

{code:java}
20/10/14 09:15:33 INFO load-dynamic-partitions-3 [hive.ql.metadata.Hive:1919]: New loading path = hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000/_temporary/0 with partSpec {name=, version=}
{code}

While loading Hive partitions, the `_temporary` directory under the staging directory is also picked up. We want to avoid that, since no partition spec can be derived from it. Letting the committer clean up its temporary directory ensures that only valid partitions are picked (see the sketches after the quoted issue below).

> Cannot insert overwrite multiple partitions, get exception "get partition:
> Value for key name is null or empty"
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-33144
>                 URL: https://issues.apache.org/jira/browse/SPARK-33144
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1, 3.2.1
>         Environment: hadoop 2.7.3 + spark 3.0.1
>                      hadoop 2.7.3 + spark 3.2.1
>            Reporter: CHC
>            Priority: Major
>
> When:
> {code:sql}
> create table tmp.spark_multi_partition(
>   id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
>   *
> from (
>   select
>     1 as id
>     , 'hadoop' as name
>     , '2.7.3' as version
>   union
>   select
>     2 as id
>     , 'spark' as name
>     , '3.0.1' as version
>   union
>   select
>     3 as id
>     , 'hive' as name
>     , '2.3.4' as version
> ) as A;
> {code}
> and get exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path = hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000/name=spark/version=3.0.1 with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 [hive.ql.metadata.Hive:1919]: New loading path = hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000/name=hadoop/version=2.7.3 with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 [hive.ql.metadata.Hive:1919]: New loading path = hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000/name=hive/version=2.3.4 with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 [hive.ql.metadata.Hive:1919]: New loading path = hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000/_temporary/0 with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 [hive.ql.metadata.Hive:1937]: Exception when loading partition with parameters partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000/_temporary/0, table=spark_multi_partition, partSpec={name=, version=}, replace=true, listBucketingEnabled=false, isAcid=false, hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for key name is null or empty
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>     at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>     at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-00001-b745147b-600f-4c79-8ba2-12a99283b0a9.c000' to trash at: hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-00001-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it doesn't exist: hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4/part-00002-b745147b-600f-4c79-8ba2-12a99283b0a9.c000' to trash at: hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4/part-00002-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it doesn't exist: hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4
> 20/10/14 09:15:33 INFO Delete-Thread-0 [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hadoop/version=2.7.3/part-00000-b745147b-600f-4c79-8ba2-12a99283b0a9.c000' to trash at: hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hadoop/version=2.7.3/part-00000-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it doesn't exist: hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hadoop/version=2.7.3
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 4 in table spark_multi_partition with loadPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000;
> org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 4 in table spark_multi_partition with loadPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000;
>     at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:113)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:924)
>     at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:258)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:102)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
>     at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
>     at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
>     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
>     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
>     at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
>     at scala.collection.Iterator.foreach(Iterator.scala:941)
>     at scala.collection.Iterator.foreach$(Iterator.scala:941)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>     at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>     at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:282)
>     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 4 in table spark_multi_partition with loadPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-10000
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1963)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.sql.hive.client.Shim_v2_1.loadDynamicPartitions(HiveShim.scala:1226)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$loadDynamicPartitions$1(HiveClientImpl.scala:903)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:894)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$loadDynamicPartitions$1(HiveExternalCatalog.scala:944)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
>     ... 47 more
> Caused by: java.util.concurrent.ExecutionException: org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for key name is null or empty
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1954)
>     ... 62 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for key name is null or empty
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>     at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>     at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
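
To apply the fix programmatically, here is a minimal sketch; the session builder and app name are illustrative assumptions, not the reporter's exact job. The point is simply that `mapreduce.fileoutputcommitter.cleanup.skipped` stays at its default of `false`, so the committer deletes `_temporary` before Hive scans the staging directory:

{code:scala}
// Minimal sketch, assuming a Hive-enabled Spark session; the app name and
// builder layout are illustrative, not from the original report.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("insert-overwrite-multi-partition")
  // false is the Hadoop default; simply removing any override that sets
  // this property to true has the same effect.
  .config("spark.hadoop.mapreduce.fileoutputcommitter.cleanup.skipped", "false")
  .enableHiveSupport()
  .getOrCreate()
{code}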
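And to see why the leftover `_temporary/0` directory breaks the load: Hive derives each partition spec from the `key=value` path segments under `-ext-10000`, and `_temporary/0` contains none, so every partition column comes back empty. The helper below is purely hypothetical (`partSpecFromPath` is not Hive's actual code), just an illustration of the mapping:

{code:scala}
// Hypothetical illustration of how a staging sub-path maps to a partition
// spec; not Hive's real implementation.
def partSpecFromPath(relPath: String, partCols: Seq[String]): Map[String, String] = {
  // Keep only key=value segments, e.g. "name=spark" and "version=3.0.1".
  val kvs = relPath.split("/").collect {
    case seg if seg.contains("=") =>
      val Array(k, v) = seg.split("=", 2)
      k -> v
  }.toMap
  // Every partition column must receive a value; missing ones stay empty.
  partCols.map(c => c -> kvs.getOrElse(c, "")).toMap
}

partSpecFromPath("name=spark/version=3.0.1", Seq("name", "version"))
// => Map(name -> spark, version -> 3.0.1)
partSpecFromPath("_temporary/0", Seq("name", "version"))
// => Map(name -> "", version -> "")  -- the partSpec {name=, version=} from the
//    log, which Hive rejects with "Value for key name is null or empty"
{code}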