Hi Everyone,
I'm stuck on a problem where I need to provide a custom GCS location
for a Hive table from Spark. The code fails during an *'insert into'*
whenever the Hive table's location is a bare bucket root like gs://<bucket_name>,
but works for nested locations like gs://bucket_name/blob_name.
Is anyone aware whether this is an issue on the Spark side, or is there some
config I need to pass for it?
*The issue happens on both Spark 2.x and 3.x.*
Configs used:
spark.conf.set("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
spark.conf.set("spark.hadoop.hive.exec.dynamic.partition", true)
spark.conf.set("hive.exec.dynamic.partition.mode","nonstrict")
spark.conf.set("hive.exec.dynamic.partition", true)
*Case 1: FAILS*

val DF = Seq(("test1", 123)).toDF("name", "num")
val partKey = List("num").map(x => x)
DF.write
  .option("path", "gs://test_dd1/")
  .mode(SaveMode.Overwrite)
  .partitionBy(partKey: _*)
  .format("orc")
  .saveAsTable("us_wm_supply_chain_otif_stg.test_tb1")

val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("us_wm_supply_chain_otif_stg.test_tb1")
java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:141)
  at org.apache.hadoop.fs.Path.<init>(Path.java:120)
  at org.apache.hadoop.fs.Path.suffix(Path.java:441)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
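My guess (not verified, just reading the trace) is that Path.suffix builds
new Path(getParent(), getName() + suffix), and getParent() returns null for a
bucket-root path, which would explain the NPE inside the Path constructor.
A minimal sketch of that suspicion, reusing the paths from the examples below
(the "/num=123" fragment is only an illustration of a partition suffix):

import org.apache.hadoop.fs.Path

val rootLocation = new Path("gs://test_dd1/")
println(rootLocation.getParent)       // null for a bucket root
// rootLocation.suffix("/num=123")    // would hit the same NullPointerException

val nestedLocation = new Path("gs://test_dd1/abc/")
println(nestedLocation.getParent)     // gs://test_dd1 -> non-null, suffix works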
*Case 2: SUCCEEDS*

val DF = Seq(("test1", 123)).toDF("name", "num")
val partKey = List("num").map(x => x)
DF.write
  .option("path", "gs://test_dd1/abc/")
  .mode(SaveMode.Overwrite)
  .partitionBy(partKey: _*)
  .format("orc")
  .saveAsTable("us_wm_supply_chain_otif_stg.test_tb2")

val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("us_wm_supply_chain_otif_stg.test_tb2")
With Best Regards,
Dipayan Dev