Hi Everyone,
I'm stuck on a problem where I need to provide a custom GCS location
for a Hive table from Spark. The code fails during an *'insert into'*
whenever the Hive table's location is a bare bucket root like gs://<bucket_name>,
but works for nested locations like gs://bucket_name/blob_name.
Is anyone aware whether this is an issue on the Spark side, or is there some
config I need to pass for it?
*The issue happens on both Spark 2.x and 3.x.*
Configs used:
spark.conf.set("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
spark.conf.set("spark.hadoop.hive.exec.dynamic.partition", true)
spark.conf.set("hive.exec.dynamic.partition.mode","nonstrict")
spark.conf.set("hive.exec.dynamic.partition", true)
*Case 1: FAILS*

val DF = Seq(("test1", 123)).toDF("name", "num")
val partKey = List("num").map(x => x)
DF.write
  .option("path", "gs://test_dd1/")
  .mode(SaveMode.Overwrite)
  .partitionBy(partKey: _*)
  .format("orc")
  .saveAsTable("us_wm_supply_chain_otif_stg.test_tb1")

val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("us_wm_supply_chain_otif_stg.test_tb1")
java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:141)
  at org.apache.hadoop.fs.Path.<init>(Path.java:120)
  at org.apache.hadoop.fs.Path.suffix(Path.java:441)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
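My guess (not verified, just reading the trace) is that Path.suffix builds
new Path(getParent(), getName() + suffix), and getParent() returns null for a
bucket-root path, which would explain the NPE inside the Path constructor.
A minimal sketch of that suspicion, reusing the paths from the examples below
(the "/num=123" fragment is only an illustration of a partition suffix):

import org.apache.hadoop.fs.Path

val rootLocation = new Path("gs://test_dd1/")
println(rootLocation.getParent)       // null for a bucket root
// rootLocation.suffix("/num=123")    // would hit the same NullPointerException

val nestedLocation = new Path("gs://test_dd1/abc/")
println(nestedLocation.getParent)     // gs://test_dd1 -> non-null, suffix works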
*Case 2: SUCCEEDS*

val DF = Seq(("test1", 123)).toDF("name", "num")
val partKey = List("num").map(x => x)
DF.write
  .option("path", "gs://test_dd1/abc/")
  .mode(SaveMode.Overwrite)
  .partitionBy(partKey: _*)
  .format("orc")
  .saveAsTable("us_wm_supply_chain_otif_stg.test_tb2")

val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("us_wm_supply_chain_otif_stg.test_tb2")
With Best Regards,
Dipayan Dev