Hi,

As I understand it, exception handling in Spark only makes sense when one attempts an action, as opposed to a lazy transformation. Is that correct?
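For instance, a minimal sketch of the point (with a deliberate divide-by-zero in the transformation):

    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 0))
    // No error here: map is a lazy transformation
    val mapped = rdd.map(x => 10 / x)

    // The failure only surfaces when an action forces evaluation
    try {
      mapped.collect()
    } catch {
      case e: org.apache.spark.SparkException => e.printStackTrace()
    }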
Let us assume that I am reading an XML file from an HDFS directory and creating a DataFrame on it:

    import org.apache.spark.sql.functions.lit

    val broadcastValue = "123456789"  // I assume this will be sent as a constant for the batch

    // Create a DF on top of the XML file
    val df = spark.read.
      format("com.databricks.spark.xml").
      option("rootTag", "hierarchy").
      option("rowTag", "sms_request").
      load("/tmp/broadcast.xml")

    val newDF = df.withColumn("broadcastid", lit(broadcastValue))
    newDF.createOrReplaceTempView("tmp")

    // Put data in the Hive table. broadcastid is a static partition, so it is
    // not repeated in the SELECT list; brand is a dynamic partition column
    val sqltext =
      s"""
         |INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastid = "$broadcastValue", brand)
         |SELECT
         |    ocis_party_id    AS partyId
         |  , target_mobile_no AS phoneNumber
         |  , brand
         |FROM tmp
         |""".stripMargin

    // Here I am performing an action. Note that spark.sql throws
    // AnalysisException (not SQLException) for analysis errors
    try {
      spark.sql(sqltext)
    } catch {
      case e: org.apache.spark.sql.AnalysisException =>
        e.printStackTrace()
        sys.exit(1)
    }

Now the issue I have is: what if the XML file /tmp/broadcast.xml does not exist or has been deleted? I won't be able to catch the error until the Hive table is populated. Of course, I can write a shell script to check that the file exists before running the job, or force a small action such as df.show(1, 0). Are there more general alternatives? (One programmatic option is sketched in the P.S. below.)

Thanks,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
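P.S. On "more general alternatives": one programmatic equivalent of the shell-script check is to test for the input on the driver with the Hadoop FileSystem API before building the DataFrame. A minimal sketch (the path and exit code are just illustrative):

    import org.apache.hadoop.fs.{FileSystem, Path}

    val xmlPath = new Path("/tmp/broadcast.xml")
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

    // Abort early, before any DataFrame is built, if the input is missing
    if (!fs.exists(xmlPath)) {
      println(s"Input file $xmlPath does not exist, aborting")
      sys.exit(1)
    }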