Read is an action, so you could wrap it in a Try (or whatever you want)
scala> val df = Try(spark.read.csv("test"))
df: scala.util.Try[org.apache.spark.sql.DataFrame] =
Failure(org.apache.spark.sql.AnalysisException: Path does not exist:
file:/test;)
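The resulting Try can then be pattern-matched to branch on success or failure. A minimal sketch (assuming the usual spark session and path from your example):

```scala
import scala.util.{Try, Success, Failure}

// Wrap the eager read in a Try and branch on the outcome.
Try(spark.read.csv("test")) match {
  case Success(df) =>
    df.show(1, false)                         // proceed with the DataFrame
  case Failure(e) =>
    println(s"Read failed: ${e.getMessage}")  // e.g. AnalysisException: Path does not exist
    sys.exit(1)
}
```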
From: Mich Talebzadeh <[email protected]>
Date: Tuesday, May 5, 2020 at 12:45 PM
To: Brandon Geise <[email protected]>
Cc: "user @spark" <[email protected]>
Subject: Re: Exception handling in Spark
Thanks Brandon!
I should have remembered that.
Basically the code exits with sys.exit(1) if it cannot find the file.
I guess there is no easy way of validating the DataFrame except by actioning it, e.g. show(1,0)
etc., and checking whether it works?
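One lightweight way to do that validation, as a sketch using only standard Spark calls (the helper name is my own): force the cheapest possible action inside a Try instead of show.

```scala
import scala.util.Try
import org.apache.spark.sql.DataFrame

// Hypothetical helper: head(1) touches at most one row, so it is a
// cheap way to force evaluation and surface read errors early.
def dfIsReadable(df: DataFrame): Boolean =
  Try(df.head(1)).isSuccess
```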
Regards,
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Disclaimer: Use it at your own risk. Any and all responsibility for any loss,
damage or destruction of data or any other property which may arise from
relying on this email's technical content is explicitly disclaimed. The author
will in no case be liable for any monetary damages arising from such loss,
damage or destruction.
On Tue, 5 May 2020 at 16:41, Brandon Geise <[email protected]> wrote:
You could use the Hadoop API and check if the file exists.
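A sketch of that existence check with the Hadoop FileSystem API, assuming the path from your example and the Hadoop configuration already attached to the SparkSession:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Check the file before building the DataFrame, using the session's
// Hadoop configuration so HDFS/local resolution matches Spark's.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val xmlPath = new Path("/tmp/broadcast.xml")
if (!fs.exists(xmlPath)) {
  println(s"$xmlPath does not exist")
  sys.exit(1)
}
```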
From: Mich Talebzadeh <[email protected]>
Date: Tuesday, May 5, 2020 at 11:25 AM
To: "user @spark" <[email protected]>
Subject: Exception handling in Spark
Hi,
As I understand it, exception handling in Spark only makes sense if one attempts an
action, as opposed to lazy transformations?
Let us assume that I am reading an XML file from an HDFS directory and creating
a DataFrame df on it:
val broadcastValue = "123456789" // I assume this will be sent as a constant
for the batch
// Create a DF on top of XML
val df = spark.read.
  format("com.databricks.spark.xml").
  option("rootTag", "hierarchy").
  option("rowTag", "sms_request").
  load("/tmp/broadcast.xml")
val newDF = df.withColumn("broadcastid", lit(broadcastValue))
newDF.createOrReplaceTempView("tmp")
// Put data in Hive table
//
sqltext = """
INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastid="123456",
brand)
SELECT
ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
, brand
, broadcastid
FROM tmp
"""
//
// Here I am performing a collection
try {
  spark.sql(sqltext)
} catch {
  case e: SQLException =>
    e.printStackTrace()
    sys.exit(1)
}
Now the issue I have is: what if the XML file /tmp/broadcast.xml does not exist
or has been deleted? I won't be able to catch the error until the Hive table is
populated. Of course I can write a shell script to check whether the file exists
before running the job, or perform a small action like df.show(1,0). Are there
more general alternatives?
Thanks
Dr Mich Talebzadeh