Could you do something like this prior to calling the action?
// Create a FileSystem object from the Hadoop configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// This method returns a Boolean (true if the file exists, false if it doesn't)
val fileExists = fs.exists(new Path("<path_to_file>"))
if (fileExists) println("File exists!")
else println("File doesn't exist!")
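For example, a rough sketch of how that check could wrap the XML read from the original post further down the thread (untested; the path, options, and exit code are taken from the thread, and a catch is kept as well since the file could disappear between the check and the load):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.AnalysisException

val xmlPath = "/tmp/broadcast.xml"
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Bail out early if the input file is missing
if (!fs.exists(new Path(xmlPath))) {
  println(s"$xmlPath not found")
  sys.exit(1)
}

// The read can still fail (e.g. file deleted after the check),
// and spark.read.load analyses the path eagerly, so wrap it too
val df =
  try {
    spark.read.
      format("com.databricks.spark.xml").
      option("rootTag", "hierarchy").
      option("rowTag", "sms_request").
      load(xmlPath)
  } catch {
    case e: AnalysisException =>
      e.printStackTrace()
      sys.exit(1)
  }
```

That way the failure surfaces before any downstream Hive insert runs, rather than at the first action.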
Not sure that will help you or not, just a thought.
-Todd
On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh <[email protected]>
wrote:
> Thanks Brandon!
>
> I should have remembered that.
>
> Basically the code exits with sys.exit(1) if it cannot find the file.
>
> I guess there is no easy way of validating the DF except actioning it with
> show(1,0) etc. and checking whether it works?
>
> Regards,
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 5 May 2020 at 16:41, Brandon Geise <[email protected]> wrote:
>
>> You could use the Hadoop API and check if the file exists.
>>
>>
>>
>> From: Mich Talebzadeh <[email protected]>
>> Date: Tuesday, May 5, 2020 at 11:25 AM
>> To: "user @spark" <[email protected]>
>> Subject: Exception handling in Spark
>>
>>
>>
>> Hi,
>>
>>
>>
>> As I understand it, exception handling in Spark only makes sense if one
>> attempts an action, as opposed to lazy transformations?
>>
>>
>>
>> Let us assume that I am reading an XML file from an HDFS directory and
>> creating a dataframe DF on it:
>>
>>
>>
>> // I assume this will be sent as a constant for the batch
>> val broadcastValue = "123456789"
>>
>> // Create a DF on top of the XML file
>> val df = spark.read.
>>   format("com.databricks.spark.xml").
>>   option("rootTag", "hierarchy").
>>   option("rowTag", "sms_request").
>>   load("/tmp/broadcast.xml")
>>
>> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>>
>> newDF.createOrReplaceTempView("tmp")
>>
>> // Put data in Hive table
>> //
>> sqltext = s"""
>> INSERT INTO TABLE michtest.BroadcastStaging PARTITION
>> (broadcastid="$broadcastValue", brand)
>> SELECT
>>     ocis_party_id AS partyId
>>   , target_mobile_no AS phoneNumber
>>   , brand
>>   , broadcastid
>> FROM tmp
>> """
>> //
>>
>> // Here the statement is actually executed (an action)
>>
>> try {
>>   spark.sql(sqltext)
>> } catch {
>>   case e: AnalysisException => // org.apache.spark.sql.AnalysisException, which spark.sql actually throws
>>     e.printStackTrace()
>>     sys.exit(1)
>> }
>>
>>
>>
>> Now the issue I have is: what if the XML file /tmp/broadcast.xml does not
>> exist or has been deleted? I won't be able to catch the error until the
>> Hive table is populated. Of course I can write a shell script to check
>> whether the file exists before running the job, or perform a small action
>> like df.show(1,0). Are there more general alternatives?
>>
>>
>>
>> Thanks
>>
>>
>>
>