Hi,

As I understand it, exception handling in Spark only makes sense when one
attempts an action, as opposed to lazy transformations?
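
For example, a contrived sketch (the names here are mine): nothing fails
when the transformation is defined, only when an action forces evaluation:

// No error at this point: map is a lazy transformation, so the
// division by zero has not been evaluated yet
val rdd = spark.sparkContext.parallelize(Seq(1, 0)).map(10 / _)

// The exception only surfaces once an action runs:
// rdd.collect()  // SparkException caused by java.lang.ArithmeticException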

Let us assume that I am reading an XML file from an HDFS directory and
creating a DataFrame df on it:

val broadcastValue = "123456789"  // I assume this will be sent as a
constant for the batch
// Create a DF on top of the XML file
import org.apache.spark.sql.functions.lit   // needed for lit() below

val df = spark.read.
                format("com.databricks.spark.xml").
                option("rootTag", "hierarchy").
                option("rowTag", "sms_request").
                load("/tmp/broadcast.xml")

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

newDF.createOrReplaceTempView("tmp")

  // Put the data in the Hive table. sqltext needs val (or var), and I use
  // string interpolation so the static partition value stays in sync with
  // broadcastValue above; broadcastid is supplied by the static PARTITION
  // spec, so it is not repeated in the SELECT list
  //
  val sqltext = s"""
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastid = "$broadcastValue", brand)
  SELECT
          ocis_party_id AS partyId
        , target_mobile_no AS phoneNumber
        , brand
  FROM tmp
  """
//
// Here I am performing an action, so any error will surface here
try {
  spark.sql(sqltext)
} catch {
  // spark.sql does not throw java.sql.SQLException; catch the Spark
  // exceptions (e.g. AnalysisException) instead
  case e: Exception =>
    e.printStackTrace()
    sys.exit(1)
}

Now the issue I have is: what if the XML file /tmp/broadcast.xml does not
exist or has been deleted? I won't be able to catch the error until the
Hive table is populated. Of course, I can write a shell script to check
that the file exists before running the job, or trigger a small action
such as df.show(1, 0). Are there more general alternatives?
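
One more general option I am considering (a minimal sketch, assuming the
input lives on HDFS) is to do the existence check inside the job itself
with the Hadoop FileSystem API, instead of an external shell script:

import org.apache.hadoop.fs.{FileSystem, Path}

// Reuse the Hadoop configuration the Spark session was built with
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val xmlPath = new Path("/tmp/broadcast.xml")

if (!fs.exists(xmlPath)) {
  println(s"Input file $xmlPath does not exist, aborting")
  sys.exit(1)
}

That way the job fails fast with a clear message before any Spark work is
queued up.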

Thanks

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
