Hi Burak, thanks for your answer.

My unit test calls "new MyResultFunction()(sparkContext, inputPath).collect"
(to evaluate the actual result), and that is where I can observe and catch
the exception. Even taking Spark's laziness into account, shouldn't the
exception, when it occurs, be caught by the try..catch statement that
encloses the textFile invocation?
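A minimal plain-Scala analogy of this behaviour (no Spark required; the object LazinessDemo and its methods are hypothetical names used only for illustration, and this is an analogy, not Spark's actual mechanism): an Iterator's map is lazy the way an RDD transformation is, and toList forces it the way an action such as collect does. A try..catch that only encloses building the pipeline returns normally and catches nothing; the exception is thrown later, wherever the pipeline is forced.

```scala
import scala.util.Try
import scala.util.control.NonFatal

// Analogy only: Iterator.map stands in for a lazy transformation (like
// textFile), and toList stands in for an action (like collect).
object LazinessDemo {

  // Mirrors MyResultFunction: the try only encloses building the pipeline.
  def buildOnly(divisors: Seq[Int]): Iterator[Int] =
    try {
      divisors.iterator.map(d => 10 / d) // no division happens here yet
    } catch {
      case NonFatal(_) => Iterator.empty // never reached for a division error
    }

  // The try encloses the step that forces evaluation, so the failure is caught.
  def buildAndForce(divisors: Seq[Int]): List[Int] =
    try {
      divisors.iterator.map(d => 10 / d).toList // divisions happen here
    } catch {
      case NonFatal(e) =>
        println(s"caught: ${e.getClass.getSimpleName}")
        Nil
    }

  def main(args: Array[String]): Unit = {
    val pipeline = buildOnly(Seq(1, 2, 0)) // returns normally; its catch never ran
    println(buildAndForce(Seq(1, 2, 0)))   // prints "caught: ArithmeticException", then "List()"
    // Forcing the pipeline outside any try is where the failure surfaces.
    println(Try(pipeline.toList).isFailure) // prints "true"
  }
}
```

In Spark terms, the corresponding change would be to put the try..catch around the action (collect, count, ...) rather than around the textFile call.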

Best,
Roberto


On Mon, Aug 24, 2015 at 7:38 PM, Burak Yavuz <brk...@gmail.com> wrote:

> textFile is a lazy operation. It doesn't evaluate until you call an action
> on it, such as .count(). Therefore, the try..catch around textFile won't
> catch the exception: it is thrown later, when the action runs.
>
> Best,
> Burak
>
> On Mon, Aug 24, 2015 at 9:09 AM, Roberto Coluccio <
> roberto.coluc...@gmail.com> wrote:
>
>> Hello folks,
>>
>> I'm experiencing unexpected behaviour that suggests I'm missing something
>> about how Spark works. Let's say I have a Spark driver that invokes a
>> function like:
>>
>> ----- in myDriver -----
>>
>> val sparkContext = new SparkContext(mySparkConf)
>> val inputPath = "file:///home/myUser/project/resources/date=*/*"
>>
>> val myResult = new MyResultFunction()(sparkContext, inputPath)
>>
>> ----- in MyResultFunctionOverRDD ------
>>
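>> import org.apache.spark.SparkContext
>> import org.apache.spark.rdd.RDD
>> import scala.util.control.NonFatal
>>
>> class MyResultFunction extends Function2[SparkContext, String, RDD[String]]
>>     with Serializable {
>>   override def apply(sparkContext: SparkContext, inputPath: String): RDD[String] = {
>>     try {
>>       sparkContext.textFile(inputPath, 1)
>>     } catch {
>>       case NonFatal(t) =>
>>         // myLogger is assumed to be defined elsewhere (e.g. an slf4j Logger)
>>         myLogger.error(s"error: ${t.getStackTrace.mkString("\n")}\n")
>>         sparkContext.makeRDD(Seq[String]())
>>     }
>>   }
>> }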
>>
>> What happens is that I'm *unable to catch exceptions* thrown by the
>> "textFile" method within the try..catch clause in MyResultFunction. In
>> fact, in a unit test where I call that function with an invalid
>> "inputPath", I don't get an empty RDD as the result; instead, the unit
>> test exits (and fails) because of an unhandled exception.
>>
>> What am I missing here?
>>
>> Thank you.
>>
>> Best regards,
>> Roberto
>>
>
>
