Re: Handle BlockMissingException in pyspark

2018-08-07 Thread Divay Jindal
Hey John,

Spark version: 2.3
Hadoop version: Hadoop 2.6.0-cdh5.14.2

Is there any way I can handle such an exception in Spark code itself (or, for that matter, any other kind of exception)?

On Aug 7, 2018 1:19 AM, "John Zhuge" wrote:
> BlockMissingException typically indicates the HDFS file is corrupted. [...]
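One driver-side pattern (a sketch, not something from this thread) is to catch `Py4JJavaError` around a Spark action and inspect the Java-side exception message for `BlockMissingException`. The helper name below is hypothetical; only the string check is generic, and the PySpark usage is shown in comments because it needs a live cluster:

```python
# Sketch: classify a driver-side error as an HDFS BlockMissingException.

def is_block_missing(message: str) -> bool:
    """Return True if a Py4J error message mentions an HDFS BlockMissingException."""
    return "BlockMissingException" in message

# Hypothetical usage on the driver (assumes `rdd` and pyspark are available):
#
# from py4j.protocol import Py4JJavaError
# try:
#     result = rdd.collect()
# except Py4JJavaError as e:
#     if is_block_missing(str(e.java_exception)):
#         # e.g. skip the file, alert operators, or re-run after fsck
#         ...
#     else:
#         raise
```

Note that the exception surfaces on the driver only when an action fails after Spark exhausts its task retries, so this catches the final failure, not each task attempt.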

Re: Handle BlockMissingException in pyspark

2018-08-06 Thread John Zhuge
BlockMissingException typically indicates the HDFS file is corrupted. It might be an HDFS issue; the Hadoop mailing list is a better bet: u...@hadoop.apache.org. Capture the full stack trace in the executor log. If the file still exists, run `hdfs fsck -blockId blk_1233169822_159765693` to determine whether [...]
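The fsck check above can be scripted from Python. This is a sketch under assumptions: the `hdfs` CLI must be on PATH, and the `"CORRUPT"` marker check is a heuristic, since the exact fsck wording varies across Hadoop versions:

```python
import subprocess

def report_is_corrupt(fsck_output: str) -> bool:
    """Pure check on captured fsck output for a corruption marker."""
    return "CORRUPT" in fsck_output.upper()

def fsck_block_is_corrupt(block_id: str) -> bool:
    """Run `hdfs fsck -blockId <id>` and scan its output.

    Assumes the `hdfs` CLI is available; requires cluster access to run.
    """
    result = subprocess.run(
        ["hdfs", "fsck", "-blockId", block_id],
        capture_output=True, text=True,
    )
    return report_is_corrupt(result.stdout)
```

Splitting the output check into its own function keeps the cluster-dependent part thin and the parsing testable offline.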

Handle BlockMissingException in pyspark

2018-08-06 Thread Divay Jindal
Hi,

I am running PySpark in a dockerized Jupyter environment, and I am constantly getting this error:

```
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 33 in stage 25.0 failed [...]
```
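Missing-block errors are sometimes transient (e.g. during datanode restarts), so one mitigation on the driver is to retry the whole job a few times before giving up. The wrapper below is a generic sketch, not code from this thread; `action` would wrap a Spark job such as `lambda: rdd.collect()`, and with PySpark the caught exception would be `Py4JJavaError`:

```python
import time

def retry(action, attempts=3, delay_s=1.0, retryable=("BlockMissingException",)):
    """Call `action()`, retrying when the error message names a retryable cause.

    Hypothetical helper: re-raises immediately on non-retryable errors and
    after the final attempt; sleeps with linear backoff between attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as e:  # with pyspark this would be Py4JJavaError
            if attempt == attempts or not any(m in str(e) for m in retryable):
                raise
            time.sleep(delay_s * attempt)  # linear backoff before retrying
```

Within a single job, Spark already retries failed tasks up to `spark.task.maxFailures` times (default 4), so this only helps when the underlying HDFS problem clears between job submissions.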