Hey John,

Spark version: 2.3
Hadoop version: Hadoop 2.6.0-cdh5.14.2

Is there any way I can handle such an exception in the Spark code itself (or, for that matter, any other kind of exception)?
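For instance, would something along these lines work? This is just a rough, untested sketch, and `rdd` here stands in for whatever RDD the job actually materializes. My understanding from the traceback is that, because Spark is lazy, the executor failure only surfaces on the driver once an action runs, wrapped in a Py4JJavaError:

```
# Untested sketch: catch the JVM-side failure when the action runs.
from py4j.protocol import Py4JJavaError

try:
    result = rdd.count()  # any action that forces evaluation
except Py4JJavaError as e:
    # java_exception is the underlying JVM exception; its string form
    # includes the nested cause, e.g. BlockMissingException.
    if "BlockMissingException" in str(e.java_exception):
        print("HDFS block missing: " + str(e.java_exception))
        # e.g. skip this input, alert, or retry after repairing HDFS
    else:
        raise
```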
On Aug 7, 2018 1:19 AM, "John Zhuge" <john.zh...@gmail.com> wrote:

BlockMissingException typically indicates the HDFS file is corrupted. This might be an HDFS issue; the Hadoop mailing list is a better bet: u...@hadoop.apache.org.

Capture the full stack trace in the executor log. If the file still exists, run `hdfs fsck -blockId blk_1233169822_159765693` to determine whether it is corrupted. If it is not corrupted, could there be excessive (thousands of) concurrent reads on the block?

Hadoop version? Spark version?

On Mon, Aug 6, 2018 at 2:21 AM Divay Jindal <divay.jindal.n...@gmail.com> wrote:
> Hi,
>
> I am running PySpark in a dockerized Jupyter environment, and I am constantly
> getting this error:
>
> ```
> Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 33
> in stage 25.0 failed 1 times, most recent failure: Lost task 33.0 in stage
> 25.0 (TID 35067, localhost, executor driver)
> : org.apache.hadoop.hdfs.BlockMissingException
> : Could not obtain block:
> BP-1742911633-10.225.201.50-1479296658503:blk_1233169822_159765693
> ```
>
> Please, can anyone help me with how to handle such an exception in PySpark?
>
> --
> Best Regards
> *Divay Jindal*

--
John