Re: context.runJob() was suspended in getPreferredLocations() function
Hi,

Simply put, you are submitting another job from inside the DAGScheduler event thread. That thread blocks waiting for the nested job's result, so it can never process the new job-submission event: the second job is never scheduled, and getPreferredLocations() never returns.

Fei Hu wrote
> Dear all,
>
> I tried to customize my own RDD. In the getPreferredLocations() function,
> I used the following code to query another RDD, which was used as an
> input to initialize this customized RDD:
>
>     val results: Array[Array[DataChunkPartition]] =
>       context.runJob(partitionsRDD,
>         (context: TaskContext, partIter: Iterator[DataChunkPartition]) =>
>           partIter.toArray,
>         partitions, allowLocal = true)
>
> The problem is that when executing the above code, the job seemed to be
> suspended: it just stopped at this code, with no errors and no output.
>
> What is the reason for it?
>
> Thanks,
> Fei

-
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
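For anyone hitting the same deadlock, the usual fix is to materialize whatever getPreferredLocations() needs eagerly on the driver, from the caller's thread, and make getPreferredLocations() a pure lookup. Below is a minimal, hedged sketch; HostAwareRDD, SimplePartition, and the (partitionIndex, host) pair layout are illustrative stand-ins, not the original DataChunkPartition code:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // Illustrative partition type; Partition only requires an index.
    case class SimplePartition(override val index: Int) extends Partition

    class HostAwareRDD(sc: SparkContext, partitionsRDD: RDD[(Int, String)])
        extends RDD[String](sc, Nil) {

      // collect() runs here, in the constructor, on the caller's thread.
      // The same job submission inside getPreferredLocations() would run on
      // the DAGScheduler event thread and deadlock, as described above.
      private val hostsByPartition: Map[Int, Seq[String]] =
        partitionsRDD.collect()
          .groupBy(_._1)
          .map { case (idx, rows) => idx -> rows.map(_._2).toSeq }

      override protected def getPartitions: Array[Partition] =
        hostsByPartition.keys.toArray.sorted
          .map(i => SimplePartition(i): Partition)

      override def compute(split: Partition, context: TaskContext): Iterator[String] =
        hostsByPartition.getOrElse(split.index, Nil).iterator

      // Pure in-memory lookup: no job submission, so nothing can block.
      override protected def getPreferredLocations(split: Partition): Seq[String] =
        hostsByPartition.getOrElse(split.index, Nil)
    }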
Re: Skip Corrupted Parquet blocks / footer.
You will have to edit the metadata file under the _spark_metadata folder to remove the listing of the corrupt files.

Thanks,
Shobhit G

Sent from my iPhone

> On Dec 31, 2016, at 8:11 PM, khyati [via Apache Spark Developers List] wrote:
>
> Hi,
>
> I am trying to read multiple parquet files in Spark SQL. In one directory
> there are two files, of which one is corrupted. While trying to read these
> files, Spark SQL throws an exception for the corrupted file.
>
>     val newDataDF = sqlContext.read.parquet(
>       "/data/testdir/data1.parquet",
>       "/data/testdir/corruptblock.0")
>
>     newDataDF.show
>
> The show call throws an exception.
>
> Is there any way to just skip the file with the corrupted block/footer and
> read only the files that are intact?
>
> Thanks

-
Regards,
Abhi
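As far as I know, the _spark_metadata folder only exists when the directory was written by the Structured Streaming file sink, so editing it will not help for a plain parquet directory. A hedged alternative sketch, reusing the sqlContext from the question: probe each file's footer up front by asking for its schema, then read only the files that parse.

    import scala.util.Try

    // Paths from the original question.
    val candidates = Seq("/data/testdir/data1.parquet",
                         "/data/testdir/corruptblock.0")

    // Resolving the schema forces Spark to read each file's footer, so a
    // corrupt footer surfaces as a failure here rather than at query time.
    val readable = candidates.filter { path =>
      Try(sqlContext.read.parquet(path).schema).isSuccess
    }

    val newDataDF = sqlContext.read.parquet(readable: _*)
    newDataDF.show()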
Re: Skip Corrupted Parquet blocks / footer.
In Spark 2.1, set spark.sql.files.ignoreCorruptFiles to true.

On Sun, Jan 1, 2017 at 1:11 PM, khyati wrote:
> Hi,
>
> I am trying to read multiple parquet files in Spark SQL. In one directory
> there are two files, of which one is corrupted. While trying to read these
> files, Spark SQL throws an exception for the corrupted file.
>
>     val newDataDF = sqlContext.read.parquet(
>       "/data/testdir/data1.parquet",
>       "/data/testdir/corruptblock.0")
>
>     newDataDF.show
>
> The show call throws an exception.
>
> Is there any way to just skip the file with the corrupted block/footer and
> read only the files that are intact?
>
> Thanks
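For completeness, a minimal sketch of that setting in use (Spark 2.1+ SparkSession API; the paths are the ones from the question). The flag can also be passed at submit time with --conf spark.sql.files.ignoreCorruptFiles=true.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("skip-corrupt-parquet")
      .getOrCreate()

    // With this flag on, Spark logs a warning for files it cannot read
    // (corrupt footer or blocks) and continues with the remaining files.
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

    val df = spark.read.parquet(
      "/data/testdir/data1.parquet",
      "/data/testdir/corruptblock.0")
    df.show()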