Re: context.runJob() was suspended in getPreferredLocations() function

2017-01-01 Thread Liang-Chi Hsieh

Hi,

Simply put, you are submitting another job from inside the scheduler's event
thread. That thread then blocks waiting for the new job, so it can never
process the new job submission event. Your second job submission is therefore
never handled, and the getPreferredLocations() call never returns.
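
A minimal sketch of a pattern that avoids this (class and member names below
are hypothetical, not Fei's actual code): run the job against the other RDD
eagerly on the driver, e.g. in the custom RDD's constructor, and make
getPreferredLocations() a pure lookup over the precomputed result:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    class DataChunkRDD(sc: SparkContext, parent: RDD[String])
      extends RDD[String](sc, Nil) {

      // Safe: the constructor runs on the driver's main thread, outside the
      // DAGScheduler event loop, so this job can actually be scheduled.
      private val chunks: Array[Array[String]] =
        sc.runJob(parent, (it: Iterator[String]) => it.toArray)

      override protected def getPartitions: Array[Partition] =
        chunks.indices.map(i => new Partition { override val index: Int = i }).toArray

      override def compute(split: Partition, context: TaskContext): Iterator[String] =
        chunks(split.index).iterator

      // Pure lookup over precomputed state: no job submission, so no deadlock.
      override protected def getPreferredLocations(split: Partition): Seq[String] =
        Seq.empty // e.g. derive host names from the data collected above
    }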



Fei Hu wrote
> Dear all,
> 
> I tried to implement my own custom RDD. In its getPreferredLocations()
> function, I used the following code to query another RDD, which was used
> as an input to initialize this customized RDD:
> 
>     val results: Array[Array[DataChunkPartition]] =
>       context.runJob(partitionsRDD,
>         (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray,
>         partitions, allowLocal = true)
> 
> The problem is that when executing the above code, the task seemed to be
> suspended: the job just stopped at this code, with no errors and no
> output.
> 
> What is the reason for it?
> 
> Thanks,
> Fei





-
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 




Re: Skip Corrupted Parquet blocks / footer.

2017-01-01 Thread Abhishek
You will have to edit the metadata file under the _spark_metadata folder to
remove the entries for the corrupt files.
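
That only applies if the directory was produced by a Structured Streaming
file sink. A rough sketch of the edit (the path and batch file name are
hypothetical, and the log is plain text with one JSON entry per committed
file after a version header — back it up before touching it):

    import scala.io.Source
    import java.io.PrintWriter

    val logPath = "/data/testdir/_spark_metadata/0"  // pick the batch file listing the corrupt file
    val src = Source.fromFile(logPath)
    val kept = src.getLines().filterNot(_.contains("corruptblock.0")).toList
    src.close()
    new PrintWriter(logPath) { write(kept.mkString("\n")); close() }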

Thanks,
Shobhit G 

Sent from my iPhone

> On Dec 31, 2016, at 8:11 PM, khyati [via Apache Spark Developers List] wrote:
> 
> Hi, 
> 
> I am trying to read multiple Parquet files in Spark SQL. In one dir there
> are two files, of which one is corrupted. While trying to read these files,
> Spark SQL throws an exception for the corrupted file.
> 
>     val newDataDF =
>       sqlContext.read.parquet("/data/testdir/data1.parquet", "/data/testdir/corruptblock.0")
>     newDataDF.show
> 
> throws an exception.
> 
> Is there any way to just skip the file with the corrupted block/footer and
> read only the files which are proper?
> 
> Thanks 
> 




-
Regards, 
Abhi

Re: Skip Corrupted Parquet blocks / footer.

2017-01-01 Thread Reynold Xin
In Spark 2.1, set spark.sql.files.ignoreCorruptFiles to true.
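
For example, with the paths from the question (a sketch using the Spark 2.x
SparkSession; the flag can also be passed as
--conf spark.sql.files.ignoreCorruptFiles=true at submit time):

    // Spark 2.1+: skip files with corrupt blocks/footers instead of failing
    // the whole read.
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

    val newDataDF = spark.read.parquet(
      "/data/testdir/data1.parquet", "/data/testdir/corruptblock.0")
    newDataDF.show()  // returns rows from the readable file only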

On Sun, Jan 1, 2017 at 1:11 PM, khyati wrote:

> Hi,
>
> I am trying to read multiple Parquet files in Spark SQL. In one dir there
> are two files, of which one is corrupted. While trying to read these files,
> Spark SQL throws an exception for the corrupted file.
>
>     val newDataDF =
>       sqlContext.read.parquet("/data/testdir/data1.parquet", "/data/testdir/corruptblock.0")
>     newDataDF.show
>
> throws an exception.
>
> Is there any way to just skip the file with the corrupted block/footer and
> read only the files which are proper?
>
> Thanks