Re: What are the likely causes of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle?

XianXing Zhang Fri, 26 Jun 2015 11:43:34 -0700

Is there any update on this thread? Has anyone run into and solved a similar
problem before?

Any pointers will be greatly appreciated!

Best,
XianXing

On Mon, Jun 15, 2015 at 11:48 PM, Jia Yu <[email protected]> wrote:

> Hi Peng,
>
> I got exactly the same error! My shuffle data is also very large. Have you
> figured out a way to solve it?
>
> Thanks,
> Jia
>
> On Fri, Apr 24, 2015 at 7:59 AM, Peng Cheng <[email protected]> wrote:
>
>> I'm deploying a Spark data processing job on an EC2 cluster. The job is
>> small for the cluster (16 cores with 120G RAM in total): the largest RDD
>> has only 76k+ rows, but it is heavily skewed in the middle (and thus
>> requires repartitioning), and each row has around 100k of data after
>> serialization. The job always gets stuck in repartitioning. Namely, it
>> constantly hits the following errors and retries:
>>
>> org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle
>>
>> org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer
>>
>> org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: /tmp/spark-...
>>
>> I've tried to identify the problem, but both memory and disk consumption
>> on the machine throwing these errors seem to be below 50%. I've also
>> tried different configurations, including:
>>
>> - letting driver/executor memory use 60% of total memory
>> - letting Netty prioritize the JVM shuffling buffer
>> - increasing the shuffling streaming buffer to 128m
>> - using KryoSerializer and maxing out all buffers
>> - increasing the shuffling memoryFraction to 0.4
>>
>> But none of them works. The small job always triggers the same series of
>> errors and maxes out retries (up to 1000 times). How can I troubleshoot
>> this in such a situation?
>>
>> Thanks a lot if you have any clue.
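
For anyone comparing notes, here is a rough sketch of how the settings
listed above might be written down as Spark properties. The property names
are my best guess (an assumption, not confirmed by the original poster) at
what each item refers to in Spark 1.x; some of them were renamed or
deprecated in later releases. The values, the per-node memory figure, and
the helper names build_settings and to_submit_args are all hypothetical.

```python
# Sketch only: property names are Spark 1.x-era guesses at what the
# description above refers to; the values are illustrative placeholders.


def build_settings(node_memory_gb, memory_percent=60):
    """Return the attempted settings as a dict of Spark properties.

    node_memory_gb is a hypothetical per-node RAM figure; the thread only
    gives the cluster-wide total.
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    heap = "%dg" % (node_memory_gb * memory_percent // 100)
    return {
        # "let driver/executor memory use 60% of total memory"
        "spark.executor.memory": heap,
        "spark.driver.memory": heap,
        # "let netty prioritize JVM shuffling buffer" (assumed mapping)
        "spark.shuffle.io.preferDirectBufs": "false",
        # "increase shuffling streaming buffer to 128m" (assumed mapping)
        "spark.reducer.maxMbInFlight": "128",
        # "use KryoSerializer and max out all buffers" (512 is a placeholder)
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryoserializer.buffer.max.mb": "512",
        # "increase shuffling memoryFraction to 0.4"
        "spark.shuffle.memoryFraction": "0.4",
    }


def to_submit_args(settings):
    """Render the properties as spark-submit --conf arguments."""
    args = []
    for key in sorted(settings):
        args += ["--conf", "%s=%s" % (key, settings[key])]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(build_settings(node_memory_gb=30))))
```

Writing the attempted settings out explicitly like this makes it easier for
others on the list to see exactly which knobs have already been ruled out.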
