I think I emailed about a similar issue, but in standalone mode. I haven't
investigated much so I don't know what's a good fix.
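
For scale, here is a rough back-of-the-envelope estimate of the payload described in the quoted message below. It is only a sketch: it assumes about one byte per character and ignores serialization and JVM object overhead, neither of which is stated in the original post. The helper name `raw_payload_mb` is made up for illustration.

```python
# Rough size estimate for the collect() described in the quoted message.
# Assumptions (not from the original post): ~1 byte per character, and no
# allowance for serialization or JVM object overhead.

def raw_payload_mb(num_lines: int, chars_per_line: int) -> float:
    """Return the approximate raw payload size in MB (10**6 bytes)."""
    return num_lines * chars_per_line / 1e6

failing = raw_payload_mb(2_500_000, 10)   # the case that hits FetchFailed
working = raw_payload_mb(2_000_000, 10)   # the case reported to work
frame_size_mb = 100                       # spark.akka.frameSize from the post

print(f"failing case: ~{failing:.0f} MB raw")
print(f"working case: ~{working:.0f} MB raw")
print(f"fits under frame size: {failing < frame_size_mb}")
```

Under these assumptions the raw data is well below the 100 MB frame size that was configured, so the frame size alone may not be the whole story, although the serialized size could be noticeably larger than this estimate.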


On Fri, Aug 22, 2014 at 12:00 PM, Jiayu Zhou <dearji...@gmail.com> wrote:

> Hi,
>
> I am having this FetchFailed issue when the driver is about to collect
> about 2.5M lines of short strings (about 10 characters each line) from a
> YARN cluster with 400 nodes:
>
> 14/08/22 11:43:27 WARN scheduler.TaskSetManager: Lost task 205.0 in stage
> 0.0 (TID 1228, aaa.xxx.com): FetchFailed(BlockManagerId(220, aaa.xxx.com,
> 37899, 0), shuffleId=0, mapId=420, reduceId=205)
> 14/08/22 11:43:27 WARN scheduler.TaskSetManager: Lost task 603.0 in stage
> 0.0 (TID 1626, aaa.xxx.com): FetchFailed(BlockManagerId(220, aaa.xxx.com,
> 37899, 0), shuffleId=0, mapId=420, reduceId=603)
>
> Other than this FetchFailed, I cannot see anything else in the log file
> (no OOM errors are shown).
>
> This does not happen when there are only 2M lines. I guessed it might be
> because of the Akka message size, so I set the following:
>
> spark.akka.frameSize  100
> spark.akka.timeout      200
>
> That did not help either. Has anyone experienced similar problems?
>
> Thanks,
> Jiayu
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/FetchFailed-when-collect-at-YARN-cluster-tp12670.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
