Re: Out of memory on large RDDs

2014-08-27 Thread Jianshi Huang
>>>>> ... at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> ...

Re: Out of memory on large RDDs

2014-08-26 Thread Andrew Ash
>>>> ... at java.lang.Thread.run(Thread.java:724)
>>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [3...00] milliseconds
>>>> at akka.dispatch.DefaultPromise.ready(Future.scala:870)
>>>> at akka.dispatch.DefaultPromise.result(Future.scala:874)
>>>> ...

Re: Out of memory on large RDDs

2014-03-11 Thread Grega Kespret
>>>> ... Futures timed out after [3...00] milliseconds
>>>> at akka.dispatch.DefaultPromise.ready(Future.scala:870)
>>>> at akka.dispatch.DefaultPromise.result(Future.scala:874)
>>>> at akka.dispatch.Await$.result(Future.scala:74)
>>>> at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:81)
>>>> ...

Re: Out of memory on large RDDs

2014-03-11 Thread Mayur Rustagi
>>> ... at akka.dispatch.Await$.result(Future.scala:74)
>>> at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:81)
>>> ... 25 more
>>>
>>> Before the error I can see this kind of logs:
>>>
>>> 14/03/11 14:29...

Re: Out of memory on large RDDs

2014-03-11 Thread sparrow
>> ... Don't have map outputs for shuffle 0, fetching them
>> 14/03/11 14:29:40 INFO MapOutputTracker: Don't have map outputs for shuffle 0, fetching them
>> 14/03/11 14:29:40 INFO MapOutputTracker: Don't have map outputs for shuffle 0, fetching them
>>
>> Can y...

Re: Out of memory on large RDDs

2014-03-11 Thread Mayur Rustagi
Shuffle data is not kept in memory. Did you try additional memory configurations?
(https://spark.incubator.apache.org/docs/latest/scala-programming-guide.html#rdd-persistence)

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi
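A minimal sketch of the kind of tuning this reply points at, against the Spark 0.9-era Scala API. The memory values, app name, and input path are placeholder assumptions, not settings taken from the thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Raise executor heap and shrink the fraction of it reserved for the
    // RDD cache, leaving more headroom for shuffle buffers.
    val conf = new SparkConf()
      .setAppName("large-rdd-job")                 // hypothetical app name
      .set("spark.executor.memory", "10g")         // placeholder value
      .set("spark.storage.memoryFraction", "0.4")  // default is 0.6
    val sc = new SparkContext(conf)

    // Persist with a level that spills partitions to disk instead of
    // recomputing or failing when they do not fit in memory.
    val lines = sc.textFile("s3n://bucket/path")   // hypothetical input
    lines.persist(StorageLevel.MEMORY_AND_DISK)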

Re: Out of memory on large RDDs

2014-03-11 Thread Domen Grabec
Hi,

I have a Spark cluster with 4 workers, each with 13 GB of RAM. I would like to process a large data set (it does not fit in memory) that consists of JSON entries. These are the transformations applied:

    SparkContext.textFile(s3url)   // read files from s3
      .keyBy(_.parseJson.id)       // key by id that is locat...
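A self-contained sketch of that pipeline. The truncated pieces are filled in by assumption: parseId, the id field layout, and the local input path are hypothetical stand-ins for the poster's code:

    import org.apache.spark.SparkContext

    object KeyByIdSketch {
      // Hypothetical stand-in for the poster's JSON parsing; a real job
      // would use a proper JSON library such as json4s. This crudely pulls
      // the value that follows an "id": field.
      def parseId(line: String): String =
        line.split("\"id\":")(1).takeWhile(_ != ',').trim

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[4]", "key-by-id-sketch")
        val keyed = sc.textFile("/tmp/entries.json")  // stand-in for the s3 url
          .keyBy(parseId)                             // key each JSON entry by its id
        keyed.take(5).foreach(println)
        sc.stop()
      }
    }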