It depends... The heartbeat timeout you received happens due to GC pressure (probably due to Full GC). If you increase the memory too much, the GCs may become less frequent, but the Full GCs may take longer. Try increasing the following configs:

spark.executor.heartbeatInterval
spark.core.connection.ack.wait.timeout
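For example, here is a minimal sketch of how these could be set on the SparkConf. The app name and the values below are only illustrative placeholders, not tuned recommendations; in Spark 1.x the heartbeat interval is given in milliseconds and the ack wait timeout in seconds:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ColumnSimilarities")  // placeholder app name
  // Interval between executor heartbeats to the driver; raising it tolerates longer GC pauses.
  .set("spark.executor.heartbeatInterval", "60000")
  // How long to wait for connection acks before timing out and giving up.
  .set("spark.core.connection.ack.wait.timeout", "120")
val sc = new SparkContext(conf)

On YARN you should also be able to pass the same properties at submit time, e.g. --conf spark.executor.heartbeatInterval=60000 --conf spark.core.connection.ack.wait.timeout=120, so no code change is needed.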
Best,
Burak

On Fri, Apr 10, 2015 at 8:52 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> I will increase memory for the job... that will also fix it, right?
>
> On Apr 10, 2015 12:43 PM, "Reza Zadeh" <r...@databricks.com> wrote:
>
>> You should pull in this PR: https://github.com/apache/spark/pull/5364
>> It should resolve that. It is in master.
>> Best,
>> Reza
>>
>> On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am benchmarking the row vs. column similarity flow on 60M x 10M matrices...
>>>
>>> Details are in this JIRA:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-4823
>>>
>>> For testing I am using the Netflix data since the structure is very similar:
>>> 50K x 17K near-dense similarities.
>>>
>>> There are 17K items, so I have not activated the threshold in colSimilarities yet
>>> (it is at 1e-4).
>>>
>>> Running Spark on YARN with 20 nodes, 4 cores, 16 GB, shuffle threshold 0.6.
>>>
>>> I keep getting these from the column similarity code on the 1.2 branch. Should I
>>> use master?
>>>
>>> 15/04/10 11:08:36 WARN BlockManagerMasterActor: Removing BlockManager
>>> BlockManagerId(5, tblpmidn36adv-hdp.tdc.vzwcorp.com, 44410) with no
>>> recent heart beats: 50315ms exceeds 45000ms
>>>
>>> 15/04/10 11:09:12 ERROR ContextCleaner: Error cleaning broadcast 1012
>>>
>>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>> at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>> at scala.concurrent.Await$.result(package.scala:107)
>>> at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
>>> at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
>>> at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>>> at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
>>> at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
>>> at scala.Option.foreach(Option.scala:236)
>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:138)
>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
>>> at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
>>> at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
>>>
>>> I know how to increase the 45 s timeout to something higher, since it is a
>>> compute-heavy job, but on YARN I am not sure how to set that config.
>>>
>>> In any case, that is a warning and should not affect the job...
>>>
>>> Any idea how to improve the runtime other than increasing the threshold to 1e-2?
>>> I will do that next.
>>>
>>> Was the Netflix dataset benchmarked for the column-based similarity flow before?
>>> The similarity output from this dataset becomes near-dense, so it is interesting
>>> for stress testing.
>>>
>>> Thanks.
>>>
>>> Deb