I will increase memory for the job... that will also fix it, right?

On Apr 10, 2015 12:43 PM, "Reza Zadeh" <r...@databricks.com> wrote:
> You should pull in this PR: https://github.com/apache/spark/pull/5364
> It should resolve that. It is in master.
> Best,
> Reza
>
> On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>
>> Hi,
>>
>> I am benchmarking the row vs. column similarity flow on 60M x 10M matrices...
>>
>> Details are in this JIRA:
>>
>> https://issues.apache.org/jira/browse/SPARK-4823
>>
>> For testing I am using the Netflix data since the structure is very similar:
>> 50K x 17K near-dense similarities.
>>
>> There are 17K items, so I have not activated the threshold in colSimilarities
>> yet (it's at 1e-4).
>>
>> Running Spark on YARN with 20 nodes, 4 cores, 16 GB, shuffle threshold 0.6.
>>
>> I keep getting these from the column similarity code on the 1.2 branch.
>> Should I use master?
>>
>> 15/04/10 11:08:36 WARN BlockManagerMasterActor: Removing BlockManager
>> BlockManagerId(5, tblpmidn36adv-hdp.tdc.vzwcorp.com, 44410) with no
>> recent heart beats: 50315ms exceeds 45000ms
>>
>> 15/04/10 11:09:12 ERROR ContextCleaner: Error cleaning broadcast 1012
>>
>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>   at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>   at scala.concurrent.Await$.result(package.scala:107)
>>   at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
>>   at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
>>   at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>>   at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
>>   at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
>>   at scala.Option.foreach(Option.scala:236)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:138)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
>>   at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
>>   at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
>>
>> I know how to increase the 45 s timeout to something higher since this is a
>> compute-heavy job, but on YARN I am not sure how to set that config.
>>
>> In any case that's a warning and should not affect the job...
>>
>> Any idea how to improve the runtime other than increasing the threshold to
>> 1e-2? I will do that next.
>>
>> Was the Netflix dataset benchmarked for the column-based similarity flow
>> before? The similarity output from this dataset becomes near dense, so it is
>> interesting for stress testing...
>>
>> Thanks.
>>
>> Deb
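
For reference, activating the DIMSUM threshold discussed in the thread is a one-argument change to `columnSimilarities` on MLlib's `RowMatrix`. This is only a sketch: `rows` is a placeholder for the benchmark's ratings RDD, and 1e-2 is the value Deb says he will try next.

```scala
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// `rows` stands in for the 60M x 10M ratings matrix from the benchmark
val rows: RDD[Vector] = ???  // placeholder: load ratings vectors here

val mat = new RowMatrix(rows)

// With no threshold, all-pairs similarity is computed exactly (brute force).
val exact = mat.columnSimilarities()

// With a threshold, DIMSUM sampling skips column pairs whose similarity
// is likely below the threshold, trading accuracy for runtime/shuffle size.
val approx = mat.columnSimilarities(threshold = 1e-2)
```

A larger threshold prunes more aggressively, which matters here because the Netflix-style output is near dense; pairs under the threshold may be dropped or estimated with sampling error.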